“Please press 1”…or “Say English”? Interactive Voice Response Analysis
October 10, 2007, 12:34 am
Journal #1

Human needs drive innovation, and innovations in turn make life easier and satisfy those needs. Interaction designs are all around us, and every day we rely on them to solve problems and acquire knowledge. I think a properly designed product should simplify the problem-solving process and let users achieve their goal without wasting time and energy. In short, the design should be easy to use: easy to learn and easy to execute. Redundant design confuses users. When that happens, the fault lies with the designers, and the design is improper and useless.

For several reasons, and because of my own experience, I would like to explore Interactive Voice Response (IVR) as my topic. “In telephony, interactive voice response (IVR) is an automated telephony system that interacts with callers, gathers information and routes calls to the appropriate recipient. An IVR system (IVRs) accepts a combination of voice telephone input and touch-tone keypad selection and provides appropriate responses in the form of voice, fax, callback, e-mail and perhaps other media. The IVR system can respond with pre-recorded or dynamically generated audio to further direct callers on how to proceed. IVR systems can be used to control almost any function where the interface can be broken down into a series of simple menu choices.” [1]

Because I had to deal with my cell phone account, I called the T-Mobile customer center and was greeted by a natural voice: “If you want to listen to English, say English.” I said “English” without thinking. After that, the “agent” asked me, “What service do you need today?” I paused for a sec and finally answered “account,” hoping to get a response from it. However, this “conversation” ended because I could not access my account over the phone, so I asked to be switched to a live agent right away. Based on my experience, although speech technology is more advanced than touch-tone keypresses, for some less active customers it is more convenient to hear the “options” from the IVR and then press the keypad to confirm their action. Thus, I would like to focus on the pros and cons of touch-tone keypresses (DTMF) versus natural language speech recognition systems.

Speech recognition technology draws on artificial intelligence concepts, which make it possible to recognize a broader range of expressions and to learn from experience. Nonetheless, natural language speech recognition is not widely used by companies and businesses. I think speech recognition systems have some limitations:

1) Cost. The equipment is more expensive than a touch-tone system. Besides, it takes time to train the software to capture and recognize callers’ utterances.

2) Accuracy. On average, a natural language speech recognition system can capture 2,500 utterances [2]; however, others are more pessimistic. Bern Elliot, an analyst at Gartner Inc. in Stamford, Conn., said, “Normally, the success rate is 25%, rising to 45% or 50% if you put effort into it.” “If you work at it and motivate people, you might get to 70%, but that is the exception rather than the rule,” he added.

However, in Lamont Wood’s article “Talking to machines: Interactive voice response gets better”, Bob Meisel explained that “Speech recognition accuracy is not an issue, since the system can prompt for clarification if it’s confused.” Moreover, according to Wikipedia, “most commercial companies states that recognition software can achieve between 98% to 99% accuracy if operated under controlled conditions.”

Lynda Smith, division manager at Nuance Communications Inc. in Burlington, Mass., claimed that “compared to the touch-tone IVRs, the use of speech recognition system reduces misdirected calls by up to 50%.” In my opinion, a speech recognition system simplifies the menu structure of the application and can shorten callers’ time on the phone: they don’t have to listen to all of the options before making a choice. Further, callers find self-service applications more interactive, because it sounds like they are talking with a “live agent”! Besides, callers don’t have to take the phone away from their ear to press the keypad. They can just say a term and get what they want.

Nonetheless, based on my personal experience, the accuracy of speech recognition is not always so high. I think the ideal system combines both touch-tone input and natural language speech recognition. It is more efficient to apply speech recognition when the IVR asks “Yes or No” questions; however, if callers have to provide numeric information such as an account number, it is better to let them use keypresses. In fact, most companies now use such partial IVRs, which greatly benefits both customers and companies.
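
To make the combined approach concrete, here is a minimal Python sketch of how such a system might treat the two kinds of input: spoken answers for “Yes or No” questions, and keypresses for an account number. Everything in it is an assumption made for illustration and is not taken from any real IVR product: the function names, the word lists, the 0.7 confidence threshold, and the 10-digit account length are all hypothetical.

```python
import re

# Hypothetical word lists; a real system would use a much larger grammar.
YES_WORDS = {"yes", "yeah", "yep", "correct"}
NO_WORDS = {"no", "nope", "incorrect"}


def interpret_yes_no(utterance, confidence, threshold=0.7):
    """Map a recognized utterance to True/False, or None to ask again.

    `confidence` is the recognizer's score between 0 and 1 (assumed input).
    A low score or an unexpected word means the caller should be reprompted.
    """
    if confidence < threshold:
        return None
    word = utterance.strip().lower()
    if word in YES_WORDS:
        return True
    if word in NO_WORDS:
        return False
    return None


def interpret_account_number(keypresses, length=10):
    """Return the keyed digits if exactly `length` were entered, else None.

    A trailing '#' (a common "end of input" key) is ignored.
    """
    digits = keypresses.rstrip("#")
    if re.fullmatch(r"[0-9]{%d}" % length, digits):
        return digits
    return None
```

The speech branch returns None whenever the recognizer is unsure, so the system can prompt again for clarification, while the keypad branch avoids recognition errors altogether for long strings of digits.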

Even though these innovations assist people in their daily lives, criticism always arises. Some people think that IVRs are unhelpful and hard to use because of poor design that does not meet callers’ needs. A well-designed IVR system should meet callers’ demands immediately and with a minimum of complication. For me, the innovation of IVR does improve the efficiency of solving problems and makes things easier for both operators and callers; yet I would still prefer the lowest level of IVR (the touch-tone system) to the speech recognition system when I call a call center.