VocalPoint Technologies | Voice Web Browsing solutions for telecom, wireless, and ebusiness companies

Taking Your Business Mobile: Voice-Enable Web Content

By Garry Chinn

March 13, 2001

Companies have invested billions of dollars in developing Web sites to deliver content, products and services to customers. However, the Web reaches only a small percentage of the world's population. The telephone, on the other hand, is the most ubiquitous technology available, yet traditional telephony solutions are costly, inflexible, and difficult to design and deploy. As businesses face growing demand to provide immediate and simple access to information and services, it's no wonder that companies are looking to voice applications, which, thanks to recent advances, are beginning to be deployed using the Web.

Interactive voice response (IVR) systems, which have been around for decades, have been practical only for the largest enterprises, which can commit the time, money, and resources needed to build proprietary voice applications. Only a handful of companies (such as Charles Schwab and United Airlines) were progressive enough to pioneer newer technologies such as speech-driven IVR applications. As for off-the-shelf voice solutions, only a limited number of applications, such as auto attendants and voice mail systems, have gained widespread adoption by businesses and consumers.

The New Voice Web

Consider today's two standard means of communication and information access: the telephone and the Internet. Combining these two technologies gives companies a brand-new resource for connecting with customers: the voice Web.

The voice Web will extend the Web we know today by providing a new channel through which customers can access and retrieve information. By leveraging the existing infrastructure of Web-based content and applications, developers can build low-cost, custom voice applications with comparatively little effort. Web applications written in HTML or XML can be transformed into telephone-accessible voice applications.

As businesses look into adopting voice technology, they must consider options that will scale easily and cost-effectively as the business grows. The voice Web will make it more economical to deploy voice applications, allowing small- and medium-sized businesses to build and use state-of-the-art voice solutions. Large businesses will also benefit: they will be able to deploy, economically, more specialized applications targeted at specific customer segments.

The arrival of enhanced telephony devices such as smart phones and wireless personal digital assistants (PDAs) will enable "multi-modal" applications that handle both voice and data within the same session, making next-generation voice applications more powerful than ever. While voice won't replace the mouse and keyboard on desktop systems, it is an essential input modality for handheld devices, which makes voice applications a core part of the mobile network. Next-generation handsets will be able to support a voice channel and a data channel simultaneously, allowing true multi-modal browsing: voice and keypad input combined with audio and visual output. Multi-modal browsing will simplify navigation and information retrieval by replacing multiple keystroke commands with spoken phrases, ultimately increasing the power and effectiveness of telephony applications.

Voice browsing transforms Web applications into telephony voice applications. Just as a desktop Web browser renders a visual user interface on a PC, a voice browser renders HTML or XML code as spoken dialog. A voice browsing solution exploits the basic architecture of the Web, allowing content developers to reuse as many existing components as possible, with little or no modification, to produce a low-cost voice solution.
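
As a rough illustration of the idea, the Python sketch below turns the links on an HTML page into a directed-dialog menu prompt. It uses only the standard library's `html.parser`; the `LinkCollector` class and `menu_prompt` function are hypothetical names, and a real voice browser would also have to handle forms, headings, speech grammars, and audio output, none of which this sketch attempts.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (link text, href) pairs from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (text, href) tuples in page order
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so the phrase reads naturally when spoken.
            text = " ".join("".join(self._text).split())
            if text:
                self.links.append((text, self._href))
            self._href = None


def menu_prompt(html):
    """Render a page's links as a spoken, machine-directed menu prompt."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    choices = [text for text, _ in parser.links]
    if not choices:
        return "This page has no choices."
    if len(choices) == 1:
        return f"Say {choices[0]}."
    return "Please say " + ", ".join(choices[:-1]) + f", or {choices[-1]}."
```

For a page containing links labeled "Stock quotes", "News", and "Weather", `menu_prompt` returns "Please say Stock quotes, News, or Weather." The point is that the menu comes straight from markup the site already serves.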

VoiceXML: A Scripting Language

VoiceXML is a language based on the XML standard that provides the basic elements for constructing a voice-driven IVR application. VoiceXML supports the creation of menu-based, machine-directed dialogs that guide users through an application with a series of prompts. It also provides basic transactional elements, such as collecting form fields and submitting them to a server, which are key to supporting telephone-based commerce.
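
For comparison, here is a minimal sketch of a directed-dialog menu in VoiceXML, generated from Python with `xml.etree.ElementTree`. It assumes the element names and namespace of the W3C VoiceXML 2.0 specification (`<vxml>`, `<menu>`, `<prompt>`, `<enumerate>`, `<choice next=...>`); the function name `build_vxml_menu` and the target URIs are illustrative only.

```python
import xml.etree.ElementTree as ET


def build_vxml_menu(prompt_text, choices):
    """Build a VoiceXML document holding one machine-directed <menu>.

    choices: list of (spoken phrase, target URI) pairs; the URIs are
    hypothetical placeholders for the next dialog to load.
    """
    # Assumes VoiceXML 2.0 version attribute and namespace.
    vxml = ET.Element("vxml", {"version": "2.0",
                               "xmlns": "http://www.w3.org/2001/vxml"})
    menu = ET.SubElement(vxml, "menu")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text + " "
    # <enumerate/> asks the platform to read the choice phrases aloud.
    ET.SubElement(prompt, "enumerate")
    for phrase, target in choices:
        choice = ET.SubElement(menu, "choice", {"next": target})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")
```

A call such as `build_vxml_menu("Main menu.", [("stock quotes", "quotes.vxml"), ("news", "news.vxml")])` yields a document whose menu would read the prompt, list both phrases, and move to the chosen URI. Every such menu has to be authored (or generated by server code) specifically for VoiceXML.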

VoiceXML does have limitations, though. First, while VoiceXML is well suited to building simple applications, it does not scale well to more complex dialogs or transactional functions. Second, while well-designed static content management applications are easily modified to support VoiceXML (or any other markup language, for that matter), dynamic content management applications are built largely with programming code that generates HTML. As a result, Web applications and Web content must be rewritten to support dynamic VoiceXML applications. For some Web applications, the dynamic content management functionality may be the most expensive component to build and maintain. Finally, many Web applications use client-side JavaScript to support more sophisticated transactional capabilities such as input validation. The document object model (DOM) of HTML differs from that of any other markup language. Consequently, client-side JavaScript developed for HTML applications cannot be reused directly, even if a VoiceXML platform supports JavaScript.

HTML For The Voice Web

The other, more flexible and cost-effective approach is to build voice-driven applications from the HTML on which existing Web applications are already based. Unlike a desktop browser, an HTML-ready voice browser uses the DOM representation to generate a dialog interaction instead of a visual layout. Because HTML is designed for visual presentation, however, a good voice experience cannot be generated from HTML alone. To customize and tune the voice experience, the existing HTML is supplemented with either specialized tags or a separate voice style language.

If the voice browser uses a style language, content can be separated from presentation, which has several advantages. Traditional IVR applications rely on machine-directed dialogs to walk a user through a hierarchical menu. Like VoiceXML voice browsers, HTML voice browsers can generate such directed dialogs. They can also overlay mixed-initiative dialogs for navigation without modifying the underlying HTML content. This lets the user speak natural phrases such as "get me an IBM stock quote" to bypass the step-by-step menu interaction. Such interactions can also be built in VoiceXML, but the dialogs are not generated automatically from a DOM as they are in an HTML voice browser. With VoiceXML, a content developer must program this capability by hand, which makes the application more expensive to build and maintain.
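
To make the mixed-initiative idea concrete, the sketch below matches a free-form utterance against a keyword overlay attached to a page's existing links, and signals a fallback to the directed menu when nothing matches. Simple keyword spotting is chosen here only for illustration and is not how any particular product works; `NAV_KEYWORDS`, `SYMBOLS`, `route_utterance`, and the link targets are hypothetical, and a real system would work from speech recognizer output constrained by grammars.

```python
import re

# Hypothetical keyword overlay: each existing link target on the page is
# paired with words a caller might use to jump straight to it. The page's
# HTML itself is not modified.
NAV_KEYWORDS = {
    "/quotes": {"quote", "quotes", "stock"},
    "/news": {"news", "headlines"},
    "/weather": {"weather", "forecast"},
}

# Hypothetical company-name vocabulary used to fill a "symbol" slot.
SYMBOLS = {"ibm": "IBM", "intel": "INTC"}


def route_utterance(utterance, nav=NAV_KEYWORDS):
    """Return (target, slots) for the best-matching link, or (None, {})
    so the caller can fall back to the step-by-step menu dialog."""
    words = re.findall(r"[a-z0-9]+", utterance.lower())
    best, best_hits = None, 0
    for target, keywords in nav.items():
        # Count how many spoken words belong to this link's keyword set.
        hits = sum(1 for w in words if w in keywords)
        if hits > best_hits:
            best, best_hits = target, hits
    if best is None:
        return None, {}
    slots = {}
    if best == "/quotes":
        # Pick up the first recognized company name, if any.
        for w in words:
            if w in SYMBOLS:
                slots["symbol"] = SYMBOLS[w]
                break
    return best, slots
```

With this overlay, `route_utterance("get me an IBM stock quote")` returns `("/quotes", {"symbol": "IBM"})`, skipping the menu entirely, while an unrecognized phrase returns `(None, {})` and the directed dialog takes over.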

Another advantage of the style language approach is that it allows the content developer to leverage a greater portion of existing Web assets. Dynamic content Web applications use programming code to generate HTML. An HTML voice browser can make direct use of this HTML presentation layer without rewriting the code that produces it. In practice, content developers may find that some tuning is required to improve the voice experience.

Another important benefit of HTML voice browsers is the ability to support existing client-side JavaScript. Client-side JavaScript in existing Web applications is written for an HTML DOM. Because an HTML voice browser works from the original HTML, that JavaScript can be reused on platforms that support JavaScript. With a VoiceXML voice browser, the original client-side JavaScript could not simply be moved into the VoiceXML document; it would have to be rewritten without references to the HTML DOM. Therefore, content developers looking to leverage a heavy investment in transactional or dynamic content application development should consider HTML voice browser platforms.
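
The reuse argument rests on validation logic that addresses form fields by the names used in the HTML. The Python sketch below models that idea as an analogy (it is not JavaScript and not a real DOM): a single hypothetical validator keyed on field names serves a voice browser that fills those same fields from recognized speech. `Form`, `validate_order`, `voice_fill_and_validate`, and the field names `qty` and `symbol` are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """Minimal stand-in for an HTML form: a name plus named field values."""
    name: str
    fields: dict = field(default_factory=dict)


def validate_order(form):
    """Hypothetical page validator, written once against the form's field
    names, the way page script would reference a form element by name."""
    errors = []
    qty = form.fields.get("qty", "")
    if not qty.isdigit() or int(qty) <= 0:
        errors.append("Quantity must be a positive whole number.")
    if form.fields.get("symbol", "") == "":
        errors.append("Please provide a stock symbol.")
    return errors


def voice_fill_and_validate(form, recognized):
    """Sketch of a voice browser working from the original page: copy
    recognized speech results into the same named fields, then run the
    page's existing validator and return any errors to be spoken."""
    for name, value in recognized.items():
        form.fields[name] = value
    return validate_order(form)
```

Because the voice path fills the same named fields the visual page uses, the validator needs no changes; the errors it returns can simply be read back to the caller.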

Platform Components For Enhancement

A voice Web application written in VoiceXML or HTML can be enhanced with platform components. For even more powerful dialog and transaction interactions, developers can turn to platform technologies originally intended for traditional IVR development. Such platforms combine libraries with full-featured programming languages such as Java and C++ to build complex dialogs and transactional capabilities that could not be built with VoiceXML or HTML alone. For example, the Help capability of a VoiceXML or HTML voice browser is not as customizable as one built in a programming language like Java. The price of this power and flexibility is the high cost of programming full applications. Both VoiceXML and HTML support embeddable components through the <object> element, so by combining these technologies, content developers can invoke platform capabilities to enhance VoiceXML or HTML functionality.

Another way to extend a VoiceXML or HTML voice browser platform is to add scripting support. JavaScript, already widely used in desktop browsers, is one popular choice.
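
The sketch below shows one way a voice browser might hand an `<object>` element off to a platform component, such as a custom Help dialog. The component registry, the `builtin:help` classid, and the `custom_help` component are hypothetical; only the `<object classid=...>` and `<param name=... value=...>` markup follows HTML conventions. It is written in Python for brevity, whereas a production platform component would more likely be written in Java or C++.

```python
from html.parser import HTMLParser

# Hypothetical registry of platform components, keyed by the classid an
# <object> element names. Real platforms define their own identifiers.
COMPONENTS = {}


def component(classid):
    """Decorator that registers a platform component under a classid."""
    def register(fn):
        COMPONENTS[classid] = fn
        return fn
    return register


@component("builtin:help")
def custom_help(params):
    # Hypothetical custom Help component driven by <param> values.
    topic = params.get("topic", "this page")
    return f"Here is help for {topic}."


class ObjectDispatcher(HTMLParser):
    """Finds <object> elements and invokes the matching component."""

    def __init__(self):
        super().__init__()
        self.outputs = []     # text returned by invoked components
        self._classid = None  # classid of the <object> currently open
        self._params = {}     # <param> name/value pairs for that object

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "object":
            self._classid = a.get("classid")
            self._params = {}
        elif tag == "param" and self._classid is not None:
            self._params[a.get("name")] = a.get("value")

    def handle_endtag(self, tag):
        if tag == "object" and self._classid is not None:
            handler = COMPONENTS.get(self._classid)
            if handler is not None:  # unknown classids are ignored
                self.outputs.append(handler(self._params))
            self._classid = None
```

Feeding the dispatcher `<object classid="builtin:help"><param name="topic" value="stock quotes"></object>` leaves `"Here is help for stock quotes."` in `outputs`, which the browser could then speak. The page markup stays declarative while the component carries the custom logic.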

Conclusion

The arrival of the voice Web gives businesses an economical way to build custom voice applications, which have considerable value as a supplementary business channel. When building voice Web applications, it is important to consider requirements carefully and to choose platforms and components that make the most of a business's existing investment in Web applications. By doing so, companies can minimize the implementation and maintenance costs of voice applications and gain a greater return on investment.

Garry Chinn is Chief Technology Officer of VocalPoint Technologies. VocalPoint Technologies provides middleware, infrastructure and services for businesses to rapidly voice-enable HTML and XML content, making it possible to access Internet and intranet applications using natural speech over any phone. Its voice-based browser allows businesses to build customized voice portals and services by integrating VocalPoint's proprietary technology into their network infrastructure, or by utilizing VocalPoint's fully outsourced ASP (application service provider) solution. Incorporated in 1997, the company has leveraged its speech technology research expertise to create attractive voice-based solutions for businesses worldwide.
