By Garry Chinn
March 13, 2001
Companies have invested billions of dollars in developing
Web sites to deliver content, products and services to customers.
However, the Web only reaches a small percentage of the world.
The telephone, on the other hand, is the most ubiquitous technology
available, yet traditional telephony solutions are costly,
inflexible, and can be difficult to design and deploy. With
the growing demand for businesses to provide immediate and
simple access to information and services, it's no wonder
that companies are looking to take advantage of voice applications,
which, with current advancements, are beginning to be deployed
using the Web.
Interactive voice response (IVR) systems, which have been
around for decades, are realistic only for the largest
enterprises, which can commit the time, investment,
and resources to build proprietary voice applications.
Only a handful of companies (such as Charles Schwab and United
Airlines) were progressive enough to pioneer new technologies
such as speech-driven IVR applications. As for off-the-shelf
voice solutions, only a limited number of applications
-- such as auto attendants and voice mail systems --
have gained widespread adoption by businesses and
consumers.
The New Voice Web
Consider the standard methods of communication and information
access today -- the telephone and the Internet. The combination
of these two technologies gives companies a brand new resource
for connecting with customers: the voice Web.
The voice Web will extend the Web we know today by providing
a new channel by which customers can access and retrieve information.
By leveraging the infrastructure of Web-based content and
applications, companies can easily build low-cost, custom
voice applications. Web applications -- written in XML or HTML code
-- can be transformed into telephony voice applications.
As businesses look into adopting voice technology, they must
consider options that will grow easily and cost-effectively
with their business. The voice Web will make it more economical
to deploy voice applications, allowing small- and medium-sized
businesses to build and use state-of-the-art voice solutions.
Large businesses will also benefit; they will be able to economically
deploy more specialized applications targeted to segmented
customer bases.
The arrival of enhanced telephony devices such as smart phones
and wireless personal digital assistants (PDAs) will enable
"multi-modal" applications that handle both voice and data
in the same experience -- making next-generation
voice applications more powerful than ever. While voice won't
take over the mouse and keyboard on desktop systems, for handheld
devices, voice is an essential input modality, making voice
applications a core part of the mobile network. Next-generation
handsets will be able to support a voice and a data channel
simultaneously, allowing true multi-modal browsing: voice
and keypad input combined with audio and visual output. Multi-modal
browsing will simplify navigation and information retrieval
by replacing multiple keystroke commands with spoken phrases,
ultimately increasing the power and effectiveness of telephony
applications.
Voice browsing transforms Web applications into telephony
voice applications. Just as a Web browser renders the user
interface on a PC, a voice browser translates HTML or XML
code into voice. A voice browsing solution exploits the basic
architecture of the Web, allowing content developers to reuse
as many existing components as possible, with little
or no modification, to produce a low-cost voice solution.
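To make the rendering idea concrete, here is a minimal sketch of how a voice browser might turn an HTML page into spoken prompts instead of a visual layout. It is an illustration only: the class name, the prompt wording and the sample page are assumptions, not part of any particular product, and a real voice browser would hand these strings to a text-to-speech engine.

```python
from html.parser import HTMLParser


class VoicePromptBuilder(HTMLParser):
    """Collects the page title and link text so they can be read as a menu."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []        # (spoken label, target URL) pairs
        self._in_title = False
        self._href = None      # href of the <a> element currently open
        self._label = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._label = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            # Collapse whitespace so the label reads cleanly when spoken.
            label = " ".join("".join(self._label).split())
            self.links.append((label, self._href))
            self._href = None


def render_prompts(html):
    """Return the list of sentences a voice browser could speak for a page."""
    builder = VoicePromptBuilder()
    builder.feed(html)
    builder.close()
    prompts = [f"Welcome to {builder.title.strip()}."]
    for number, (label, _url) in enumerate(builder.links, start=1):
        prompts.append(f"For {label}, say {label} or press {number}.")
    return prompts


# Hypothetical page used only for illustration.
PAGE = (
    "<html><head><title>Acme Bank</title></head><body>"
    "<a href='/balance'>account balance</a> "
    "<a href='/rates'>loan rates</a>"
    "</body></html>"
)

for line in render_prompts(PAGE):
    print(line)
```

The same HTML that drives the visual page drives the spoken menu here, which is the re-use the paragraph above describes.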
VoiceXML: A Markup Language
VoiceXML is a markup language based on the XML standard
that contains the basic elements for constructing a voice-driven
IVR application. VoiceXML supports the creation of menu- or
machine-directed dialogs that guide users through an application
by a series of menu prompts. It also provides basic transactional
elements -- a key to supporting telephony-based commerce.
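As an illustration of a menu-directed dialog, the sketch below generates a small VoiceXML document from server-side code. The vxml, menu, prompt and choice elements come from the VoiceXML specification; the helper name, the version value, the prompt text and the target file names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_menu(prompt, choices):
    """Build a VoiceXML menu document.

    prompt  -- text spoken before the caller chooses
    choices -- (spoken label, target document) pairs
    """
    vxml = ET.Element("vxml", version="1.0")  # version value is an assumption
    menu = ET.SubElement(vxml, "menu")
    ET.SubElement(menu, "prompt").text = prompt
    for label, target in choices:
        # Each choice names the phrase to listen for and where to go next.
        choice = ET.SubElement(menu, "choice", next=target)
        choice.text = label
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical menu entries, for illustration only.
document = build_menu(
    "Say one of: sports, weather.",
    [("sports", "sports.vxml"), ("weather", "weather.vxml")],
)
print(document)
```

Each choice element pairs a phrase with the document to load next, so the caller is walked through the application one menu at a time.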
VoiceXML does have limitations, though. First, while VoiceXML is
good for building simple applications, it does not scale well to
more complex dialogs or transactional functions.
Second, while well-designed static content management applications
are easily modified to support VoiceXML (or any other markup
language for that matter), dynamic content management applications
are built largely using programming code that is targeted
for HTML. Thus, Web applications and Web content will have
to be re-written to support dynamic VoiceXML applications.
For some Web applications, the dynamic content management
functionality may be the most expensive component to build
and maintain. Finally, many Web applications use client-side
JavaScript to support more sophisticated transactional capabilities
such as validation tasks. The document object model (DOM)
of HTML is different from that of any other markup language.
Consequently, client-side JavaScript developed for HTML applications
cannot be directly re-used even if a VoiceXML platform supports
JavaScript.
HTML For The Voice Web
The other approach to creating voice-driven applications, which is
more flexible and cost-effective, is to use the HTML on which these
applications are already based. Unlike a desktop browser,
an HTML-ready voice browser uses the DOM representation to
generate a dialog interaction instead of a visual layout.
HTML is made for visual presentation, and a good voice experience
cannot be generated from HTML alone. To customize and tune
the voice experience, either specialized tags or a separate
voice style language is used to supplement the existing HTML.
If the voice browser uses a style language, the content can
be separated from the presentation. This has a number of advantages.
Traditional IVR applications rely on machine-directed dialogs
to effectively walk a user through a hierarchical menu. Like
VoiceXML voice browsers, HTML voice browsers can generate
directed dialogs. It is also possible to overlay mixed-initiative
dialogs for navigation without modifying the underlying HTML
content. This navigation allows the user to speak more natural
phrases like "get me an IBM stock quote" to bypass the
step-by-step menu dialog interaction. These interactions
can also be built in VoiceXML, but the dialogs would not be
automatically generated from a DOM as in the case of an HTML
voice browser. With VoiceXML, a content developer has to manually
program such capability into the system, which makes it
more expensive to build and maintain.
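The overlay idea can be sketched as a small table of spoken shortcuts that sits beside the directed menu and leaves the HTML untouched. Everything named here is hypothetical: the URL paths, the menu entries and the use of a regular expression in place of a real speech recognition grammar are assumptions chosen to keep the example short.

```python
import re

# Hypothetical shortcut overlay: phrase patterns mapped to URL builders.
SHORTCUTS = [
    (
        re.compile(r"\b(?P<symbol>[a-z]+) stock quote\b", re.IGNORECASE),
        lambda m: f"/quotes?symbol={m.group('symbol').upper()}",
    ),
]

# Hypothetical directed menu, as it might be derived from the page's links.
MENU = {"stock quotes": "/quotes", "news": "/news"}


def resolve(utterance):
    """Map what the caller said to a target URL.

    Shortcut phrases are tried first so the caller can jump past the
    step-by-step menu; otherwise the utterance is treated as a menu choice.
    Returns None when nothing matches, so the caller can be re-prompted.
    """
    text = utterance.strip()
    for pattern, build_url in SHORTCUTS:
        match = pattern.search(text)
        if match:
            return build_url(match)
    return MENU.get(text.lower())


print(resolve("get me an IBM stock quote"))
print(resolve("news"))
```

Because the shortcuts live outside the page, they can be tuned or extended without changing the underlying content.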
Another advantage of the style language approach is that
it allows the content developer to leverage a greater portion
of existing Web assets. Dynamic content Web applications use
programming code to generate HTML. An HTML voice browser can
make direct use of this HTML presentation layer without re-writing
programming elements. In practice, content developers may
find that some tuning is required to improve the voice experience.
Another important benefit of HTML voice browsers is the ability
to support existing client-side JavaScript. Client-side JavaScript
in existing Web applications is written for an HTML DOM. Since
the original HTML is used by an HTML voice browser, the original
JavaScript is re-usable by platforms with JavaScript support.
For a VoiceXML voice browser, the original client JavaScript
could not simply be moved to the VoiceXML document. It would
have to be rewritten without referencing the HTML DOM. Therefore,
content developers looking to leverage a heavy investment
in transactional or dynamic content application development
should consider HTML voice browser platforms.
Platform Components For Enhancement
A voice Web application written in VoiceXML or HTML can be
enhanced by using platform components. Platform technologies
intended for traditional IVR development can be used to build
even more powerful dialog and transaction interactions.
Such platforms combine libraries and full-featured programming
languages like Java and C++ to build complex dialogs and transactional
capabilities that could not be built using either VoiceXML
or HTML alone. For example, the Help capability of a VoiceXML
or HTML voice browser is not as customizable as one built
using a programming language like Java. The price of this
power and flexibility is the high cost of programming full
applications. Both VoiceXML and HTML support embeddable components
through the object element. By combining these technologies,
content developers can invoke platform capabilities to enhance
VoiceXML or HTML functionality.
Another way to extend the functionality of a VoiceXML or HTML
voice browser platform is by adding scripts. JavaScript, already
popular in desktop browsers, is one choice.
Conclusion
The arrival of the voice Web gives businesses an economical
option to build custom voice applications, which have considerable
value as a supplementary business channel. When building voice
Web applications, it is important to carefully consider requirements
and choose platforms and components that will maximize a business'
existing investment in Web applications. By doing so, companies
will minimize the implementation and maintenance costs of
voice applications and gain a greater return on investment.
Garry Chinn is Chief Technology Officer of VocalPoint
Technologies. VocalPoint Technologies provides middleware,
infrastructure and services for businesses to rapidly voice-enable
HTML and XML content, making it possible to access Internet
and intranet applications using natural speech over any phone.
Its voice-based browser allows businesses to build customized
voice portals and services by integrating VocalPoint's proprietary
technology into their network infrastructure, or by utilizing
VocalPoint's fully outsourced ASP (application service provider)
solution. Incorporated in 1997, the company has leveraged
its speech technology research expertise to create attractive
voice-based solutions for businesses worldwide.