Sound bytes
New techniques for indexing audio files that use the sound of words, or phonemes, rather than entire words, could provide a major breakthrough that will make the information in audio files easier to manipulate and exploit.
For example, just as the contents of Web pages and text documents are now easily indexed by search engines, the contents of audio files (recordings of courtroom depositions, police interrogations, staff meetings, seminars, television shows) may also one day become as easily searchable, researchers said.
Unfortunately, the predominant audio-indexing tools to date have been cumbersome and error-prone, industry observers said. Using databases that translate sounds into words proves to be a lengthy process. Compiling dictionaries beforehand tends to be extremely time-consuming.
But now some companies are making headway by eliminating these dictionaries altogether, namely by having their technologies parse spoken language into phonetic bits rather than words. This new approach may be the key to conquering select markets, such as those for searchable multimedia repositories and large-scale automated phone directory services, even if the larger market for general-use audio searching tools remains elusive.
Fast-Talk Communications Inc., Atlanta, has developed a phonetic audio engine that it claims can correctly identify words 98 percent of the time, beating the 80 percent success rate of traditional whole-word approaches, according to Patrick Taylor, senior vice president of sales and marketing. With Fast-Talk's approach, a user query is broken into phonemes, and then matches and near-matches from machine-transcribed recordings are returned.
Science Applications International Corp., San Diego, was impressed enough with the Fast-Talk solution to incorporate it into its own media asset solution. SAIC also subsequently invested in the company. SAIC's system indexes video and audio to search for specific words, phrases or even images.
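The general idea behind the phonetic search described for Fast-Talk, in which a query is broken into phonemes and matches and near-matches are returned from a machine-produced transcript, can be illustrated with a minimal Python sketch. It is not Fast-Talk's method, which the article does not detail. The ARPAbet-style phoneme symbols, the tiny `PRONUNCIATIONS` table, the made-up phoneme track and the use of edit distance for scoring are all assumptions chosen for illustration.

```python
# Illustrative sketch only; this is not Fast-Talk's engine. The phoneme
# symbols, the pronunciation table and the edit-distance scoring are all
# assumptions chosen to keep the example small.

# Hypothetical text-to-phoneme table for the query words used below.
PRONUNCIATIONS = {
    "camp": ["K", "AE", "M", "P"],
    "pendleton": ["P", "EH", "N", "D", "AH", "L", "T", "AH", "N"],
}


def to_phonemes(query):
    """Break a text query into one flat list of phoneme symbols."""
    phonemes = []
    for word in query.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def search(query, track, max_errors=1):
    """Slide a query-length window over the phoneme track and return
    (offset, errors) for every window within `max_errors` edits."""
    target = to_phonemes(query)
    hits = []
    for start in range(len(track) - len(target) + 1):
        window = track[start:start + len(target)]
        errors = edit_distance(target, window)
        if errors <= max_errors:
            hits.append((start, errors))
    return hits


# Made-up phoneme track for a recording. The recognizer heard "IH"
# where the query's pronunciation expects "EH".
track = ["DH", "AH", "K", "AE", "M", "P", "P", "IH", "N", "D",
         "AH", "L", "T", "AH", "N", "B", "EY", "S"]
hits = search("Camp Pendleton", track)
```

In this made-up track, one phoneme was misheard, so the phrase comes back as a near-match at offset 2 with one error, and an exact-match search (`max_errors=0`) finds nothing. Comparing only query-length windows is a simplification; a fuller treatment would also allow for inserted or dropped phonemes.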
"What we saw in Fast-Talk was a way to pre-process information and have it searched in real-time," said Kevin Vest, SAIC assistant vice president and the New Media Technology Division's director of engineering. "We could run queries as the information is coming in."Another advantage of the Fast-Talk technology is it doesn't require user dictionaries, a distinct advantage when dealing with acronym-fluent government clients."If a new company name or a new technical term suddenly appears in a [recorded] conversation and it's processed with speech recognition, you won't be able to hit it. With Fast-Talk, you will," Vest said. SAIC has installed one system for a commercial client since the system's introduction last October, Vest said. The company foresees government distance learning programs as a good fit for these solutions. "That wasn't an application we expected," said Vest of the technology's use in e-learning. However, the company found it allows clients of its video productions to quickly review and approve work over the Web. Fast-Talk's phonetic technology is also part of digital media asset management solutions from providers such as Convera Corp., Vienna, Va., and Virage Inc., San Mateo, Calif. Jeff Karnes, group product manager for Virage, said the company primarily uses standard dictionary-based speech synthesis software from IBM Corp. and BBN Technologies, a Verizon Communications Inc.-owned research company. However, in select cases, the company will deploy Fast-Talk's engine for environments with heavy accents or a complex terminology. A phonetic-based search engine developed by Phonetic Systems Inc., Bedford, Mass., helped that company achieve a performance advantage in its own market, the one for voice-automated phone directories. In 2001, the company released a version of its system scaled past the products of the company's competitors ? who employ whole-word-based approaches. 
It offered solutions that could support over 10.5 million unique directory listings.
In the government space, Phonetic Systems is focusing on military bases and aircraft carriers, according to Fred Herrmann, Phonetic Systems' director of the federal systems office. If an average-sized base has three full-time operators, and Phonetic Systems' solution can eliminate two of those operators, then the service can see a return on investment within a year, Herrmann said. The company is now installing a system at the Marine base Camp Pendleton, Calif.
In addition to saving money, the technology, because it is server-based, can enable a number of new features for users. For instance, soldiers from a unit that is deployed abroad can call back to base and, just by saying "phone home" into the receiver, can be connected to their loved ones. The database of individuals can also hold additional information, such as mail routing information, that can be easily searched.
"People normally install the system for the phonetic operator, but once you have the database in place, it is just a matter of writing the applications," Herrmann said.
Despite these successes, experts caution that phonetic searching is an immature technology.
Lou Latham, an analyst at Gartner Inc., Stamford, Conn., said that while phonetic searching works well in closely defined environments, more research is needed to refine these technologies into general-purpose tools. In particular, the algorithms that undertake grammatical processing are "still very primitive," he said.
"You either need a controlled user population or a controlled vocabulary to achieve success," Latham said.
Patrick Taylor of Fast-Talk Communications Inc. said the company's phonetic audio engine can correctly identify words 98 percent of the time.