The background of iPhone 4S SIRI and what to expect from it

SIRI is Apple’s speech recognition technology, and it has an interesting history. The technology used in the product was developed by Nuance, the company behind Dragon Dictate. Nuance acquired its most valuable knowledge of linguistic speech recognition in the early 2000s, when it bought the technology of L&H, the Belgian firm that went bankrupt amidst stories of fraud.

iOS 5 has many new features, of which SIRI looks like the most innovative. SIRI is speech technology: it can recognize what you say and will talk back to you. That is not new. In 1998, a demonstration at an L&H press conference showed the company’s CEO commanding an e-mail client to read his received messages out loud. The speech engine talked back to the CEO with almost exactly the same type of sentences as those I’ve heard from SIRI.

Thirteen years later, SIRI sounds like it makes the same mistakes that L&H’s demonstration made in front of the press at the time. The speech engine will get your intentions right about 99% of the time, but fails to understand what you want approximately 1% of the time. That may seem like an incredibly good ratio, but speech recognition is like Optical Character Recognition (OCR) in this respect: 1% translates into a lot of errors, which makes the technology frustrating for the user.
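To make that arithmetic concrete, here is a rough sketch in Python. The 1% error rate is the approximate figure mentioned above; the page length of 2,000 characters and the assumption that errors are independent of each other are mine, chosen only for illustration. The function names are hypothetical as well.

```python
def expected_errors(items: int, error_rate: float = 0.01) -> float:
    """Average number of misrecognized items (characters, words, commands)."""
    return items * error_rate


def chance_of_at_least_one_failure(attempts: int, error_rate: float = 0.01) -> float:
    """Probability that at least one of `attempts` goes wrong.

    Assumes every attempt fails independently with the same probability,
    which is a simplification.
    """
    return 1.0 - (1.0 - error_rate) ** attempts


if __name__ == "__main__":
    # Assumed page length for the OCR case; not a measured value.
    page_chars = 2000
    print(f"Expected OCR errors on one page: {expected_errors(page_chars):.0f}")

    # Spoken commands: how likely is at least one misunderstanding?
    for commands in (10, 50, 100):
        p = chance_of_at_least_one_failure(commands)
        print(f"{commands} commands: {p:.0%} chance of at least one failure")
```

Under these assumptions, a single page already holds a couple of dozen mistakes to hunt down, and anyone who issues a few dozen spoken commands a day should count on being misunderstood regularly.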

With OCR, the frustration lies in having to double-check the program’s results, concentrating on mistakes that are not obvious, unlike typos, which you instantly recognize as errors. With speech recognition, the aggravation comes from having to repeat your command a number of times before the thing finally understands what you want. If you want driving directions read out to you, a misunderstood command might either send you in a totally wrong direction or take so long to get across that it would be easier to just look at a road map.

The most alarming thing about speech technology is that it no longer seems to evolve in dramatic ways. In L&H’s times, I often heard company directors claim they would force a breakthrough in a couple of years, usually followed by a statement on processor speed. The breakthrough never came, and judging from Nuance’s products, it still hasn’t come. Even Dragon Dictate, which does a wonderful job at recognizing your spoken commands, has trouble understanding dictation without some intense training up front.

Nevertheless, Nuance’s technology is the best you can get because it is not based on statistical analysis, but on a real linguistic engine. The linguistic engine used in SIRI goes back to L&H times, when linguists from all over the world were doing their thing at L&H’s headquarters and many satellite companies. But linguistics has little to do with true understanding of a spoken phrase.

As I used to visit L&H’s labs frequently, I know quite well what you may expect from SIRI. In my opinion, it is best to compare SIRI to an eight-year-old. The keywords are clear speech and, above all, patience. If you keep those two in mind, you’ll save yourself much aggravation.
