Say It Ain’t So: Speech Technology Still Has a Long Way to Go

Speech recognition technology, once thought to be confined to the realm of science fiction, has become a reality thanks to advances in the many devices people now commonly use.

Perhaps you have a smartphone or tablet that uses some form of speech technology. Apple made waves when they first introduced Siri several years ago. Microsoft has created Cortana to be a virtual assistant for their products. Google has their own form of the technology with Now. All of these assistants let people communicate with their devices simply by speaking to them.

Considering the advancements that have been made in a few short years, you may reasonably assume we’ll soon see voice recognition nearly everywhere. That may not be the case, however. Despite the progress that’s been made, speech recognition technology still has a way to go before it overcomes some significant challenges and obstacles to reach true mainstream prevalence.

It’s clear mainstream acceptance isn’t inevitable in part because the technology hasn’t reached the desired proficiency yet. Think about the times you’ve spoken to Siri or Cortana, only for those virtual assistants to either misinterpret what you said or need you to repeat it. This happens far too often for speech recognition technology to truly take off.

Recent improvements have shown clear and impressive progress. Back in 2013, Google Now had a 23 percent error rate when it came to recognizing words. Just two years later, Google announced that they had improved the technology, leading to an error rate of only 8 percent. Yes, that’s a significant upgrade over the previous rate, but 8 percent is still too high. People use new technology because it is convenient and makes life easier. If the technology gets roughly one out of every twelve words wrong, that’s not a high enough success rate.
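To put those figures in perspective, here is a minimal sketch of the arithmetic behind them. It assumes the quoted rates are per-word error rates and that mistakes are independent from one word to the next, which is a simplification. The function names are illustrative and do not come from any vendor's tooling.

```python
def words_per_error(error_rate: float) -> float:
    """Average number of words spoken for each misrecognized word."""
    return 1.0 / error_rate


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction of the old error rate that was eliminated."""
    return (old_rate - new_rate) / old_rate


def clean_utterance_probability(error_rate: float, num_words: int) -> float:
    """Chance that every word in an utterance is recognized correctly,
    assuming (simplistically) that errors are independent per word."""
    return (1.0 - error_rate) ** num_words


if __name__ == "__main__":
    for rate in (0.23, 0.08):
        print(
            f"{rate:.0%} error rate: one error every "
            f"{words_per_error(rate):.1f} words, "
            f"{clean_utterance_probability(rate, 10):.0%} chance a "
            f"10-word request comes through clean"
        )
    print(f"Relative improvement: {relative_reduction(0.23, 0.08):.0%}")
```

Under these assumptions, an 8 percent rate works out to about one wrong word in every twelve or thirteen, and a ten-word request still comes through without a single error less than half the time.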

Other problems have plagued speech recognition technology for some time. Environmental noises still pose a major problem. This can be most easily seen when speech recognition is used in some of the latest car models. Driving down the freeway while trying to use voice commands in your vehicle just doesn’t work sometimes. The technology also runs into problems the farther away you are from the microphone. But one of the most significant issues is one that many users don’t think about.

This issue has to do with response time. Whenever you speak to a virtual assistant, that information has to be transmitted to a remote server, processed, and sent back before an answer or response is given. Many of these systems are cloud-based, and the resulting latency can be a problem when users need an answer quickly. When navigating a menu, for example, users who manually click buttons and touch the screen know the response is immediate. With speech recognition, there might be a delay, sometimes of several seconds. Even when it is slight, that delay may lead people to skip speech recognition entirely.
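The round trip described above can be thought of as a simple latency budget. The sketch below is only an illustration: the stage timings are made-up placeholder numbers, not measurements of any real assistant, and the 100 millisecond threshold for "feels immediate" is an assumed rule of thumb.

```python
from dataclasses import dataclass


@dataclass
class RoundTrip:
    """Time spent in each stage of a cloud speech request, in milliseconds."""
    upload_ms: float      # sending the audio to the server
    processing_ms: float  # recognizing the speech and building a response
    download_ms: float    # sending the response back to the device

    def total_ms(self) -> float:
        return self.upload_ms + self.processing_ms + self.download_ms


def feels_immediate(total_ms: float, threshold_ms: float = 100.0) -> bool:
    """True if the delay falls under the assumed perception threshold."""
    return total_ms <= threshold_ms


if __name__ == "__main__":
    # Hypothetical timings, chosen only to illustrate the comparison.
    voice = RoundTrip(upload_ms=150, processing_ms=400, download_ms=150)
    touch_ms = 20  # a local button press involves no network hop
    print("voice:", voice.total_ms(), "ms, immediate:", feels_immediate(voice.total_ms()))
    print("touch:", touch_ms, "ms, immediate:", feels_immediate(touch_ms))
```

The point is not the specific numbers but the shape of the problem: every stage adds to the total, and the network hops are stages a button press never has to pay for.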

These problems may be formidable, but they’re not insurmountable. As seen in the improvements in the error rate, technology is constantly progressing. Advances in machine learning, deep learning, and hardware have made speech recognition a more viable option for devices.

Many experts are going further in trying to improve the technology. Some researchers have found success in getting computers to start reading lips, which could help with the noise and distance problems highlighted above. Private cloud solutions have also advanced, cutting the latency of response times. Other developments split the computing work to limit response time, with some processes taking place locally on the device while others are handled in the cloud.
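One way to picture that split between local and cloud processing is a simple router that keeps a handful of short commands on the device and sends everything else to the server. This is a hypothetical sketch, not a description of how any shipping assistant works; the command list and the `route` function are assumptions made for illustration, and it presumes a small on-device recognizer has already produced a text transcript.

```python
# Hypothetical set of short commands a device could handle without the cloud.
LOCAL_COMMANDS = {
    "back",
    "home",
    "next",
    "previous",
    "volume up",
    "volume down",
}


def route(transcript: str) -> str:
    """Return "local" for known short commands and "cloud" for everything else."""
    # Normalize case and collapse extra whitespace before matching.
    normalized = " ".join(transcript.lower().split())
    return "local" if normalized in LOCAL_COMMANDS else "cloud"


if __name__ == "__main__":
    for phrase in ("Volume up", "next", "What is the weather tomorrow?"):
        print(f"{phrase!r} -> {route(phrase)}")
```

Commands handled locally skip the network round trip entirely, while open-ended questions still get the heavier processing available in the cloud.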

Even with these developments, speech recognition technology isn’t quite ready to become ubiquitous. Users are still waiting for it to truly be a more convenient way to interact with their devices. There’s also a certain social stigma attached to talking to our devices in public that has yet to be overcome.

Despite the numerous obstacles, speech technology will likely become mainstream; it just might take more time than first expected.