Google Touts Voice Recognition Improvements for Search and Dictation

26 Sep 2015

Google improves voice search (again) for its mobile apps.

Google is claiming better voice search on its Android and iOS mobile apps, thanks to a new approach to the artificial intelligence technique the company uses to power that capability. A blog post published on Thursday, authored by a handful of Google researchers, explains in technical detail how they pulled off the improvements, which include faster, more accurate transcriptions and better voice recognition in noisy places.

Google may now have some of the best voice recognition algorithms around: it can recognize even mumbled input, and it does so with almost no processing delay. Here's a straightforward explanation of how it works. In a traditional speech recognizer, the waveform spoken by a user is split into small consecutive slices, or "frames," of 10 milliseconds of audio. The old model analyzed each 10-millisecond snippet and predicted words from the sounds it recognized, largely without regard for the sounds that came before or after. In a new post on the Google Research Blog, members of the Google Speech Team set out the latest development in the company's voice search algorithms: recurrent neural networks (RNNs), an increasingly popular approach to deep learning, a type of artificial intelligence in which Google is widely thought to have a deep bench.
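The frame-splitting step in the traditional pipeline described above can be sketched in a few lines. This is only a minimal illustration: the 16 kHz sample rate and the non-overlapping frames are assumptions made for the example, and real recognizers typically use overlapping, windowed frames.

```python
import numpy as np

def split_into_frames(waveform, sample_rate=16000, frame_ms=10):
    """Split an audio waveform into consecutive, non-overlapping frames.

    A traditional recognizer scores each short frame of audio more or
    less independently; frame_ms=10 matches the 10 ms slices described
    in the article.
    """
    samples_per_frame = int(sample_rate * frame_ms / 1000)  # 160 samples at 16 kHz
    n_frames = len(waveform) // samples_per_frame
    # Drop any trailing partial frame, then reshape into (frames, samples).
    return waveform[:n_frames * samples_per_frame].reshape(n_frames, samples_per_frame)

# One second of (synthetic, silent) audio at 16 kHz yields 100 frames of 10 ms each.
audio = np.zeros(16000)
frames = split_into_frames(audio)
print(frames.shape)  # (100, 160)
```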

Google had already been employing deep neural networks (the same technology behind those freaky distorted pictures) to compute the most likely thing you're trying to say to your phone, but it has now evolved its approach and started using recurrent neural networks. The new models allow Google to account for temporal dependencies, which is to say the system is now better at analyzing each snippet of audio by referring to the sounds on either side of it. For example, the word "museum" is rendered as /m j u z i @ m/ in phonetic notation, and the sounds made by "j" and "u" would normally be difficult to separate. RNNs have feedback loops in their topology, allowing them to model these temporal dependencies: when the user speaks the /u/ in that example, their articulatory apparatus is coming from a /j/ sound and, before that, an /m/ sound. Baidu's Andrew Ng, who is known for his work on the so-called Google Brain, predicted last year that within five years "50 percent of queries will be on speech or images." "In addition to requiring much lower computational resources, the new models are more accurate, robust to noise, and faster to respond to voice search queries — so give it a try, and happy (voice) searching!" wrote Sak, Senior, Rao, Beaufays, and Schalkwyk.
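The feedback loop that lets an RNN carry context from one frame to the next can be illustrated with a toy recurrent cell. The dimensions and random weights below are made up for the sketch and bear no relation to Google's production models; the point is only that the hidden state passed between steps is what lets the model "remember" that the current sound follows, say, a /j/.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    """One step of a vanilla recurrent cell.

    The new hidden state mixes the current frame's features (x_t) with
    the previous hidden state (h_prev): this recurrence is the feedback
    loop that carries left context from earlier frames into the current one.
    """
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

rng = np.random.default_rng(0)
feat_dim, hidden_dim, n_frames = 13, 8, 5  # illustrative sizes only
W_xh = rng.normal(size=(feat_dim, hidden_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(n_frames, feat_dim)):  # fake per-frame features
    h = rnn_step(x_t, h, W_xh, W_hh, b)  # hidden state accumulates context
print(h.shape)  # (8,)
```

After the loop, `h` summarizes the whole sequence seen so far, which is why a recurrent model's prediction for one frame can depend on the frames that preceded it.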

It’s all very complicated stuff from a computer science perspective, but it is increasingly important to our everyday lives as we expect everything from our phones to our cars to be more intelligent. Google claims the new approach makes voice search far more accurate, particularly in noisy environments, as well as helping to make it “blazingly fast.” You don’t need to do anything to take advantage of the improvement: the new neural network approach is already being used by the Google search app for iOS and Android.

If you want to learn more about how deep learning, the umbrella term for this collection of techniques, works, read Fortune‘s recent interview with Andrew Ng, the chief scientist at Chinese search engine giant Baidu and a renowned expert in the space.
