Google Tweaks Voice Search On Android To Make It Faster And More Accurate

25 Sep 2015

Google Improved Its A.I. Model For Voice Recognition.

Of all the big companies that build voice controls and interaction into their software, Google is perhaps the most low-key about it. Google is claiming better voice search on its Android and iOS mobile apps, thanks to a new approach to the artificial intelligence technique that powers the capability. The company explains that it has built "better neural network acoustic models using Connectionist Temporal Classification and sequence discriminative training techniques" to upgrade its voice search functionality. Voice recognition has become integral to current mobile devices: we use it to activate functions, to turn speech into text so it can be sent as a message, and to drive the digital assistants that answer our questions. A blog post published on Thursday, authored by a handful of Google researchers, explains in technical detail how they pulled off the improvements, which include faster, more accurate transcription and better voice recognition in noisy places.

In addition to improving voice search, Google says the new models also make dictation on Android devices better, while requiring far less computing power. Google reported that its widely used voice search capability is now handled by a new engine that recognizes and anticipates words with a much higher degree of accuracy. Unlike some other voice search features, such as Siri, Google tries to parse every word you say in real time, meaning that as you speak you should see your query being written out. In 2012, Google replaced the Gaussian Mixture Model (GMM), an approach that had been standard for over 30 years, with Deep Neural Networks (DNNs), which modeled the sounds users produce at each instant more effectively and improved recognition accuracy. And Google might have the best voice recognition algorithms of them all: it can recognize even mumbled input, and it does so with almost no processing delay.

In the old model, the system analyzed 10-millisecond snippets of audio and predicted words from the sounds it recognized in each snippet, regardless of the order in which they were uttered. Adding recurrent neural network functionality allows the system to identify complete words more accurately, rather than just individual snippets of sound. That said, there is always room for improvement, and the good news for users of Google's voice search is that the company has announced the feature is now both faster and more accurate.
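To make the contrast concrete, here is a minimal, hypothetical sketch (not Google's actual pipeline) of the frame-by-frame approach described above: each 10 ms frame is scored against candidate phonemes and classified entirely on its own, with no memory of the frames around it.

```python
def classify_frame(scores):
    """Pick the best-scoring phoneme for one 10 ms frame, in isolation."""
    # scores: dict mapping phoneme -> acoustic score for this frame only.
    return max(scores, key=scores.get)

# Two consecutive frames; the scores are made-up illustrative numbers.
frames = [
    {"m": 0.7, "n": 0.2, "b": 0.1},
    {"m": 0.4, "j": 0.5, "u": 0.1},
]

# Each frame is decided independently: the second decision cannot use the
# fact that an /m/ was just heard in the first frame.
print([classify_frame(f) for f in frames])  # -> ['m', 'j']
```

This independence between frames is exactly the limitation the recurrent models below were introduced to address.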

The new models are extensions of a kind of artificial intelligence called recurrent neural networks (RNNs), and they provide more accurate results, particularly when there is noise in the background, while also recognizing speech faster. In a new post on the Google Research Blog, members of the Google Speech Team set out the latest developments in the company's voice search algorithms. For example, the word "museum" is written /m j u z i @ m/ in phonetic notation, and the sounds made by the "j" and the "u" are normally difficult to separate.
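The key idea behind the recurrent approach can be sketched in a few lines. The following toy single-unit recurrent step (an illustration only, with made-up weights, not Google's model) shows how a hidden state carries information from earlier sounds into the current prediction:

```python
import math

def rnn_step(x, h_prev, w_in=0.5, w_rec=0.9):
    """One recurrent step: blend the current input with the previous state."""
    # The previous hidden state h_prev feeds back into the computation,
    # so information about earlier frames persists over time.
    return math.tanh(w_in * x + w_rec * h_prev)

# The same input frame produces different states depending on history,
# which is what lets the network treat a /u/ that follows a /j/
# differently from a /u/ heard in isolation.
h_after_j = rnn_step(x=1.0, h_prev=0.8)  # context: just heard a /j/
h_cold    = rnn_step(x=1.0, h_prev=0.0)  # no preceding context
print(h_after_j != h_cold)  # -> True
```

The feedback term `w_rec * h_prev` is the "loop in the topology" that the next paragraph describes.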

RNNs have feedback loops in their topology, allowing them to model temporal dependencies: when the user speaks the /u/ in the example above, their articulatory apparatus is coming from a /j/ sound, and from an /m/ sound before that. CTC models can recognize phonemes without making a prediction at every instant: they take in larger chunks of audio, so fewer computations are required and recognition is faster. The new voice modelling lets Google account for temporal dependencies, which is to say it now analyzes each snippet of audio with reference to the sounds on either side of it. The type of RNN used here is a Long Short-Term Memory (LSTM) network which, through memory cells and a sophisticated gating mechanism, retains information better than other RNNs.

One problem did emerge: the model recognized phonemes with a delay of about 300 milliseconds, so the researchers had to train it to predict phonemes closer to the moment they were actually spoken. It is all very complicated from a computer science perspective, but it is increasingly important to our everyday lives as we expect everything from our phones to our cars to become more intelligent. In the accompanying video, the model at first seems to recognize all sorts of audio input, and by the end each phoneme representation is separated and aligned where it belongs.

If your head is spinning like Colonel O’Neill after an explanation of temporal wormhole physics, you’re not alone, and there’s a lot more where that came from. If you want to learn more about how deep learning, the umbrella term for this collection of techniques, works, read Fortune’s recent interview with Andrew Ng, the chief scientist at Chinese search engine giant Baidu and a renowned expert in the space.
