How technology companies are improving voice recognition software

While voice recognition software has certainly improved in the last two decades, it hasn’t exactly been the blockbuster tech that Ray Kurzweil predicted. My first experiments with the technology were playing around with Microsoft’s Speech API (circa Windows 95) and early versions of Dragon NaturallySpeaking. Both were interesting as “toys” but didn’t work well enough for me to put them to practical use.

Seriously, everything I tell a computer turns into something like this.

Since then, I’ve tried out new voice recognition software every few years and always come away thinking “Well, it’s better. But it’s still not very good.” For fans of the technology, it’s been a slow journey of disappointment. The engineering problems in building practical voice systems turned out to be much harder than anyone thought they’d be.

Human languages don’t follow the strict rules and grammar of programming languages, and computer scientists have struggled to build software that can map the intent of someone’s speech to a query or action that a program can process accurately. Writing code that understands “What is the best way to make bacon?” (Answer: in the oven, just saying.) in the countless ways a human might ask the question has been challenging.

Collect ALL the data

One of the tactics used by software engineers to figure out how to handle varied types of input is to build a database of potential input (in this case, human speech) and try to find common threads and patterns. It’s a bit of a brute-force approach, but it helps engineers understand what types of input they need to build code for and, where inputs are similar, reduces the overall amount of code needed.

If you can code your software to know that questions like “What’s it going to be like outside tomorrow?” and “What’s the weather supposed to do?” are both questions about the weather forecast, processing human speech becomes a little easier. Obviously you wouldn’t want to (or even be able to) build code for every possible input, but this approach does give you a good base to build from.
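To make that idea concrete, here is a minimal sketch of matching differently worded questions to the same intent by counting shared keywords. The intent names and keyword lists are made up for illustration; this is not how Google or anyone else actually does it, and a real system would learn these patterns from collected speech data rather than from a hand-written table.

```python
import re

# Hypothetical keyword lists, hand-written for this example only.
INTENT_KEYWORDS = {
    "weather_forecast": {"weather", "outside", "forecast", "rain", "temperature"},
    "cooking": {"cook", "make", "bake", "recipe", "bacon"},
}


def tokenize(utterance):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z']+", utterance.lower()))


def guess_intent(utterance):
    """Return the intent whose keywords overlap most with the utterance.

    Returns None when no keyword matches at all.
    """
    words = tokenize(utterance)
    best_intent, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


if __name__ == "__main__":
    for question in (
        "What's it going to be like outside tomorrow?",
        "What's the weather supposed to do?",
        "What is the best way to make bacon?",
    ):
        print(question, "->", guess_intent(question))
```

Both weather questions land on the same intent even though they share almost no words, which is the whole point. It also shows the limits of the approach: anything outside the keyword table simply falls through as unknown.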

In the past, capturing the volume of data needed to do a thorough analysis of speech wasn’t really an option for voice recognition researchers, cost being a major factor. It was simply too expensive to capture, store, and analyze a large enough sampling of voice data to push research forward.

Leveraging scale

Over the last few years, lower storage and computing costs paired with a much larger population of internet users have made this type of data collection a lot cheaper and easier. In the case of Google (and to some extent Apple), features like Voice Search probably weren’t initially intended to be products in and of themselves, but capture points for the company to collect and analyze voice data so that they could improve future products.

Analysis of a massive database of voice data paired with what are likely some very smart algorithms helped Google build their latest update to Voice Search. For the more scientifically minded, Google’s research site has a lot of interesting information on this analysis work. And as you can see from the video, the results are impressive.

Welcome to the future

For people who have been following voice recognition, the recent uptick in progress is very exciting. Now that the field has gained some momentum, development will likely advance at a rapid pace. The “teaching” component of these systems will improve, enabling them to decipher natural language without human help, and more products will include voice interfaces. It’s been a long time coming, but it’s finally starting to feel like the future.

Two Great Alternatives to GOOG-411

Did you know that Paranormal Activity 2 is playing at AMC Kent Station in Kent, WA at 10:00pm and 12:15am tonight; it is currently clear and 46 degrees in Cando, ND; regular gasoline is $2.51 at the Sinclair station at 800 W Hampden Ave in Sheridan, CO; and today is a 9 out of 10 day for anyone whose astrological sign is Cancer?

Why do I know all of this? Is it because I have way too much time on my hands? That, or maybe I’ve been on the phone with Bing 411 for half an hour, or perhaps both.

As I’m sure most of you have heard, Google is shutting down GOOG-411, its directory-assistance service that uses voice recognition to connect callers to businesses, on November 12.

Instead of leaving you alone in the cold, dark world, I’ve put together a list of a few alternatives to the service that you can use once Google has pulled the plug.

Bing 411 (1-800-BING411 / 1-800-246-4411)

The first alternative to GOOG-411 I tried was Bing 411. The reason I spent half an hour on the phone with Bing411 was the large number of options it offers. When you first call Bing411 you are given the option to say “Tell me my choices,” to which Bing411 replies with Driving Directions, Traffic, Weather, Movies, Sports, Stock Quotes, Cheap Gas, Horoscopes, News, Time, Travel, and Favorites. That is more than a few options. I’ll give you a brief run-down of what each option will do for you.

Driving Directions

Pretty straightforward: say your current location and your desired destination, and Bing411 will give you step-by-step directions via voice on your phone or, should you want it, a text message.

Traffic

Like Driving Directions Plus. Say your location and desired destination. Bing411 will then give you a few different routes with traffic density and estimated travel time.

Weather

The weather option is really what you would expect from a weather option. It tells you the weather.

Movies

This was probably the most in-depth option I found on Bing411. You have the option of searching for a movie or a theater. Searching for a movie gives you a list of theaters currently showing it. Choosing one gives you the option of connecting to it by phone or being sent the information by text. If you search for a theater near you, you’re given all the theaters in your area. Choosing one will give you all the movies playing and their showtimes.

Sports

I was pretty disappointed with the sports option. I’m not sure what I was expecting, but when I searched for the Green Bay Packers, all Bing 411 gave me was the result of their last game and the date and time of their next one.

Cheap Gas

When I chose cheap gas for Denver, CO, I received a list of gas stations with their prices for regular unleaded gas.

Horoscope

Give Bing your sign and get a prediction for the day. No word on how accurate it is though. Today is supposed to be a 9/10 day for me so let’s hope for the best.

News

Pick news and you get a broad range of choices such as top stories, technology, sports, etc. Pick a category and you’ll get an update on news in that area. Searching technology got me a story about police in England using social media sites to track criminals.

Free 411 (1-800-FREE411 / 1-800-373-3411)

Free 411 was the only other voice recognition 411 number I could find. It has quite a few fewer options than Bing. The options are to search for Government, Business, Residential, Weather, and Horoscopes.

The Government, Business, and Residential options just look up the name/address of whatever you are looking for. After you find the location you’re looking for, you can have the address/phone number texted to your phone.

The weather option gives you basic current weather in any city and state.

Horoscope was very similar to Bing’s: it just gives you a prediction. Again, I can’t say much for the accuracy. I’ll let you know if, before the night is done, an unexpected lover sweeps me off my feet.

An interesting feature I found with Free 411 was that if you call it back after you’ve made a search, it will give you the option to “repeat last number” in case you forgot the one you just got.

Conclusion

These are the only two voice recognition services I could find that are suitable replacements for GOOG-411. Of the two, Bing411 has far more options and a lot more depth. Free411 is a good choice, especially if you’re just looking for directions to somewhere.

As far as their ability to recognize speech goes, Free411 seemed to pick up what I was saying a little better. It was also a lot more responsive. Bing wouldn’t let me interrupt it very easily. It seemed like I had to listen to a certain amount of an option before I could say something and skip the rest.

As far as ads are concerned, Bing had significantly fewer. It seemed that Bing’s ad scheme was set up on a timer and every so often you’d get one. On Free411 you got an ad when you started and when you finished a search and every time you started over.

They are both useful if you’re looking for a 411-type voice service. Personally, I was more interested in the various text-based services, such as the previously covered Google SMS (Text 466453) or Yahoo! SMS (Text 92466) search. These allow you to do basically everything that their voice counterparts do, but with a simple text. For example, text Sushi 94040 and receive the names and addresses of the sushi joints close to that ZIP code.
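For the curious, here is a rough sketch of how a service might split a text like that into a search term and a ZIP code. The function name and the “term followed by a five-digit ZIP” format are my own assumptions based on the Sushi 94040 example; I don’t know how Google or Yahoo! actually parse incoming messages.

```python
import re


def parse_local_query(message):
    """Split a message like 'Sushi 94040' into (search term, ZIP code).

    Hypothetical helper: assumes the message ends with a five-digit ZIP.
    Returns None when the message doesn't fit that shape.
    """
    match = re.fullmatch(r"\s*(.+?)\s+(\d{5})\s*", message)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


if __name__ == "__main__":
    print(parse_local_query("Sushi 94040"))
    print(parse_local_query("coffee shop 80110"))
    print(parse_local_query("sushi"))
```

Part of why the text services feel so snappy is that this kind of typed input is far easier to handle than speech: there is no audio to decode, just a short string to pick apart.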

Have an alternative to GOOG-411 I didn’t cover? Post it in the comments below!