Ruhani Rabin
Jul 10, 2018


One early milestone in speech recognition came in 1961, when IBM introduced a system dubbed Shoebox that could recognize 16 spoken words, including the digits 0 through 9. It got its name because it was roughly the size of a standard American shoebox. The technology has evolved enormously in the decades since.

Today, voice-activated assistants come both built into standalone devices, such as Amazon’s Alexa-powered Echo speakers, and as apps, such as Google Assistant, that can be installed on smartphones. Siri comes with Apple’s iPhone, and Cortana is the voice-activated virtual assistant built into Microsoft Windows.

The technology behind voice-activated assistants, which recognize a user’s spoken command and act on it, is now being pushed to the next level. Researchers and scientists around the world are exploring ways to combine artificial intelligence with voice recognition to get better results.

Blending Voice Recognition and Artificial Intelligence can help solve crimes

Let me begin with a practical example from the news. Researchers at the University of East Anglia are developing visual speech recognition (lip-reading) technology that could help solve crimes and provide communication assistance for people with hearing and speech impairments.

The technology could also be applied to CCTV footage, which often lacks recognizable audio. The system uses AI to reconstruct the conversation from what the camera sees. Its developers are training the machine to recognize the shapes of human lips and how those shapes form words and sentences.

Conversational User Interface

Conversational user interfaces (CUIs) built on spoken interaction are at the core of the latest AI development. So far, many applications and products are simply “Mechanical Turks”: they appear fully automated, but a hidden human operator does most of the work, such as taking input from users and commanding the machine to perform certain operations. This is where speech recognition can play an important role.

Many interesting advances are happening in speech recognition and voice recognition technologies.

AI-based deep learning is extending what bots can do beyond traditional natural language processing. It is also pushing the concept of conversation-as-a-platform, which is poised to disrupt the app market.

Until now, most tech companies have focused almost entirely on visual interfaces built from buttons, drop-down lists, sliders, carousels and other visual elements. Now the focus is shifting toward conversational interfaces that take users’ commands by voice.

Machine Learning, a subset of AI, can make bots better at voice recognition

Organizations around the world are adopting AI- and ML-powered chatbots to answer customers’ queries with relevant information. The next wave of chatbots will add stronger real-time data analytics and automation capabilities, engaging customers and answering their queries by voice.

Think about how we contact service providers’ customer care desks. In almost every case, we reach a machine that plays recorded messages and asks us to press 1 for one request and 2 for another. After we press a number, it may offer yet another set of sub-options and ask us to choose one of them.

This approach may help companies cut the cost of maintaining a large customer care team, but for customers it is frustrating to navigate these menus to get help. In many cases it doesn’t actually help, and customers end up hanging up.

Now imagine a chatbot that can talk to customers, understand their queries using natural language processing, and suggest the best possible solutions. Such a chatbot could also perform security checks in several ways, for example through voice recognition, and know your entire history with the company: past conversations in other sessions, past orders, failed or returned orders, searches performed on the website, and anything else that helps it give spot-on answers to customers’ queries.
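To make the contrast concrete, here is a toy Python sketch, not a production design. The phone menu is modeled as a tree that callers walk one key press at a time, while the chatbot routes a free-form request straight to its destination. The menu options, destination names and keyword table are all hypothetical, and the keyword lookup is a deliberately crude stand-in for real natural language processing; an actual chatbot would use a trained language-understanding model instead of word matching.

```python
# Hypothetical IVR menu: each key press descends one level of a nested dict
# until a leaf (a string naming the destination queue) is reached.
MENU = {
    "1": {  # billing
        "1": "billing:balance",
        "2": "billing:dispute",
    },
    "2": {  # orders
        "1": "orders:track",
        "2": "orders:return",
        "3": "orders:cancel",
    },
}


def navigate(menu, presses):
    """Follow a sequence of key presses; return (destination, presses_used)."""
    node = menu
    for count, key in enumerate(presses, start=1):
        node = node[key]           # a KeyError models an invalid choice
        if isinstance(node, str):  # reached a leaf: a destination queue
            return node, count
    raise ValueError("menu not finished")


# Toy stand-in for natural-language understanding: a keyword lookup table
# (an assumption for illustration, not how real NLP systems work).
KEYWORDS = {
    "balance": "billing:balance",
    "dispute": "billing:dispute",
    "track": "orders:track",
    "return": "orders:return",
    "cancel": "orders:cancel",
}


def route_utterance(text):
    """Map a free-form request straight to a destination, or None."""
    for word in text.lower().split():
        word = word.strip(".,!?")  # drop trailing punctuation
        if word in KEYWORDS:
            return KEYWORDS[word]
    return None


print(navigate(MENU, ["2", "2"]))                     # ('orders:return', 2)
print(route_utterance("I want to return my order."))  # orders:return
```

The menu needs two key presses, each preceded by a list of spoken options, to reach the returns queue, while the free-form request gets there in a single step. That single step is what the conversational approach promises, with genuine language understanding in place of the keyword table.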

Using AI to pick a single voice out of a crowd

It has been challenging for voice-activated assistants to follow a single voice when several people are speaking at once. Devices such as Amazon’s Alexa and Google Home are designed to handle requests from one person, but they cannot reliably pick out a particular person’s voice in a crowd and respond to that person’s queries.

AI now offers a solution to this problem: separating the voices of multiple speakers in real time. In 2017, researchers at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, Massachusetts, demonstrated an AI system that uses a machine learning technique the team calls “deep clustering” to recognize the distinctive features, or voiceprints, of multiple speakers’ voices.
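In the published description of deep clustering, a trained neural network maps each time-frequency bin of the mixed recording’s spectrogram to an embedding vector, so that bins dominated by the same speaker land close together. K-means then groups the embeddings, and each cluster becomes a binary mask that pulls one speaker’s energy out of the mix. The pure-Python sketch below shows only that clustering-and-masking step, under stated assumptions: the network is not shown, and the six two-dimensional embeddings and mixture magnitudes are hand-made toy values (hypothetical). A real system produces higher-dimensional embeddings for every bin of a full spectrogram.

```python
# Sketch of the clustering step of "deep clustering" speech separation.
# ASSUMPTION: the embeddings below are toy values standing in for the output
# of a trained network, one embedding per time-frequency bin of the mixture.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def separate(mixture, embeddings, n_speakers):
    """Split mixture magnitudes into one list per speaker via binary masks."""
    labels = kmeans(embeddings, n_speakers)
    sources = []
    for s in range(n_speakers):
        mask = [1.0 if lab == s else 0.0 for lab in labels]  # 1 = bin belongs to s
        sources.append([m * x for m, x in zip(mask, mixture)])
    return sources


# Toy input: six time-frequency bins, two speakers (hypothetical values).
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.95, 0.05), (0.1, 0.9), (0.05, 1.0)]
mixture = [0.5, 0.4, 0.8, 0.3, 0.7, 0.6]

speaker_a, speaker_b = separate(mixture, embeddings, 2)
print(speaker_a)  # [0.5, 0.4, 0.0, 0.3, 0.0, 0.0]
print(speaker_b)  # [0.0, 0.0, 0.8, 0.0, 0.7, 0.6]
```

Because every bin is assigned to exactly one speaker, the separated magnitudes add back up to the mixture. The farthest-point initialisation is simply a deterministic choice for this sketch; other k-means initialisations, such as k-means++, would work as well.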

Author Bio: Sofia is a digital marketing expert at Rapidsoft Technologies, a leading IT consulting company providing a full range of IT services, including IoT app development, blockchain development, and big data app development solutions.

Originally posted here:




Ruhani Rabin has been a tech and product evangelist for almost 20 years. He has served as VP and CPO at various digital companies, and he plays with drones in his free time.