
The 3 Next Steps in Conversational AI

By John White Last Updated on Jun 15, 2021

Conversational AI is a subfield of artificial intelligence focused on producing natural and seamless conversations between humans and computers. We’ve seen numerous impressive advances on this front in recent years, with important improvements in automatic speech recognition (ASR), text-to-speech (TTS), and intent recognition, as well as the rocket-ship growth of voice assistant devices like the Amazon Echo and Google Home, with estimates of close to 100 million devices in homes in 2018.

Yet we’re still a long way from the easy human-machine conversation promised in science fiction. Here are some key advances we should see over the next decade that could get us closer to that long-term vision.

New tools beyond machine learning

Machine learning, and in particular deep learning, has become an extremely popular technique within the field of AI over the past few years. It has already fueled significant advances in domains such as facial recognition, speech recognition, and object recognition, leading many to believe it will solve all of the problems of conversational AI. In reality, however, it will be only one valuable tool in our toolbox. We’ll need additional techniques to handle all aspects of a real human-computer conversation.


Machine learning is particularly well suited to problems that involve finding patterns in large corpora of data. Or as Turing Award winner Judea Pearl pithily put it, machine learning fundamentally reduces to curve fitting. There are numerous problems in conversational AI that map well to this type of solution, such as speech recognition and speech synthesis. The technique has also been applied to intent recognition (taking a textual sentence of human language and converting it into a high-level representation of the user’s intent or desire) with good success, though there are some limitations in using this technique to capture meaning from natural language, which is inherently stateful, sensitive to context, and often ambiguous.
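As a toy illustration of intent recognition framed as pattern matching, consider the sketch below. The intent names, example phrases, and word-overlap scoring are assumptions made for this example only; real systems use trained statistical models rather than this kind of hand-rolled matching.

```python
# Toy sketch: intent recognition as pattern matching over text.
# The intents, phrases, and scoring rule are illustrative assumptions.

TRAINING_EXAMPLES = {
    "get_time":    ["what time is it", "tell me the time", "current time"],
    "get_weather": ["what's the weather", "is it raining", "weather forecast"],
    "set_alarm":   ["set an alarm", "wake me up at", "alarm for"],
}

def recognize_intent(utterance: str) -> str:
    """Score each intent by word overlap with its example phrases."""
    words = set(utterance.lower().split())
    best_intent, best_score = "unknown", 0
    for intent, phrases in TRAINING_EXAMPLES.items():
        score = sum(len(words & set(p.split())) for p in phrases)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

print(recognize_intent("what time is it now"))  # get_time
```

Even this crude version shows why the approach struggles with context: the same utterance always maps to the same intent, regardless of what came before it in the conversation.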

Suggested read: History of Artificial Intelligence

However, there are certainly problems in computer conversation that are not as well suited to machine learning. Think of human-machine conversation as being composed of two parts:

Natural language understanding (NLU) — understanding what the user said

Natural language generation (NLG) — generating a reasonable, on-topic response to the user.

Much of the attention of late has been focused on that first part, but many challenges remain on the generation side, and these tend not to be well suited to machine learning, because response generation isn’t simply a product of gathering and analyzing lots of data. The challenge of maintaining a believable, ongoing, and stateful conversation will require more focus on the NLG and dialog management parts of the problem over the coming years.
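The NLU/NLG split and the stateful dialog management discussed above can be sketched as a minimal pipeline. Everything here (the intents, keyword rules, and canned responses) is an illustrative assumption, stubbed out to show the structure rather than any real system:

```python
# Minimal sketch of the NLU -> dialog state -> NLG pipeline.
# Intents, keyword rules, and responses are illustrative stubs.

class DialogManager:
    def __init__(self):
        self.history = []  # dialog state: structured intents from prior turns

    def nlu(self, utterance: str) -> dict:
        """NLU: map raw text to a structured intent (keyword stub)."""
        text = utterance.lower()
        if "time" in text:
            return {"intent": "get_time"}
        if "again" in text and self.history:
            return self.history[-1]  # context-sensitive: reuse last intent
        return {"intent": "unknown"}

    def nlg(self, intent: dict) -> str:
        """NLG: turn a structured intent into an on-topic response."""
        responses = {
            "get_time": "The time is 9:45am.",
            "unknown":  "Sorry, I didn't catch that.",
        }
        return responses[intent["intent"]]

    def respond(self, utterance: str) -> str:
        intent = self.nlu(utterance)
        self.history.append(intent)
        return self.nlg(intent)

dm = DialogManager()
print(dm.respond("What's the time?"))  # The time is 9:45am.
print(dm.respond("Say that again"))    # The time is 9:45am.
```

The point of the sketch is the division of labor: NLU produces a structured representation, the dialog manager tracks state across turns, and NLG is a separate step that cannot be learned purely by fitting patterns in input data.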

Higher fidelity experiences

Conversational experiences today can be quite simple and constrained. To move beyond these limitations, we will need to provide higher fidelity conversations. There are several parts to achieving this, including:

  1. Wide and deep conversations. Most conversational experiences today are either very wide but shallow (e.g., “What’s the time?” => “The time is 9:45am”) or very narrow but deep (e.g., a multi-turn conversation in a quiz game). To advance beyond these limited experiences, we will need to get to a world of conversations that are both wide and deep. This will require a much better understanding of the context of a user’s input to be able to respond appropriately, robust tracking of the state (memory) of a conversation, as well as the ability to scale beyond the present technical limitation of distinguishing between only a few hundred intents at a time.
  2. Personalized conversations. In a natural conversation between two people, each will usually draw on previous experiences with the other speaker and will tailor their replies to that person. Computer conversations that don’t do this tend to feel unnatural and even irritating. Addressing this in the long term will require solving challenges such as speaker identification, so that the computer knows who you are and can respond differently to you versus someone else. Another aspect will be tracking state across previous conversations and being able to respond differently over time, such as learning the preferences or style of the specific user.
  3. Multimodal input and output. Presently, conversational AI focuses on understanding spoken inputs and generating spoken responses. However, users could provide inputs in many different ways, and outputs could be generated in different forms too. For example, a user could press a button on a screen in addition to providing a spoken input. Or sentiment analysis could be used to derive an emotional-level input that the computer can respond to. Supporting multiple inputs or outputs at the same time opens up a range of complexities that need to be considered. For example, if the user says “No” while pressing a “Yes” button, what should the system do?
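That last conflict can be made concrete with a small input-fusion sketch. The resolution policy here (a conflict triggers a clarifying question rather than silently picking a winner) is one plausible choice assumed for illustration, not an established standard:

```python
# Sketch of multimodal input fusion for the "No"-plus-"Yes"-button
# conflict described above. The policy is an illustrative assumption.
from typing import Optional

def fuse_inputs(spoken: Optional[str], button: Optional[str]) -> str:
    """Resolve simultaneous spoken and button inputs.

    Policy: if the two modalities disagree, ask the user to clarify
    instead of guessing; otherwise use whichever input was given.
    """
    if spoken and button and spoken.lower() != button.lower():
        return "clarify"  # e.g. prompt "Did you mean yes or no?"
    return (button or spoken or "none").lower()

print(fuse_inputs("No", "Yes"))   # clarify
print(fuse_inputs(None, "Yes"))   # yes
```

Other policies are possible (e.g., trusting the button as the more deliberate action), which is exactly the design complexity the example in the text points at.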