Behind the Tech: How do Siri, Alexa and Google Home respond to your questions?

Ever wondered how Apple's Siri, Amazon's Alexa or Google Home responds to your questions?

How does a computer understand human language, which sometimes humans themselves do not understand?

Well, technically it is known as Natural Language Processing.

Natural Language Processing (NLP)

The field of study that focuses on the interactions between human language and computers is called Natural Language Processing, or NLP for short. It sits at the intersection of computer science, artificial intelligence, and computational linguistics (Wikipedia).

Behind the tech: NLP (Basics)

Disclaimer: There are multiple techniques for processing natural language; I will be shedding light on only some of them.

Let me tell you a story about Nidah.

Nidah is 2 years old & she doesn’t know any language yet. But when her mom asks her to drink milk, she drinks it!

So how does Nidah process the information in a language she doesn’t know?

Nidah’s mom must have shown her some hand signs, or she must have trained Nidah so that when she says “Nidah please drink milk”, Nidah has to drink milk. So Nidah understood it, processed it & also stored that information for future use. This entire workflow worked because Nidah was trained. If her mom had asked her to do something Nidah wasn’t aware of (trained on), then Nidah wouldn’t have processed the information.

Now, looking at Nidah’s example, there were 2 stages of processing the information:

  1. Training Stage (Pre-processing Stage)
  2. Process + Decision-Making Stage

Training Stage

Let’s take a look at Nidah’s mom’s statement again.

“Nidah please drink milk”

Question 1: How did Nidah understand what task to perform?
Question 2: How did Nidah understand that she has to drink milk, not coffee?

Simple: while training Nidah, you tell her that “drink milk” has a verb, “drink”, & a property, “milk”. Now Nidah knows what task to perform & what to drink. I would call her “Trained Nidah”. (Technically, the trained data is called a “data model”.)

Similarly, in the field of NLP, the verb “drink” is called an “Intent” & the property “milk” is called an “Entity”, & the method of processing data in this fashion is known as Named Entity & Intent Recognition.

An Entity is a property which can be used to answer the request from the user — the entity will usually be a keyword within the request such as a name, date, location etc.

An Intent (intention) is a specific action that the user can invoke, such as drink, place an order, book a ticket, etc.
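To make this concrete, here is a minimal sketch of intent & entity recognition in Python. Commercial assistants use trained statistical models; this keyword lookup, with made-up intent & entity lists, is only an illustration of the idea:

```python
# Toy intent/entity recognizer. Real NLP tools learn these from
# training data; here they are hand-written lookup tables.
INTENTS = {"drink": "drink", "order": "place_order", "book": "book_ticket"}
ENTITIES = {"milk", "coffee", "pizza", "ticket"}

def recognize(message):
    """Return the (intent, entity) found in a message, if any."""
    words = message.lower().split()
    intent = next((INTENTS[w] for w in words if w in INTENTS), None)
    entity = next((w for w in words if w in ENTITIES), None)
    return intent, entity

print(recognize("Nidah please drink milk"))  # ('drink', 'milk')
```

Given “Nidah please drink milk”, the recognizer finds the intent “drink” & the entity “milk”, just as trained Nidah does.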

There are a bunch of NLP tools like Google’s Dialogflow, Facebook’s Wit.ai, IBM Watson or Microsoft’s LUIS where you can host your trained data model. Let me call it the “BRAIN”. These brains even provide data processing & decision-making abilities.

Process + Decision-Making Stage

Processing looks simple from the outside, but it undergoes a pipeline of text-processing substages like stemming and lemmatization, TF-IDF, coreference resolution, part-of-speech (POS) tagging, dependency parsing, named entity/intent recognition & many more.
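As a rough illustration of two of these substages, here is a deliberately simplified sketch of stemming (crudely stripping common suffixes) & TF-IDF scoring in plain Python. Real libraries such as NLTK or spaCy implement these far more carefully:

```python
import math

def stem(word):
    """Naive stemmer: strip a few common English suffixes.
    Real stemmers (e.g. Porter's) apply many more rules."""
    for suffix in ("ing", "ly", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tf_idf(term, doc, docs):
    """Term frequency in one doc times inverse document frequency
    across all docs: rare-but-present terms score highest."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df)
    return tf * idf

docs = [["drink", "milk"], ["drink", "coffee"], ["book", "ticket"]]
print(stem("drinking"))                        # drink
print(round(tf_idf("milk", docs[0], docs), 3))
```

Here “milk” scores higher than “drink” in the first document, because “drink” appears in two of the three documents & is therefore less distinctive.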

With the evolution of third-party NLP tools, developers need not worry about knowing many of these processing substages, as the tools take care of them & make your life easier.

Now, going back to Nidah’s example: if you have already trained the “brain” (a third-party NLP tool), then once the brain receives a message like “Nidah please drink milk”, it will process the message & understand that the intent is “to drink” & the entity is “milk”.

But how will the brain know what action needs to be taken?

Well, you can train the brain so that, based on the intent received, it performs the action of drinking milk or utters an acknowledgment: “ok”.
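Continuing the sketch, mapping a recognized intent to an action might look like the following. The handler names & responses are hypothetical, just to show the dispatch idea; a real assistant’s brain would trigger device actions or spoken responses instead:

```python
# Hypothetical action handlers keyed by intent name.
def drink_milk():
    return "ok"  # utter an acknowledgment, as in the Nidah example

ACTIONS = {"drink": drink_milk}

def handle(intent):
    """Run the action trained for this intent, if there is one."""
    action = ACTIONS.get(intent)
    if action is None:
        # Like untrained Nidah, the brain can't process unknown intents.
        return "Sorry, I was not trained for that."
    return action()

print(handle("drink"))  # ok
```

An unknown intent falls through to the apology, mirroring how Nidah cannot act on a request she was never trained on.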
