Curator of AIPerspectives.com, which contains a free, 400-page AI 101 book.
Sure, we can ask Siri or Alexa to answer a question or perform an action for us. But Siri and Alexa can only respond to pre-programmed questions and commands.
They do not really understand what you are saying, and you cannot have a real conversation with a personal assistant the way you can with another person.
Three-year-old children understand language. We have computers that can beat chess champions. Why is building computer systems that understand natural language so difficult? (Natural languages are the languages that people speak, as opposed to computer languages.)
It’s natural to think that the meaning of a sentence is a composite of the meanings of its individual words, and that the meaning of a paragraph is a composite of the meanings of its sentences.
The principle of compositionality was first put forth by the philosopher Gottlob Frege in 1882. It states that the meaning of a sentence (or text) is determined by the meanings of its individual words plus the syntactic rules for combining those meanings.
But this literal meaning is just the tip of the iceberg in human understanding. Language understanding involves far more than knowing the dictionary meaning of the words and applying grammatical rules.
People have a great deal of knowledge about the world, and they use it to understand natural language by inferring the implied meaning of what is said or written. Some examples of world knowledge include:
Entities. We know a lot about entities: people, places, and things. We know facts about Barack Obama, Tiger Woods, Paris, London, the Taj Mahal, the Super Bowl, and much more. Just as importantly, we know how to find information when we need it.
Concepts. We know that German Shepherds are dogs, that dogs are mammals, and that mammals are animals. We also know that dogs have four legs and a tail and (usually) bark.
Relationships. We know a lot about relationships between entities. We know that Orson Welles produced the movie Citizen Kane. Or, if we don’t know it, we know how to look it up.
Events. We know a lot about events, such as the crash of the Hindenburg.
Numbers. We know that 10 is more than 5. We understand fractions, percentages, and currency.
Geography. We can picture in our heads the relative locations of cities, states, countries, bodies of water, and mountains.
Time. People understand clocks, calendars, the number of hours in a day, and birthdays.
Aging. People know to expect different cognitive capabilities and behaviors from a newborn, a toddler, a child, a teenager, an adult, and a senior.
Scripts. We know the typical sequence of events when eating in a restaurant, buying goods in a store, or lending money to a friend.
Daily living. We know how to eat, bathe, and use a cell phone.
Psychology. We know about emotions, moods, relationships, attitudes, and beliefs.
Physics. We know that if we drop a glass, it will fall, contact the ground, and shatter into pieces. We have at least a rudimentary understanding of concepts like gravity, friction, condensation, evaporation, erosion, elasticity, inertia, support, containment, light, heat, electricity, magnetism, conduction, and many more principles and concepts that can be collectively termed “intuitive physics”.
Biology. We understand that people and most animals need to eat food, breathe, sleep, and procreate. We know that lions eat antelopes, birds eat worms, and small fish eat plankton.
Statistics. We know that if we roll a die many times, we will get roughly equal numbers of ones, twos, threes, fours, fives, and sixes.
Rules of thumb. We know that most but not all dogs bark. We know that most but not all birds fly. We know to avoid snakes and alligators.
Visual information. When asked about the shape of a German Shepherd’s ears, most people report visualizing a German Shepherd and inspecting its ears in their mind’s eye.
Space. We know that the world exists in three dimensions and can process references to “above”, “near”, and “to the left of…”.
Numbers. We understand quantities, currencies, and many other types of numbers.
Math. We know how to perform mathematical operations on numbers.
Procedures. We know many procedures. For example, “First, jack up the car. Then take off the old tire. Then replace it with the new tire.”
Emotions. We understand anger, fear, joy, sadness, and many other emotions.
Moods. We can recognize the mood of a speaker (e.g., cheerful, irritable, depressed).
Attitudes. We can recognize the beliefs, preferences, and biases of a speaker.
Personality. We can recognize a speaker’s personality traits (e.g., nervous, anxious, jealous).
Causality. We understand cause-and-effect relationships. For example, if I flip a light switch, the light will turn on.
Specialized knowledge. A banker has specific knowledge of banking. A pediatric ophthalmologist has specific knowledge of childhood eye diseases.
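To make the “Concepts” and “Rules of thumb” items a little more concrete, here is a minimal, hypothetical sketch of how a few such facts could be hand-coded as an is-a hierarchy in which each concept carries default properties that more specific concepts may override. The `Concept` class, its methods, and the particular concepts are illustrative assumptions, not a description of any real system, and a handful of hand-written facts like these covers only a sliver of the knowledge listed above.

```python
from typing import Dict, Optional


class Concept:
    """A node in a toy is-a hierarchy (hypothetical, for illustration only)."""

    def __init__(self, name: str, parent: Optional["Concept"] = None,
                 **properties: object) -> None:
        self.name = name
        self.parent = parent
        # Default properties; a more specific concept may override them.
        self.properties: Dict[str, object] = dict(properties)

    def is_a(self, other: "Concept") -> bool:
        """Return True if `other` is this concept or one of its ancestors."""
        node: Optional[Concept] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def lookup(self, prop: str) -> Optional[object]:
        """Walk up the hierarchy and return the most specific value for `prop`."""
        node: Optional[Concept] = self
        while node is not None:
            if prop in node.properties:
                return node.properties[prop]
            node = node.parent
        return None  # nothing in the hierarchy says anything about `prop`


# "German Shepherds are dogs, dogs are mammals, and mammals are animals."
animal = Concept("animal", breathes=True)
mammal = Concept("mammal", animal)
dog = Concept("dog", mammal, legs=4, has_tail=True, barks=True)
german_shepherd = Concept("German Shepherd", dog)
basenji = Concept("Basenji", dog, barks=False)  # "most but not all dogs bark"

# "Most but not all birds fly."
bird = Concept("bird", animal, flies=True)
penguin = Concept("penguin", bird, flies=False)

if __name__ == "__main__":
    print(german_shepherd.is_a(animal))
    print(german_shepherd.lookup("legs"))
    print(basenji.lookup("barks"))
    print(penguin.lookup("flies"))
```

Because `lookup` returns the first value it finds while walking from the specific concept toward the general one, a default such as “dogs bark” can coexist with an exception such as the Basenji.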
Understanding Language Requires World Knowledge
Even children make extensive use of world knowledge in understanding language. For example, consider this statement:
The police officer held up his hand and stopped the truck
As psychologists Allan Collins and Ross Quillian pointed out, people’s understanding of this sentence goes far beyond its literal meaning and includes a great deal of implicit meaning, such as:
- Trucks have drivers
- People obey police officers
- Trucks have brakes that will cause them to stop
- Drivers can step on the brake to stop the truck
Even an eight-year-old would include these bits of world knowledge in their understanding of the sentence.
In contrast, consider this very similar statement:
Superman held up his hand and stopped the truck
Our understanding of this sentence is very different. Here, we draw on our knowledge of a comic-book superhero, and we understand that Superman applied physical force to stop the truck.
Our understanding of these two sentences goes far beyond the meanings of the individual words plus the grammatical rules (which are basically the same in both sentences).
Similarly, if we hear someone say
I like apples
we know they are talking about eating, even though eating was never mentioned (Schank, 1972). If we hear that
John lit a cigarette while pumping gas
our commonsense knowledge tells us that this is a bad idea and we expect the next sentence to tell us whether there was an explosion.
In a similar vein, consider the examples above from the book “Disorder in the Court”. In each of these examples, we laugh because the witness misunderstood the attorney’s question. But think about how much world knowledge we had to apply to correctly understand the attorney’s intent. In the first example, the word “gear” has multiple meanings, and the witness chose the wrong one. Yet even without any other context, most of us reading the attorney’s question will infer that the case had something to do with a car accident.
We could go on ad infinitum about all the world knowledge we need and all the inferences we need to make in order to understand the attorney’s questions in each example.
People take language understanding for granted. To understand natural language, people must draw on all of their world knowledge and reason with it. Despite the fantastic advances in artificial intelligence, we still have no idea how to build this world knowledge and these reasoning capabilities into computers, nor how to teach computers to acquire this knowledge on their own.