People often talk about artificial intelligence as if it were all one thing. It’s more accurate to think of AI as a collection of approaches to problems that can’t be solved by a fixed, hand-written procedure. An AI system usually combines several approaches, doing whatever produces the best results. Breaking down the varieties of artificial intelligence demystifies AI a bit and gives some insight into how chatbots and other software work.
The problem domain
The designers of a piece of software need to decide what problem domain it will cover. In other words, what class of problems will it deal with? The narrower the domain is, the easier the job. Many early AI efforts dealt with strictly limited domains, such as board games.
In chess, the number of legal moves at any time is limited, typically a few dozen. There’s no ambiguity. A knight can’t straddle two squares or jump to the other end of the board. This doesn’t mean the problems are easy; winning requires looking ahead several moves, which creates a huge tree of possibilities. But the set is finite for a given number of moves, and enumerating it is a simple task.
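A minimal sketch of enumerating such a tree, using a far smaller game than chess so that it fits in a few lines. The game is an assumption made for illustration: players alternately take one to three stones from a pile, and whoever takes the last stone wins. The function names are hypothetical.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # assumed rule: a player removes 1, 2, or 3 stones per turn


def legal_moves(stones):
    """Return every move allowed from a pile of `stones`."""
    return [m for m in MOVES if m <= stones]


def count_lines(stones, depth):
    """Count the distinct move sequences when looking `depth` moves ahead."""
    if depth == 0 or stones == 0:
        return 1
    return sum(count_lines(stones - m, depth - 1) for m in legal_moves(stones))


@lru_cache(maxsize=None)
def player_to_move_wins(stones):
    """Exhaustive look-ahead: True if the player to move can force a win."""
    if stones == 0:
        # The previous player took the last stone, so the mover has lost.
        return False
    # A position is winning if some move leaves the opponent in a losing one.
    return any(not player_to_move_wins(stones - m) for m in legal_moves(stones))
```

With a pile of 10 stones, `count_lines(10, 3)` finds 27 lines of play three moves deep. The tree is finite and easy to list, but it grows quickly with depth, and chess has far more moves per position than this toy game.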
Once AI tries to understand natural languages like English, the domain isn’t as well delimited. People can use words that aren’t in the software’s vocabulary and put sentences together in odd ways. If they’re giving spoken input, they can speak indistinctly or with an unfamiliar accent.
However, what the software expects users to talk about is normally limited. A chatbot for car parts isn’t expected to talk sensibly about vacation spots or meal planning. Most AI today deals with just one area of knowledge at a time.
Alan Turing proposed that a machine that could hold an unrestricted conversation and be taken for a human being could be said to “think.” Conversational software has fooled some people in this way; Turing never said how many people had to be fooled or for how long. A software application that could consistently be taken for a human being (allowing for restrictions on the mode of communication) would have to be a true generalist, taking all kinds of knowledge as its domain. That would start raising difficult questions about the status of the machine — or is it the software? — as a sentient being. So far, though, such capabilities remain in the realm of science fiction.
Rules and goals
Rule-based systems are the oldest and simplest kind of AI. The system’s designers define a set of rules that relate to a problem domain. In a chess system, the most important rules would be “avoid checkmate” and “checkmate your opponent.” Other rules would include “control the center of the board” and “avoid losing pieces with greater value than your opponent loses.” The coding of the rules would include algorithms for measuring how well a given position satisfies each one.
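As a rough sketch of what coding such rules might look like, the example below scores a position with two hand-written rules: material balance and occupation of the center. The position format (a dictionary from square name to piece letter, uppercase for White), the center bonus, and the function names are all assumptions made for illustration; the piece values are the conventional ones. Counting only pieces that stand on the four central squares is a deliberate simplification of “controlling” the center.

```python
# Conventional piece values; the king is left at 0 because it is never traded.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}
CENTER = {"d4", "e4", "d5", "e5"}
CENTER_BONUS = 0.5  # assumed weight, chosen only for illustration


def material(position):
    """Rule: avoid losing more piece value than your opponent loses."""
    total = 0
    for piece in position.values():
        value = PIECE_VALUES[piece.lower()]
        total += value if piece.isupper() else -value
    return total


def center_control(position):
    """Rule: control the center (simplified to occupying a central square)."""
    total = 0.0
    for square, piece in position.items():
        if square in CENTER:
            total += CENTER_BONUS if piece.isupper() else -CENTER_BONUS
    return total


RULES = [material, center_control]


def evaluate(position):
    """Score a position from White's point of view by summing every rule."""
    return sum(rule(position) for rule in RULES)
```

For example, `evaluate({"e4": "P"})` gives 1.5: one pawn of material plus the bonus for standing in the center. Everything the system “knows” is in the explicit rules and weights its developers wrote down.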
A rule-based system is limited to what its developers can make explicit. Top chess players have an ability to evaluate moves that goes beyond what they can put into words. Really, only one explicit rule is needed: “Checkmate your opponent first.” Everything else follows from that, but it’s necessary to construct corollary rules to get there. Rule-based machine learning makes that possible.
With machine learning, the system constructs its own rules based on previous results. Its starting points are goals more than rules. The rules it formulates are hypotheses. If they work well, it keeps them; otherwise, it refines or discards them. Over time, the system builds up an ever-improving set of rules for dealing with the domain. Rule-based machine learning is better suited than explicit rule-based systems for open-ended domains.
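The keep-or-discard loop can be sketched with the simplest possible hypothesis: a rule of the form “the answer is yes when x is greater than some threshold.” The data, the step size, and the function names below are assumptions chosen for illustration, not a description of any particular system.

```python
def accuracy(threshold, examples):
    """Fraction of examples that the rule `x > threshold` labels correctly."""
    correct = sum((x > threshold) == label for x, label in examples)
    return correct / len(examples)


def learn_threshold(examples, start=0.0, step=1.0, rounds=20):
    """Propose nearby rules, keep the ones that score better, drop the rest."""
    best, best_score = start, accuracy(start, examples)
    for _ in range(rounds):
        improved = False
        for candidate in (best - step, best + step):
            score = accuracy(candidate, examples)
            if score > best_score:
                # The new hypothesis works better, so it replaces the old one.
                best, best_score = candidate, score
                improved = True
        if not improved:
            step /= 2  # refine: try smaller adjustments next time
    return best, best_score


# Previous results (made up): small values were "no", large values were "yes".
examples = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]
rule, score = learn_threshold(examples)
```

Starting from a threshold of 0, which gets only half the examples right, the loop settles on a threshold of 3.0, which classifies all six correctly. The developers supplied the goal (classify accurately); the system found the rule.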
Pattern recognition
Much of AI is concerned with identifying perceptual entities. These can be pictures, words, or physical objects. This process is called pattern recognition. One aspect is recognizing types of entities. A facial recognition system needs to identify what is a human face and distinguish it from a monkey’s face, a skull, or a cartoon drawing. There usually isn’t a rule that defines the entity. Recognition systems are “trained” by being given a large number of images of faces, covering different physical types, expressions, and lighting conditions, as well as images that might be mistaken for faces.
Pattern recognition doesn’t work from a blank slate. The programming incorporates ways of identifying features from pixels. For that reason, it’s closely tied to research into how human perception works. The more we understand about how we recognize patterns, the more we can apply that to computational systems.
What seems simplest is often the trickiest. How does a machine recognize the spoken word “have”? We often slide over the word, barely pronouncing it, because it is such a common, familiar word. We don’t hear just isolated words; every sentence creates expectations, letting us fill in words we miss. A pattern recognition system for speech has to learn phrase and sentence patterns as well as words.
If one interpretation of a pattern is implausible in context, a pattern recognition system will look for alternatives. If a person seems to say, “I need fuel to hate my house,” we realize that the word was probably “heat” and not “hate.” This eliminates confusion, at the cost of generating false assumptions in unusual cases.
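One way to picture this is to combine how closely each candidate word matches the sound with how well it fits the rest of the sentence. The sketch below does that for “hate” and “heat.” The scores and word lists are made-up numbers for illustration, and the scoring formula is an assumption; real systems use far richer language models.

```python
# Made-up acoustic scores: the sound was slightly closer to "hate".
ACOUSTIC = {"hate": 0.55, "heat": 0.45}

# Made-up lists of words that tend to appear near each candidate.
CONTEXT_WORDS = {
    "heat": {"fuel", "house", "furnace"},
    "hate": {"enemy", "traffic"},
}


def context_score(word, sentence_words):
    """Score rises with each related word found elsewhere in the sentence."""
    overlap = len(CONTEXT_WORDS[word] & set(sentence_words))
    return 1 + overlap


def best_candidate(candidates, sentence_words):
    """Pick the candidate with the best combined sound and context score."""
    return max(
        candidates,
        key=lambda w: ACOUSTIC[w] * context_score(w, sentence_words),
    )


heard = "i need fuel to ___ my house".split()
choice = best_candidate(["hate", "heat"], heard)
```

Here `choice` is “heat,” because “fuel” and “house” outweigh the slightly better acoustic match for “hate.” The same weighting would overrule a speaker who really did say “hate,” which is the false assumption in unusual cases.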
AI has extended the concept of patterns beyond perceptual objects. In a computer security system, the monitoring software can distinguish normal traffic from a possible breach using a variety of signs. The system can not only report a probable breach but identify the type of activity and its source. In an industrial system, pattern recognition can identify inefficiencies and malfunctions that aren’t obvious from any single datum.
Support vector machines
Many problems in AI come down to “Is this an X or a not-X?” The answer may be implicit in the data, but in a complicated way. If the data points are shown on a graph, it may be clear that there is a difference between X and not-X points, but the formula for distinguishing them may not be obvious. Support vector machines – which normally aren’t separate machines, in spite of the name – find a way to transform the data so that there is a line, plane, or hyperplane that separates the two groups of points.
This approach is valuable when it’s necessary to reach a yes-no conclusion from a large amount of data. Computer security monitoring is an example. The result can assign a degree of confidence to the answer, depending on how close to the plane of separation the data points are.
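The sketch below illustrates the two ideas in this section, transformation and confidence, on one-dimensional data where the X points lie far from zero on both sides and the not-X points lie near zero. No single cutoff on the line separates them, but adding the square of each value as a second coordinate does. The separating hyperplane here is picked by hand, which is an assumption made to keep the example short; an actual support vector machine would find it by optimization. The names are hypothetical.

```python
import math


def feature_map(x):
    """Lift a 1-D point into 2-D so that distance from zero gets its own axis."""
    return (x, x * x)


# Hand-picked hyperplane w . z + b = 0 in the lifted space.
# Assumption: a trained SVM would learn these values from the data.
W = (0.0, 1.0)
B = -5.0


def signed_distance(x):
    """Distance from the lifted point to the hyperplane; the sign gives the side."""
    z = feature_map(x)
    return (W[0] * z[0] + W[1] * z[1] + B) / math.hypot(*W)


def classify(x):
    """Return the label and a confidence that grows with distance from the plane."""
    d = signed_distance(x)
    return ("X" if d > 0 else "not-X"), abs(d)
```

With these values, `classify(0)` returns `("not-X", 5.0)` and `classify(-3)` returns `("X", 4.0)`. A point such as 2.3 lands barely on the X side, so its confidence is low.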
Neural networks
The human nervous system consists of neurons that each deal with a small part of the overall picture. Computer scientists have adopted this approach with neural networks. Some early ones were built as dedicated hardware; today’s networks are usually simulated in software, as many simple units whose computations may run on one processor or be spread across several. Either way, the units interact heavily with one another, producing an emergent property or gestalt. This breaks the problem down into small pieces and lends itself to parallel computation.
Neural networks work well for pattern recognition. One unit might recognize part of a boundary. Another might identify color. Each one contributes a piece of information that could confirm or reject a hypothesis. Other units combine these results and reach or fail to reach a threshold value for accepting the hypothesis. Checking results against known cases produces feedback, adjusting the way the units identify features and the weights assigned to them.
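The threshold-and-feedback idea can be sketched with a single unit, the classic perceptron. It sums its weighted inputs, accepts the hypothesis if the total reaches a threshold, and nudges its weights whenever a known case shows it was wrong. The training data (a unit that should fire only when both of its inputs do) and the function names are assumptions made for illustration; real networks stack many such units in layers.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs reaches the threshold."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0


def train(examples, epochs=20, rate=1.0):
    """Adjust the weights using feedback from known cases."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # error is 0 when the unit was right, +1 or -1 when it was wrong
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Known cases: accept the hypothesis only when both features are present.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(examples)
```

After training, the unit gives the right answer for all four known cases. Nobody wrote the weights by hand; they came from the feedback.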
Natural language processing
Turing’s test for whether a machine thinks relied on its ability to understand and communicate in a natural language such as English. Natural language processing (NLP) is what accomplishes these two things; whether that constitutes “thinking” is a more philosophical question. All the methods mentioned in this article have applications for NLP.
One of the earliest AI applications to allow unrestricted English input was Joseph Weizenbaum’s Eliza. It used a strictly rule-based system and restricted its domain by playing the role of a psychotherapist. Its responses consisted largely of turning whatever the user typed back into a question or prompt. If the user typed “I talked with my mother,” Eliza might respond “Tell me why you talked with your mother.” Simple as it was, some people took it seriously as a therapy tool. It was easy to get it to spew gibberish, though.
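A minimal sketch in the spirit of that technique, not Weizenbaum’s actual script: one pattern rule, a table of pronoun swaps, and a fallback line. The rule, the swap table, and the function names are assumptions made for illustration.

```python
import re

# Words to flip so the reply addresses the user (an illustrative subset).
PRONOUN_SWAPS = {"i": "you", "my": "your", "me": "you", "am": "are", "mine": "yours"}


def reflect(text):
    """Swap first-person words for second-person ones, word by word."""
    words = text.lower().split()
    return " ".join(PRONOUN_SWAPS.get(w, w) for w in words)


def respond(user_input):
    """Turn a statement that starts with 'I ...' into a prompt; else stall."""
    cleaned = user_input.strip().rstrip(".!?")
    match = re.match(r"i (.+)", cleaned, flags=re.IGNORECASE)
    if match:
        return "Tell me why you " + reflect(match.group(1)) + "."
    return "Please go on."
```

`respond("I talked with my mother.")` produces “Tell me why you talked with your mother.” The program has no idea what a mother is, which is why nonsense input such as “I xyzzy” gets the equally confident reply “Tell me why you xyzzy.”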
Understanding human input is the harder part of NLP, even if the input is just text entered at a keyboard. People don’t always use strict grammar and spelling, and they get annoyed if every error brings communication to a halt. Modern systems use rule-based machine learning to recognize patterns in sentences, even if they aren’t strictly correct.
Speech recognition adds the difficulty of identifying a word from all its possible pronunciations. A support vector machine can help to decide whether a sound is close enough to a given word to be accepted as that word. A neural net can test a sound against several different words and find the most likely one.
Summary
What is called “artificial intelligence” is really a grab-bag of techniques for solving problems with fuzzy edges. It’s not a direct imitation of what humans do, but it combines approaches that are characteristic of living brains with techniques that only a computer can carry out well. Is it understanding in any real sense? That’s open to argument. But it allows computers to solve problems that once seemed accessible only to human intelligence.