Machine Intelligence
Opening Remarks:
The Task
This effort represents an attempt to
create a human surrogate, including emotions, drives, self-awareness, a set of
moral/ethical principles, and a sense of humor. Although this model will require
extensive pre-programming, it is not to be programmed to do anything specific
except to learn and then to live. It is to be driven by "urges", moderated by
the interplay of conflicting influences within its developing "ego" and
"personality".
The Magnitude of the Task
These are very ambitious, even
grandiose, goals. The reasons for beginning with a human rather than an animal
model of intelligence are that
(1) it is difficult to know what traits
are essential and what traits are discretionary, and
(2) I can try to delineate the mechanisms
that allow human beings to do what we do but I can't guess comparably well at
how (simpler) animal minds operate.
Reasons for Optimism
I am heartened by the progress being made
in emulating on desktop PCs such uniquely human capabilities as speech
recognition, cursive handwriting recognition, optical character recognition,
speech synthesis, machine vision, bipedal locomotion, and natural language
processing. We can't match
human capabilities in these areas yet but we're getting closer[1].
I am also heartened by the progress which has been made in defining neural
functions and in conceptually devising ways of performing them on computers. Of
course, the proof of the pudding will be in the eating.
This paper contains a few ideas about
what the mind appears to do, coupled with ideas about how the same or similar
functions might be emulated in silicon.
A preliminary analysis of
mental processes over the last few
months has led me to the idea that the mind appears to be exceedingly complex.
It appears to me that there is a rich abundance of mechanisms that make us what
we are, as opposed to one or a few underlying principles.
Approach
Emulating the human mind is a "grand
challenge" and a fascinating topic. It may be that we will not achieve the
thousandth part of what we are attempting. It may be decades or centuries before
we are successful at artificial intelligence. However, the process might yield a
lot of useful information, including, if nothing else, a measure of just how
difficult this problem is. Even if we are only able to achieve insectile
intelligence in a computer, there should be many commercial
applications.
Some of the best minds on six continents
are working on this problem or upon aspects of it, and I have begun to draw upon
these efforts. It is my hope that it will be possible to identify an interest
group to pursue these topics and to integrate what is already available or
potentially available[2].
It should be a highly interesting project. (If you would be interested in one or
more aspects of this project, or in being kept up-to-date on its progress, I
would welcome your inputs or interest.)
A Brief Review of Artificial Intelligence Research
The Forties:
The seeds of artificial intelligence
research were sown during World War II when such electronic computers as the
ENIAC, the EDVAC, the Aiken Mark I, the Differential Analyzer, and a number of
analog shipboard fire control computers made their debut. It's not hard to see
how these new "electronic brains", which could perform arithmetic and logical
calculations orders of magnitude faster than human "computresses", would fire
the imaginations of engineers and mathematicians. Could we develop thinking
machines that could outperform
humans in the mental arena, as labor saving machinery had already done in the
physical domain? As Dr. Hans Moravec put it,
"One line of research, called
Cybernetics, used analog circuitry to produce machines that could recognize
simple patterns, and turtle-like robots that found their way to lighted
recharging hutches. An entirely different approach, named Artificial
Intelligence (AI) attempted to duplicate rational human thought in large
computers."[3]
The brain was regarded as a digital
computer but, perhaps, with analog/digital circuits to accommodate control
functions. In the latter 40's and early 50's, there was great fanfare regarding
these prospects, together with concerns over what "technological unemployment",
"automation", and the new science of cybernetics would do to humanity in a
robot-run world. What would happen when we humans were no longer the smartest
people we knew? An MIT mathematical prodigy, Dr. Norbert Wiener, wrote books
called "Cybernetics", and "The Human Use of Human Beings", calling for
responsible application of this revolutionary technology. The science fiction
films of the era ("The Day the Earth Stood Still") had the robot as the master
and a specially created human emissary as its loyal servant. (Remember Jack
Williamson's "The Humanoids"?) The mind was considered to be a program that ran
in the brain, and it was thought to be only a matter of a few short years before
intelligent machines were running our factories. This was the era of the
first-generation vacuum tube computers such as the UNIVAC I, and the IBM 650 and
701. It was also the era of patch boards and analog computers. Feedback systems
and servo theory were very much in vogue.
Not to be ignored throughout the whole
period from the forties to the nineties were the continuing studies of the brain
by neurology researchers. These tended to proceed largely, though not
completely, independently of artificial intelligence
research.
The Fifties and Sixties:
This rampant optimism persisted
throughout the 50's and well into the 60's. In 1959, a Cornell Aeronautical
Laboratories psychologist by the name of Dr. Frank Rosenblatt developed the
first artificial neuron-based computer, called "The Perceptron". It was a
500-neuron, single-layer neural network and was attached to a 400-photocell
optical array[4].
Another major milestone occurred that year when Simon and Newell developed a
theorem-proving program called "Logic Theorist" which was able to prove a number
of mathematical theorems. Checkers playing programs, algebraic manipulation
programs (including symbolic integration and differential equation solving),
language translation, natural language processing, and () were all under
development during this time. OCR-A and OCR-B typing balls were offered for IBM
Selectric typewriters, and optical character recognition systems were available
to read text printed in those fonts. Simple wire-following robots that any radio
amateur could build were devised and described in Scientific American. As one
writer has put it, this was a period of "initial intoxication with cognitive
science". (As we shall see in the Section below concerning the capabilities of
the brain, the computers of the 50's were ludicrously slow and small, by a
factor of at least 1,000,000 and perhaps closer to 1,000,000,000,000, for the
implementation of human-caliber intelligence.)
In the early sixties, the U.S. Postal
Service mounted a major effort to develop optical character recognition hardware
and software. (The program was oversold at the time but by now, it has led to
advanced optical character recognition equipment that is in daily use by the
Postal Service.) Also, in the early sixties, Simon and Newell created the
General Problem Solver (GPS) as a generalized theorem proving system. Throughout
the sixties, there was a ferment of activities in all areas of artificial
intelligence. (Analog-to-digital converters probably weren't fast enough in the
sixties to do much with machine vision.)
However, by the end of the decade, the
Postal Service had discovered how difficult it was to build a machine that could
read addresses on letters. IBM had thrown in the towel on their Russian language
translation program when it became apparent that a computer couldn't translate
language without understanding it. And computers were too slow by many orders of
magnitude for machine vision, virtual reality, and speech and handwriting
recognition. While they could perform arithmetic and logical manipulations with
great proficiency, they were light-years away from posing their own problems or
understanding the real world, let alone handling the subtle nuances of
interpersonal relationships.
In 1969, Drs. Marvin Minsky and Seymour
Papert of MIT published a book entitled "Perceptrons" in which they proved that
single layer perceptron networks were, among other limitations, inherently
incapable of performing the exclusive OR function, and were a dead end. One
would think that their arguments would have been insupportable. After all, the human brain is a neural network of
incredible complexity, containing tens of billions of neurons and hundreds of
trillions of synapses. But for some reason, they were sufficient to derail
neural network research for 15 years. (The authors would later explain that
neural networks were competitors for research money.) Such is the power of
scientific snobbery.
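The exclusive OR limitation is easy to see for oneself. The following is a minimal sketch, not Minsky and Papert's proof: it assumes a single threshold unit with two inputs and simply tries every combination of weights on a coarse grid (the grid and the function names are my own illustrative choices). A grid search cannot prove impossibility, but it shows the pattern: AND is found at once, XOR never.

```python
from itertools import product


def threshold_unit(w1, w2, bias, x1, x2):
    """Single-layer perceptron: output 1 when the weighted sum exceeds zero."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


def find_weights(truth_table, grid):
    """Return the first (w1, w2, bias) on the grid that reproduces the table, else None."""
    for w1, w2, bias in product(grid, repeat=3):
        if all(threshold_unit(w1, w2, bias, x1, x2) == y
               for (x1, x2), y in truth_table.items()):
            return (w1, w2, bias)
    return None


AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Illustrative search grid (an assumption): -2.0 to 2.0 in steps of 0.25.
GRID = [i / 4 for i in range(-8, 9)]

print("AND:", find_weights(AND, GRID))  # some weight triple is found
print("XOR:", find_weights(XOR, GRID))  # None: no single unit on the grid works
```

The reason is geometric: a single unit draws one straight line through the input plane, and no one line separates (0,1) and (1,0) from (0,0) and (1,1).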
The Seventies:
In the early 70's, researchers at
Stanford and MIT began mounting TV cameras and manipulators on wheeled robotic
carts and turning them loose in real-world environments. To quote Dr. Moravec
again,
"What a shock! While the pure reasoning
programs did their jobs about as well and about as fast as a college freshman,
the best robot control programs took hours to find and pick up a few blocks on a
table, and often failed completely, a performance much worse than a six-month
old child. This disparity, between programs that reason and programs that
perceive and act holds to this day. At Carnegie Mellon University there are two
desk-sized computers that can play chess at grandmaster level, within the top
100 players in the world, when given their moves on a keyboard. But present-day
robotics could produce only a complex and unreliable machine for finding and
moving normal chess pieces.
"In hindsight, it seems that, in an
absolute sense, reasoning is much easier than perceiving and acting—a position
not hard to rationalize in evolutionary terms. The
survival of human beings and their ancestors has depended for hundreds of
millions of years on seeing and moving in the physical world, and in that
competition, large parts of their brains have become efficiently organized for
the task, but we didn't appreciate this monumental skill because it is shared by
every human being, and most animals—i. e., it is commonplace. On the other hand,
rational thinking, as in chess, is a newly acquired skill, perhaps less than one
hundred thousand years old. The parts of our brains devoted to it are not well
organized, and, in an absolute sense, we're not very good at it. But until
recently, we had no competition to show us up."1
Image enhancement was a popular topic in
the 70's in support of DoD and NASA satellite image analysis and JPL's successes
with Voyager photographs. Intel introduced the first microprocessor chip: the
4004.
The Eighties:
Another False Start for
AI
In the mid-80's, artificial intelligence
enjoyed another false dawn. This time, it was rule-based expert systems, tree
searches, and Symbolics Computers. Expert systems proved hostage to the
intuition that so often guides human beings and that depends upon an overall
understanding of the world. Also, it took too long to enter all the rules into a
computer program. Expert systems still exist but they don't replace experts.
Symbolics Computer Systems soon declared bankruptcy.
Slow Patient Progress Behind the
Scenes
In the meantime, slow, patient progress
was underway. Machine vision systems began to be used for assembly line
inspections. Unimation's "Puma" robotic arms were installed to carry out
repetitive assembly line functions. Cheap embedded microprocessor chips were
becoming faster and faster. The rapidly rising capabilities of personal
computers permitted rapid programming of sophisticated software. Caere's
Omnipage Professional became an increasingly robust optical character
recognition program. Video games became ever more realistic. Though initially
very expensive, trail-blazing speech recognition systems were developed by Bell
Labs, and by many universities and small companies.
The Resurrection of Neural Networks and
Fuzzy Logic
During the 80's, a few "keepers of the
flame" had devised multi-layer neural networks that circumvented the limitations
described by Minsky and Papert. Fuzzy logic and genetic programming were added
to neural networks, which were embraced with great enthusiasm by the Japanese.
Various kinds of multi-layer neural networks with back-propagation and
sometimes, fuzzy logic, are proving to possess fascinating and highly useful
capabilities in the areas of pattern recognition and control. The latest release
(6.0) of the Omnipage optical character recognition package incorporates a
neural network to help recognize printed text. There is a great ferment of
activity in this now-highly-fashionable area of research.
Neural networks and fuzzy logic are
hot!!
Genetic programming seems to be receiving
less attention.
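How a second layer gets around the Minsky and Papert limitation can be shown with a tiny hand-wired network. This is only a sketch: the weights below are picked by hand rather than learned by back-propagation, and the function names are hypothetical. Two hidden units compute OR and AND, and the output unit fires for "OR but not AND", which is exclusive OR.

```python
def step(x):
    """Hard threshold activation."""
    return 1 if x > 0 else 0


def xor_two_layer(x1, x2):
    """Two-layer network with hand-picked (not trained) weights."""
    h_or = step(x1 + x2 - 0.5)        # hidden unit 1: fires for x1 OR x2
    h_and = step(x1 + x2 - 1.5)       # hidden unit 2: fires for x1 AND x2
    return step(h_or - h_and - 0.5)   # output unit: OR but not AND


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_two_layer(x1, x2))
```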
The Nineties:
Speech Recognition in
1990:
Computer control by voice command became
available in the early 90's: Dragon Systems for IBM-compatibles and Voice
Navigator for the Macintosh. These early speech recognition systems were
speaker-dependent and had vocabularies of a few hundred words, spoken one at a
time. (In 1991, AT&T had a laboratory system capable of recognizing
continuous speech, but it required 16 parallel, 32-bit digital signal
processors.)
Speech Recognition in
1995:
In 1995, IBM began offering a voice
dictation package with their latest PCs. Dragon Systems recently introduced a
120,000-word, discrete-word speech recognition system called DragonDictate,
while Apple Computer Company is bundling a speech recognition program called
Voiceprint with its high-end 8500 and 9500 computers. IBM's "VoiceType" is the
most accurate of the speech dictation systems; it is context sensitive and can
distinguish among homonyms. A small company called Speech
Systems, Inc., began offering the first continuous speech, speaker independent,
voice-dictation system for personal computer owners in 1995. These systems
aren't yet the kind of Smith Corona "Voicewriter" that you'll be able to buy
from Service Merchandise sometime within the next ten or fifteen years, but
they'll get there.
Other 1995 Capabilities:
Optical character recognition (OCR) has
improved steadily with the Omnipage Professional series of OCR packages, coupled
with 600 dot-per-inch and higher resolution scanners. Handwriting recognition
has improved rapidly since Apple Computer introduced the first Newton Personal
Digital Assistant in 1992. Voice synthesis systems are also improving steadily
at AT&T. Facial recognition systems are under development, together with
usable fingerprint identification packages. Machine vision and industrial
robotics systems should be entering their heyday with cheap multi-gigops
processors such as the Texas Instruments TMS320C80 entering the marketplace.
These are uniquely human capabilities
that are not even shared with the rest of the animal kingdom. Interestingly
enough, they are being realized using conventional computers running
conventional software. And these programs will only improve. The introduction of
MMX processing on 80x86 processors in early 1997, coupled with ever-increasing
clock rates, wider data paths, and Intel's 1998 P7 processor, should afford a
5-to-40-fold jump in implementing these higher human functions.
However, it is important to distinguish
between systems that perform functions upon command and self-organizing systems
that give commands. What is missing from this picture are the self-aware,
self-organizing, motivated characteristics of the animal kingdom, so perhaps it
is in this arena that we might gainfully concentrate our
efforts.
Two of the most striking areas of
computer progress thus far in the 1990s are the Internet and the advances that
are being made in computer graphics.
Historical
Summary
It is clear that early AI researchers
hugely underestimated the computational requirements of artificial intelligence.
AI research has been hampered by
"Big-Endian" and "Little-Endian" arguments about whether to concentrate on
connectionist (neural net) or purely cognitive (e.g., theorem proving)
approaches to achieving artificial intelligence. In reality, the two approaches
will probably turn out to be complementary. It is never a good idea in research
to put all one's eggs in one basket[5].
Sir Charles Sherrington's "Great Gray Ravelled Knot":
Incredible Complexity and Storage
Capacity:
It is hard not to wax eloquent when
describing the construction and the capabilities of the human brain. From a
purely computational point of view, your brain may be one to two orders of
magnitude faster and more complex than the upcoming 1 teraflops Cray T3E or
Intel Touchstone supercomputers, or perhaps 100,000 to 1,000,000 times more
elaborate than the new 200 MHz Intel P6 personal computer. Your brain contains
about 50 billion to 100 billion neurons (nobody knows how many for sure), each of
which interfaces with 1,000 to 100,000 other neurons through 100 trillion
(10^14) to 10,000 trillion (10^16) synaptic junctions[6].
Each synapse possesses a variable firing threshold that is reduced if the neuron
is repeatedly activated. If we assume that the firing threshold at each synapse
can assume 256 distinguishable levels (one byte per synapse), and if we suppose
that there are 20,000 shared synapses per neuron (effectively 10,000 per neuron),
then the total information
storage capacity of the synapses in the
cortex would be of the order of 500 to 1,000 terabytes (up to 10^15 bytes). (Of course, if the brain's
storage of information takes place at a molecular level, then I would be afraid
to hazard a guess regarding how many bytes can be stored in the brain. One
estimate has placed it at about 3.6 X 10^19 bytes.)
Not bad for a 3-pound gob of pink
goo!
Because of the neural-net organization of
the brain and the high degree of redundancy that appears to characterize
neural-net based memories, the effective storage capacity of the brain may be
much less than 500 terabytes of computer memory. My considerations of our memory
capacity suggest that its computer-equivalent storage size may lie closer to 500
gigabytes than to 500 terabytes. (The brain's storage capacity may primarily be
used for other purposes than the retention of facts.)
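The synaptic storage estimate above is simple enough to check by machine. The sketch below assumes, as the text does, one stored value per synapse with 256 distinguishable levels and 10,000 effective synapses per neuron; the function name and the decimal definition of a terabyte (10^12 bytes) are my own choices.

```python
import math

TERABYTE = 1e12  # decimal terabyte, an assumption of this sketch


def synaptic_storage_bytes(neurons, synapses_per_neuron, levels_per_synapse=256):
    """Bytes of storage if every synapse holds one of `levels_per_synapse` values."""
    bits_per_synapse = math.log2(levels_per_synapse)  # 256 levels -> 8 bits
    return neurons * synapses_per_neuron * bits_per_synapse / 8


for neurons in (50e9, 100e9):
    terabytes = synaptic_storage_bytes(neurons, 10_000) / TERABYTE
    print(f"{neurons:.0e} neurons -> about {terabytes:,.0f} terabytes")
```

Changing the number of levels per synapse or the synapse count shows how sensitive the answer is to assumptions that nobody can yet pin down.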
Accuracy and Redundancy:
A considerable degree of redundancy in
cranial memory storage may be needed to accommodate for the quantum
unreliability of the brain's nanocircuitry. (The synaptic junctions are
characterized by separations that are less than 100 Å.)
Complexity of Cerebral
Functions
The English philosopher John Locke
thought that a newborn was a "tabula rasa"—a blank slate—upon which the world
wrote whatever it wrote. As recently as fifty years ago, it was thought that the
cerebral cortex was a structure-less, pink pudding of identical neurons that
somehow simply and magically produced human thought. The underlying cerebral
hemispheres were known to have certain specialized functions—the left temporal
lobe mediated speech while the occipital lobe specialized in vision—but memories
seemed to be distributed throughout the brain. There was speculation concerning
why 90% of all brain tissue was never used, together with the idea that some
day, we might be able to learn to harness it. Today, we understand that the
brain is highly structured and highly specialized. A great many functions are
"wired in", compared to a digital computer which is truly a blank slate. It is
these "wired in" functions that make us get out of bed in the morning rather
than spend the day estivating. The 90% of brain tissue that was thought to be
unused probably is used. Unlike many man-made machines which either work or
don't work, certain brain functions degrade gracefully rather than abruptly as
brain tissue is destroyed.
Experience with biological systems in
general shows that they are exceedingly complicated, with multiple backup
systems. My consideration of the functions of the mind suggests that it is also
extremely complicated. The brain apparently contains a multitude of very complex
and highly-specialized areas which we probably haven't yet fully mapped out or understood.
Speed:
Neurons require about a millisecond to
discharge, followed by a 4 millisecond refractory period. At the resulting maximum
of 200 firings per second, 10^16 synapses could amount to as
many as 2 X 10^18 connection updates per second. In
practice, the firing rate and the synaptic count probably aren't that high. There
are 40 Hertz firing waves that
sweep the entire brain from back to front. The 6,000,000,000 neurons in the visual cortex also fire about 40 times a
second to give us our 20-frame-per-second visual update rate, so we might be
looking at perhaps 240 billion firings per second in the visual cortex. Each
neuron connects to a number of other neurons through dendrites and an axon (an
average of 15,000 interconnections per neuron in the visual cortex), so we might
be dealing with about 50 trillion (5 X 10^13) synaptic junctions in the visual
cortex. At 40 firings a second, the visual cortex should be able to perform
about 2 quadrillion synaptic activations per second: 2 X 10^15 connection updates per second, or
2,000,000 Gcups (giga-connection-updates-per-second), compared to 10 Gcups for
current neural networks. For the brain as a whole, assuming 10,000
interconnections per neuron, the number might be about 10 times this amount, or
20,000,000 Gcups. (A recent 3/17/96 article in Parade magazine places the total
synaptic count at 1.5 quadrillion and the connection update rate at 10,000,000
Gcups.)
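The connection-update figures in this section follow from three numbers: neuron count, interconnections per neuron, and firing rate. Here is a rough sketch of the arithmetic. The `shared` flag is my own assumption, counting each synapse once because it joins two neurons; that is what brings 6 billion neurons at 15,000 interconnections each down to roughly 50 trillion junctions.

```python
GIGA = 1e9


def gcups(neurons, connections_per_neuron, firings_per_second, shared=True):
    """Connection updates per second, in units of 10^9 (Gcups).

    shared=True counts each synapse once, on the assumption that it is
    shared by the two neurons it joins.
    """
    synapses = neurons * connections_per_neuron
    if shared:
        synapses /= 2
    return synapses * firings_per_second / GIGA


visual_cortex = gcups(6e9, 15_000, 40)    # figures from the text
whole_brain = gcups(100e9, 10_000, 40)    # upper neuron count from the text

print(f"visual cortex: about {visual_cortex:,.0f} Gcups")
print(f"whole brain:   about {whole_brain:,.0f} Gcups")
```

The visual cortex comes out at 1,800,000 Gcups, which the text rounds to 2,000,000.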
It has been estimated that computational
speeds of 10^9 calculations per
second (1 Gigops) would be required to match the edge and motion detection
capabilities of the first four layers of the human retina, and
10^13 operations per second (10,000 Gigops) to
10^16 operations per second (10,000,000
Gigops) would be necessary to emulate what is done in the brain overall[7].
The Brain Needs So Little
Power
Still another impressive parameter
regarding the brain is how little power it dissipates (of the order of 100
watts). By comparison, a 500-MIPS (500-Million Instructions Per Second) DEC
Alpha chip dissipates about 50 watts. Using 20,000 DEC Alpha chips, we would
need a 1,000 kW behemoth to achieve the lower threshold of 10,000 Gigops, or
about 10,000 times as much power.
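The power comparison can be worked through in a few lines. This sketch uses only the figures quoted above and assumes, loosely, that one instruction counts as one operation (so 1 Gigops equals 1,000 MIPS); the variable names are mine.

```python
BRAIN_WATTS = 100        # the text's figure for the brain
ALPHA_MIPS = 500         # one DEC Alpha chip
ALPHA_WATTS = 50
TARGET_GIGOPS = 10_000   # lower threshold for emulating the brain

# Assumption: one instruction = one operation, so 1 Gigops = 1,000 MIPS.
chips = TARGET_GIGOPS * 1000 / ALPHA_MIPS
total_watts = chips * ALPHA_WATTS
ratio = total_watts / BRAIN_WATTS

print(f"chips needed: {chips:,.0f}")
print(f"total power:  {total_watts / 1000:,.0f} kW")
print(f"ratio to the brain: {ratio:,.0f} to 1")
```

With a brain figure of 100 watts the ratio works out to 10,000 to 1; a lower estimate of the brain's power draw would make the gap proportionally larger.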
Slow Neural Transmission Speed Forces
Massive Parallelism
Because of the electrochemical nature of
the neuronal discharge, nervous impulses travel at no more than 25 meters per
second (1 inch per millisecond) in the gray matter of the central nervous
system, and at about 100 meters per second down the long, myelin-sheathed
peripheral nerves. Furthermore, neurons exhibit a 1 millisecond switching time,
followed by a 4 millisecond refractory period during which the neuron discharges
only in the presence of a strong stimulus. This means that neurons could not
fire at rates greater than 200 times a second (a 5 millisecond cycle time). In
practice, their firing rates are probably significantly less than 200
Hz.
One of the implications of these numbers
is that it would require 5 milliseconds for a signal to go from the eye to the
occipital lobe where visual interpretation takes place. It would then require
a minimum of another 10 or 15 milliseconds for a motor signal to go from the
brain to an arm or leg muscle. At least 1 millisecond would be added to the
signal processing time for each neuron in the processing chain between the
sensory input and the motor output. Our minimum response times to stimuli range
from, perhaps, 25 milliseconds for an eye-blink to 150 milliseconds to step on a
brake pedal. This means that there can't be very many neurons in the chains
between the inputs and the outputs. This, in turn, forces biological nervous
systems to operate almost entirely in
parallel. Everything must happen at once. Visual data must be analyzed,
edges detected, features extracted, objects identified, and appropriate suites
of motor commands issued all in an instant—that is, within a few "clock cycles".
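The argument above amounts to a time budget. As a rough sketch, subtract the travel times from the response time and divide what is left by the 1 millisecond each neuron adds. The function name is hypothetical, and applying the 10 and 15 millisecond motor delays to these two particular responses is my own simplification.

```python
def max_chain_length(response_ms, sensory_travel_ms, motor_travel_ms,
                     per_neuron_ms=1.0):
    """Upper bound on neurons in series between sensory input and motor output."""
    budget_ms = response_ms - sensory_travel_ms - motor_travel_ms
    return max(0, int(budget_ms // per_neuron_ms))


# Eye-blink: 25 ms response, 5 ms eye-to-brain, 10 ms brain-to-muscle (assumed).
print("eye-blink:", max_chain_length(25, 5, 10))
# Braking: 150 ms response, 5 ms eye-to-brain, 15 ms brain-to-leg (assumed).
print("braking:  ", max_chain_length(150, 5, 15))
```

These are upper bounds only. Even the generous braking case leaves room for little more than a hundred sequential steps, against billions of neurons available to work side by side.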
Generally, when a system becomes this
massively parallel, it becomes computationally inefficient. A lot of resources
have to be dedicated to data transfer. Also, many useful functions, such as
numerical integration, are inherently serial and don't lend themselves to
parallel processing.
In addition, neurons are quite noisy as
well as quite unreliable, perhaps because they are so miniaturized that quantum
mechanical fluctuations permit spurious firings and
misfiring.
By contrast, electricity travels down
properly terminated copper wires at about 85% of the speed of light, or about
250,000,000 meters per second (compared to 25 meters per second for unsheathed
nerve fibers). The fastest current computers run at a 300 MHz clock rate, with
still higher clock speeds on the way. This compares with firing rates of
under 200 Hz for neurons[8]. Although
comparisons are dangerous, it may be that a current silicon-based computer can
be considered to be potentially 1,000,000 to 10,000,000 times faster than a
single biological counterpart.
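The two ratios behind that range can be computed directly from the figures quoted in this section. This is a back-of-the-envelope sketch with my own variable names; it compares raw switching and signal speeds only and says nothing about how much useful work each cycle does.

```python
COMPUTER_CLOCK_HZ = 300e6       # fastest current computers
NEURON_MAX_HZ = 200             # 5 millisecond cycle time
WIRE_SPEED_M_PER_S = 250e6      # about 85% of the speed of light
NERVE_SPEED_M_PER_S = 25        # unsheathed nerve fibers

switching_ratio = COMPUTER_CLOCK_HZ / NEURON_MAX_HZ
signal_ratio = WIRE_SPEED_M_PER_S / NERVE_SPEED_M_PER_S

print(f"switching speed: {switching_ratio:,.0f} to 1")
print(f"signal speed:    {signal_ratio:,.0f} to 1")
```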
Neural Networks
Relationship to Man-Made Neural
Nets
As they are presently implemented, neural
networks are only distant cousins of biological nervous systems. Biological
neural nets are complex in various ways, exhibiting time dependent as well as
spatially dependent behavior, as well as thousands of interconnections.
Artificial neurons and synapses are simple approximations to biological neurons,
and the jury is still out regarding how faithfully man-made neural nets
replicate the behavior of even the simplest biological nervous systems. This
question will be examined further in the section below on artificial neurons.
The Role of Artificial Neural
Nets
Man-made neural networks shine as
preprocessors interfacing with an analog world. Digital computers, on the other
hand, are in their element when manipulating discrete symbols. In between is
the analog-to-digital classification stage.
Progress is being made rapidly in this
technological area, and it is finally receiving the strong support and close
attention that it richly deserves.
Advantages and Possibilities Offered by
Neural Nets
• Neural networks are trainable and self-organizing.
• Like humans, neural networks are ill-suited to logical or arithmetic
operations and are very effective at pattern recognition.
• Neural nets have the ability to generalize from experience and to measure
"goodness of fit" in a solution space.
• Neural nets can team up with fuzzy logic to provide the best of both
worlds.
• Neural network nodes are inherently analog devices, as are transistors.
By utilizing transistors in an analog mode, it is claimed (by Dr. Carver
Mead, Caltech professor and president of Synaptics) that 100:1 improvements in
functional densities and 10,000:1 gains in power consumption can be realized
over all-digital approaches. (Of course, one or a few transistors are still
required to emulate a neural network node or synapse.)
• Neural networks can perform some pattern recognition functions much
faster and more cost-effectively than digital computers.
Modelling the Brain with Neural Nets and
Artificial Neurons:
The situation today with neural networks
is similar to that with conventional computers in that we can use neural
networks to perform specialized recognition or control functions but we don't
yet know how to organize them into a highly complex and specialized "network of
networks" like the brain (any more than we know how to emulate the brain on a
von Neumann type of computer.)
Size Limitations:
One of our problems is that of size: we
are a long way from fabricating 500 trillion or even 500 billion synapse nets.
In addition, the brain uses 500,000,000,000,000 nerve fibers to interconnect the
synapses, and grows new nerve fibers where desirable. Talk about a plumber's
nightmare! However, we may be in a position to attempt to model very simple
nervous systems, and certainly, neural networks are being used for practical
applications. In emulating biological systems, one would probably take advantage
of silicon switching speeds by using a bus structure to multiplex data to and
from the artificial synapses and neurons. However, even if it were possible to
multiplex data 10,000,000 times faster than the brain, 50,000,000 high-speed
busses would still be needed to perform the task. (The higher reliability of
silicon might permit a further factor-of-ten reduction, permitting us to get by
with only 5,000,000 busses.)
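The bus count follows from dividing the number of nerve fibers by the multiplexing advantage. A minimal sketch, with a hypothetical function name and the figures taken from the paragraph above:

```python
def buses_needed(nerve_fibers, multiplex_speedup, reliability_gain=1):
    """High-speed buses needed to stand in for `nerve_fibers` slow connections."""
    return nerve_fibers / (multiplex_speedup * reliability_gain)


FIBERS = 500e12      # nerve fibers in the brain, per the text
SPEEDUP = 10e6       # silicon multiplexing advantage, per the text

print(f"multiplexing alone:      {buses_needed(FIBERS, SPEEDUP):,.0f} buses")
print(f"with 10x reliability:    {buses_needed(FIBERS, SPEEDUP, 10):,.0f} buses")
```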
Another current limitation is that
artificial neural nets can't be scaled up to several thousand synapses per
neuron; they tend to become "tangled up".
Bandwidth Limitations:
Data must be sloshing back and forth in
the brain at a rate of up to 20,000,000,000,000,000 (20 quadrillion) pulses a
second (Hz). Data transfer rates of 2,000,000,000 (2 billion) bytes per second
might lie a little beyond the current state of the art, leaving us slower by a
factor of 10,000,000:1. (Computer bus speeds of 200,000,000 Hz are planned for
the year 2000. With 128-bit wide buses, these should give
data transfer rates of 3,200,000,000 bytes per second,
or roughly 25 billion bits per second, which is about 1/1,000,000th of the
brain's data rates.)
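The bandwidth gap can be checked with the same sort of arithmetic. The sketch below treats one neural pulse as roughly comparable to one byte in the first comparison and to one bit in the second, which is a loose assumption of mine rather than anything established; the function name is hypothetical.

```python
BRAIN_PULSES_PER_SECOND = 20e15    # the text's figure for the whole brain


def bus_bytes_per_second(clock_hz, width_bits):
    """Peak transfer rate of a bus that moves `width_bits` bits on every clock."""
    return clock_hz * width_bits / 8


current = 2e9                                   # bytes per second, near today's limit
planned = bus_bytes_per_second(200e6, 128)      # year-2000 bus from the text

print(f"planned bus: {planned:,.0f} bytes per second")
print(f"brain vs. current bus (bytes): {BRAIN_PULSES_PER_SECOND / current:,.0f} to 1")
print(f"brain vs. planned bus (bits):  {BRAIN_PULSES_PER_SECOND / (planned * 8):,.0f} to 1")
```

The last ratio comes out a little under a million to one, in line with the rounded figure in the text.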
Fanout Limitations:
Since nerve fibers are self-energizing,
signals can fan out everywhere without requiring amplification. Silicon-based
systems would require intermediate amplification to achieve the kind of parallel
distribution exhibited by the brain.
Which Type of Neural Network Should We
Use?
By now, there are many variations on the
basic neural network pattern (viz., Hopfield, Hamming, back-propagation, Kohonen
self-organizing feature maps, and frequency-sensitive competitive learning).
Also, all of them involve various simplifications and compromises compared to
biological neurons. For example, the transmission rates of neural fibers
increase as their diameter increases. Transmission at the synapses depends upon
the concentrations of a variety of neurotransmitters, and we probably don't yet
understand why there are different kinds of neurotransmitters and how all of
them play together. There are different kinds of neurons, and here again, we may
not yet have identified all of the different types. Biological nervous systems
are highly complex even at the cellular level. How closely do we need to model
actual neurons to catch the essential features of biological nervous systems? As
we will see below, Messrs. Deyong and Findley believe that the time-based,
analog character of biological neurons is essential to the proper modelling of a
nervous system, and that conventional "connection machines", with their
limitation of time-independent synaptic weights, may throw out the baby with
the bath water.
Other Limitations of Current
Neural Networks in Replicating the Brain
Although they are based upon nothing but
relationships among the neurons, neural networks cannot presently model
relationships. Neural networks don't lend themselves well to performing
arithmetic and logical functions. A partnership of neural networks, perhaps
incorporating time-based analog neural nets, with conventional computers may be
what we'll see develop over time.
Biological nervous systems are not
arbitrarily organized but are largely pre-wired.
"It is not clear whether a typical neuron
can excite one neuron while inhibiting another", the way they can in artificial
networks.
"Finally, it is unlikely whether the
'teacher' used in backward error propagation is a plausible model for learning.
Indeed, the deeper into the system processing occurs, the more layers of
subsystems are interposed between perceptual simulation and motor output—and
hence the more difficult it would be to use the input and output of the overall
system to adjust processing."
Artificial Neurons - Hybrid Temporal
Processing Elements (HTPE's)
A small New Mexico State University
spin-off called Intelligent Reasoning Systems, Inc., in Austin, Texas, has
developed what the company claims is the closest analog to an artificial neuron
yet devised. It uses a hybrid analog/discrete representation which is said to
more readily tackle such time-dependent tasks as speech recognition or motion
detection without the complications of digital encoding. (The sampling of speech
at regular intervals fails to
encode the natural rhythms of speech, which continually varies its cadence.) The
use of hybrid temporal processing elements (HTPE's) that are driven
asynchronously by their inputs and that encode information temporally has shown
that speech segmentation occurs naturally at a certain stage in the chain of
signal processing. The developers, Mark Deyong and Randall Findley, state that
silicon-based circuitry is much more reliable than biological neurons, which
are error-prone and have a high failure rate. In natural circuitry, there
is no guarantee that a given neuron will fire, and nature makes up for that with
a high degree of redundancy (unnecessary for silicon-based circuits). In addition,
the 1,000,000-to-1 speed advantage which silicon circuits enjoy over biological
circuits can be used to further reduce the neuron count. The HTPE approach,
because it more faithfully simulates the behavior of actual neurons, is said to
be well suited to handling time-varying as well as spatially varying patterns,
in contrast to conventional neural networks, which utilize static (though
alterable) weighting functions. The behavior of, and programming for,
biological-neuron analogs (HTPE's) are fundamentally different, leading to
networks which are very sparse compared to those of conventional neural
networks. The HTPE approach requires 7 FET's (Field Effect Transistors) per
neuron and 5 FET's per synapse.
Problems with Using HTPEs to Emulate the
Human Brain
The problem that I foresee in trying to
build an analog of the human brain using HTPE's or any other collection of
neural networks in the near future lies in the number of synaptic weighting
factors that would have to be represented. Even if HTPE's can remove the
redundancies of natural neural networks and the 1,000,000-to-1 speed advantage
of silicon circuitry could lower the neuron count to 10,000 or 100,000, the
synaptic transistor count (of the order of 100 trillion?) would seem to be
far beyond present-day technology. Of course, in the future, given chip counts
with trillions of transistor-like processing elements, such networks might be
feasible. It is even possible that supercomputer-class systems containing 100
trillion transistors might be constructed in the year 2000-2005 time frame. (The
planned Cray T3E would house up to 4 trillion transistors.) Such an investment
might be well worth its cost as a proof-of-principle
demonstrator.
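The transistor budget above can be sketched as simple arithmetic. The 7-FET and 5-FET figures are the ones quoted for HTPE's; the synapse count below is purely an assumption, chosen so that the synaptic total lands near the 100 trillion guessed at above, and the function name is hypothetical.

```python
FETS_PER_NEURON = 7    # figure quoted above for the HTPE approach
FETS_PER_SYNAPSE = 5   # figure quoted above for the HTPE approach

def htpe_transistors(neurons, synapses):
    """Total FET count for a network of HTPE neurons and synapses."""
    return FETS_PER_NEURON * neurons + FETS_PER_SYNAPSE * synapses

neurons = 100_000          # upper neuron count mentioned above
synapses = 20 * 10**12     # ASSUMPTION: illustrative only, not a measured figure
total = htpe_transistors(neurons, synapses)
print(total, "transistors in all;", FETS_PER_NEURON * neurons, "of them in neurons")
```

Under these assumptions the neurons themselves are a negligible part of the budget; nearly every transistor goes into synapses, which is why the synaptic count is the obstacle.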
Several other organizations are also
attempting to create time-based, analog neuronal devices.
Conventional
Computer Capabilities:
For all practical purposes, digital
electronic computers are 100% reliable. (The unreliability of biological
computers may be an unavoidable consequence of their molecular level
miniaturization. Their circuit elements are so small that quantum-mechanical
fluctuations may cause unpredictability as an outgrowth of the Heisenberg
Uncertainty Principle. Silicon-based systems may be subject to unreliabilities
and to a need for redundancy when circuit design rules decline below about 1,000
Å.) Silicon-based computer technology has focused upon fast uniprocessors rather
than upon multiple processor systems in part because fast silicon-based
uniprocessors have been technically feasible. Nature has no choice, but we do.
Thus, silicon-based computers lie at the other end of the architectural spectrum
from biological computers. Neural networks for biological computers may be an
inevitable implication of the almost total parallelism dictated by the slow
speed of electrochemical signal propagation.
For these reasons, it may be possible to
implement AI functions in von Neumann type computers using far fewer circuit
elements than are required by the brain, by processing serially what the brain
would have to process in parallel. Only now are microprocessors appearing that
offer the promise of reaching speeds and storage capacities at the lower end of
this envelope. Intel is funded to develop a 1.8 terops (1.8 × 10^12
operations per second) supercomputer
using 9,000 P6 chips, and has announced plans to develop a 10 terops
(10^13 operations per second) parallel
processor by the end of this decade. Cray Research is developing the
T3E parallel processor which, using 2,048 DEC processors, should yield up to 1.2
terops, with up to 4 terabytes of RAM. The new $550 Texas Instruments TMS80C
digital signal processor (DSP) can deliver up to 2 gigops. The Sega Genesis
Saturn game set utilizes video chips that operate at almost 1 gigops[9].
With 9 gigabyte disk drives available for a retail price of about $2,000 each, a
terabyte of on-line disk storage would be achievable today for less than
$200,000. With disk drive capacities 20-folding every 10 years[10],
a $2,000 1-terabyte disk drive ought to be available on desktop PCs (or in
robots) by the year 2010, or perhaps a little earlier, together with 1 terops of
16-bit integer processing speed and 32 gigabytes of RAM. Alternatively, $200,000
should buy 100 terabytes of disk storage 15 years hence (in 2010). Another
$200,000 might purchase 100 terops of 16-bit integer processing power and for an
additional $200,000, 3.2 terabytes of RAM. In a way, it may be appropriate to
compare the number of neurons in the brain with the number of calculations per
second which can be performed by a computer, since at least one neuron in the
brain must be dedicated for every operation that is to be performed in parallel.
In that case, given the 1,000,000-fold speed advantage of silicon-based
computers, coupled with their higher reliability, one could imagine that 10,000
parallel microprocessors might afford the same order of computational speed as
the brain. This is approximately the number (9,000) of P6 chips that Intel has
contracted to incorporate in their 1.8 terops Touchstone supercomputer which
they are to deliver to DARPA next year or the 10 terops parallel processor that
is projected for the year 2000. By the year 2010, a 1 teraflops computer capable
of multi-terops integer processing might be available for $10,000 to $20,000, with supercomputers
possibly approaching the petops range.
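The disk-storage extrapolation above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 20-fold-per-decade growth rate continues unchanged, takes 1995 as the starting year, and uses a hypothetical function name.

```python
def projected_capacity_gb(start_gb, years, fold_per_decade=20.0):
    """Capacity after `years`, assuming a constant fold_per_decade growth every 10 years."""
    return start_gb * fold_per_decade ** (years / 10.0)

# A 9-gigabyte drive today, projected 15 years ahead (to 2010).
capacity_2010 = projected_capacity_gb(9.0, 15)
print(round(capacity_2010), "GB per $2,000 drive")
print(round(100 * capacity_2010 / 1000), "TB for 100 such drives ($200,000)")
```

The projection comes out at roughly 800 gigabytes per drive, which is the basis for the round figures of 1 terabyte per drive and 100 terabytes per $200,000.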
We shouldn't be surprised if this were to
come to pass. Human substitutes for the physical capabilities of the animal
kingdom, while still unspeakably unsophisticated compared to biological systems,
have given us the supersonic transport, the diesel locomotive, and a great array
of ultra-fast, ultra-powerful, ultra-precise machines which far outperform their
animal counterparts. It would not be too startling if this were eventually to
occur with the brain.
Use of the
Developing Technology for Speech, Facial, and Handwriting Recognition,
Etc.
As such uniquely human capabilities as
• speech recognition,
• handwriting recognition,
• optical character recognition,
• recognition of faces,
• pattern (image) recognition,
• natural language processing, and
• voice synthesis
become part and
parcel of the workaday computer world, one can foresee a time in 10 to 15 years,
as digital signal processing speeds increase by a factor of at least 100, when
conventional computers begin to surpass human-level competence in these areas,
using conventional programming techniques and near-term computer hardware. (We
may utilize a blend of neural networks, fuzzy logic chips, and conventional
microprocessors to achieve optimum results at minimum cost: whatever works
best.) Eventually, we'll probably surpass human capabilities in these
departments, at least in some, if not in all respects.
What Can Be Done on a 1996 Personal
Computer?
It may be blatant hubris to contemplate
the implementation of a significant subset of human-caliber intelligence on a
desktop computer. However, I'm not altogether convinced that it will require the
kinds of speed and storage capacity that are attributed to the brain to approach
human levels of intellectual capability in various relevant areas. Progress is
being made too rapidly in developing the uniquely human functions described
above. Given another 100-fold (10-year) improvement in desktop computer speeds
and storage, and another 10 years of software development, technology should at
least be within shouting distance of the kinds of speeds and storage capacities
needed to approximate some higher-level human mental functions. It isn't
necessary that we exactly emulate the human brain. There may be other ways to
achieve similar goals using conventional computers. For one thing, our robot
need not be concerned about survival in the wild, and the evasion of predators.
If we can achieve even a tiny fraction of what the brain can accomplish, we
might still have produced a very useful commercial product. If it turns out that
we need a supercomputer to simulate the human brain in real-time, then it might
be possible to develop human-caliber AI hardware and software on computers that
run it at glacial speeds, followed by a brief test or tests on a supercomputer.
Ten years from now, supercomputers should be operating in the 10's-of-terops
range, with a 1 terops DSP cluster available for, perhaps,
<$10,000.
The advantages of personal computers are
not difficult to define. They are available to everyone, and are available for
hobby work in most homes.
In any case, let's see how far we can go.
In the process, we may learn a lot about functional requirements for the brain.
Proposed Approach
As has been mentioned above, the major
challenge in developing a human surrogate is probably not that it be able to
reason or even to recognize speech, but that it be self-aware and
self-organizing. In this vein, it is imperative that our computer be motivated
by emotions, feelings, and drives, and that it be guided by general principles
rather than by specific commands. It should respond to and choose among
competing thoughts and motivations, exhibiting the semblance of free will.
Otherwise, we are in the position of having to program every little detail as we
would any other computer, thereby losing flexibility and
sentience.
Its behavior should
be:
(1) no more predictable than a person's
but
(2) not any more random than is a
person's behavior (i.e., purposeful).
Emotions,
Drives, and Self-Awareness
Can we create a self-aware computer
motivated by drives and emotions? That's a very good question. This section
discusses the ideas I've had for accomplishing this. I would welcome anyone
else's suggestions concerning how these mechanisms might be
programmed.
So far, this section is restricted to
ideas which bear upon conceptual design rather than specific algorithms.
However, program design, coding, and testing should be feasible given a little
more time.
Physiological
Variables
Among possible mechanisms for emotion
that I've been able to imagine might be the ability to raise and lower computer
clock speeds, physical power (strength) levels, and response rates. The robotic
system has to hold some resources in reserve for emergency situations.
To slow the computer's clock, since
computers are not designed with variable clock frequencies, we could steal
cycles with software. We might normally run the computer's clock at 67% of
maximum by stealing every third cycle, and then raise the clock rate to
perhaps 80% when circumstances warrant.
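The cycle-stealing idea might be sketched as follows. This is a minimal illustration, not a real clock driver: the function name and the integer "credit" scheme are my own assumptions, used to spread the stolen cycles evenly.

```python
def working_ticks(total_ticks, numerator, denominator):
    """Count the ticks that do useful work when running at numerator/denominator
    of full speed. The remaining ticks are "stolen" (spent idling)."""
    credit = 0
    worked = 0
    for _ in range(total_ticks):
        credit += numerator
        if credit >= denominator:   # enough credit accumulated: do real work
            credit -= denominator
            worked += 1
        # otherwise this tick is stolen and nothing is done
    return worked

print(working_ticks(300, 2, 3))   # normal state: every third cycle stolen
print(working_ticks(300, 4, 5))   # aroused state: every fifth cycle stolen
```

Integer arithmetic is used here so that the ratio is exact; changing the two integers at run time is all that is needed to "raise the clock rate."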
To adjust physical power levels, we might
lower and raise the power levels that operate the robot's servo-motor "muscles"
by using the Novak electronic speed controls that are used for RC models. We
could also adjust the feedback voltages coming from the weight and inertial
strain gauges and digital shaft encoders in its joints to make the robot feel
relatively heavier or lighter.
Examples of emotional states in which
physical power level would probably be appropriately reduced are relaxation,
depression, hunger (for the recharging of batteries), and tiredness (It might be
profitable if the robot were to need to sleep in order to rearrange its data
base and to process information that couldn't be handled during the busy day).
Examples of elevated-power and clock states would be elation, arousal (through
fear, anger, or passion), and
excitement/enthusiasm/absorption.
Generation of
Ego
The ego (control unit) will balance
competing urges/drives, and make choices, and should accumulate different
response patterns (personality) depending upon its experiences. The ego or
controller will evaluate situations and will try to predict the effects of
alternate courses of action upon its welfare. It will also have a model of the
self. Its self-image will contain a capabilities catalog of what the robot can
and cannot do (or has been able to do and has not been able to do—i.e., a record
of its successes and failures). This would actually be implemented using the
file of abbreviated animation tracks (discussed below) of prior experiences. The
controller will constantly evaluate its performance and will feel good when it
decides that it has done well and bad when it concludes that it has done
poorly. That is, it will develop a critical parent: a subprogram that integrates
prior experience and brings it to the forefront of consciousness. Whenever it
succeeds in doing something new or challenging, it will reward itself with good
feelings. This will take the form of improving the robot's self-approval index, based upon this most recent entry
into its capabilities catalog, and creating a temporary, small-to-medium
increase in voltages and clock speeds, with a reduction in gravity (felt weight) and internal
critique. Thus, the capabilities catalog will represent a cumulative factual
record of the robot's demonstrated abilities, although details will be forgotten
and perhaps repressed over time. The self-approval index will vary fairly
rapidly, declining in a matter of hours to days as new events overlay the old.
For example, gaining an insight will improve the robot's opinion of itself. The
parent part will be an agent which draws upon the capabilities catalog and prior
experience as a built-in advisor to guide the robot's actions. The robot might
be motivated to work in ways that improve some internal box score, providing
feedback at an abstract level. In other words, improving its self-image box
score may be an important motivator for the robot.
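The relationship between the cumulative capabilities catalog and the fast-fading self-approval index might be sketched like this. The class name, the 24-hour half-life, and the size of the boost are all assumptions made for illustration.

```python
class SelfImage:
    """Sketch of a capabilities catalog plus a decaying self-approval index."""

    def __init__(self, half_life_hours=24.0):
        self.catalog = []                 # cumulative record: (task, succeeded)
        self.self_approval = 0.0          # short-lived "box score"
        self.half_life_hours = half_life_hours   # ASSUMPTION: hours-to-days decay

    def record(self, task, succeeded, boost=1.0):
        """Log an attempt and reward (or penalize) the self-approval index."""
        self.catalog.append((task, succeeded))
        self.self_approval += boost if succeeded else -boost

    def elapse(self, hours):
        """Let the self-approval index fade as new events overlay the old."""
        self.self_approval *= 0.5 ** (hours / self.half_life_hours)

robot = SelfImage()
robot.record("touch the wall", succeeded=True)
robot.elapse(24)                          # one half-life later
print(robot.self_approval, len(robot.catalog))
```

The catalog only grows, while the index decays toward zero, so the robot has to keep succeeding at new things to keep feeling good about itself.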
(The tensions between the parent parts,
the ego, the child parts, and all the drives, emotions, and evaluation processes
are a crucial part of what makes us human.) The efficacy of the advice provided
by the parent part will also be evaluated over time.
The parental part would probably first be
characterized by tallying and making available statistics regarding prior
outcomes. Anticipations of unpleasant outcomes generated by abbreviated reruns
of prior, painful experiences could come from the "parental part" program
subsection.
From a programming standpoint, how the
robot responds will depend upon choices that reflect all of the various
influences which emanate from different parts (subprograms) within its psyche.
From the standpoint of actual
programming, there will indeed be a multitude of "agents" which will act upon
the ego.
There will be competing urges (drives).
Drives
Affection
Self-Preservation
Curiosity
Understanding
Desire to master (power over the environment)
Desire to imitate (this may need to be an instinctive, built-in drive)
Desire to please
Desire for attention
Hunger for energy, when the batteries run low
Desire for repairs when needed
Desire for pleasure
Sex
Desire for accomplishment, improved self-image
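One way the ego might balance competing drives is sketched below. It is only an illustration: the weighted-maximum rule, the function names, and the numbers are assumptions, with the weights standing in for the "personality" that accumulates through experience.

```python
def choose_drive(urgencies, weights):
    """Return the drive whose weighted urgency is currently highest."""
    return max(urgencies, key=lambda d: urgencies[d] * weights.get(d, 1.0))

def reinforce(weights, drive, outcome, rate=0.1):
    """Nudge a drive's weight up (outcome > 0) or down (outcome < 0), never below zero."""
    weights[drive] = max(0.0, weights.get(drive, 1.0) + rate * outcome)

urgencies = {"curiosity": 0.6, "hunger for energy": 0.5, "desire to please": 0.2}
weights = {}   # empty: every drive starts with the default weight of 1.0
print(choose_drive(urgencies, weights))
reinforce(weights, "hunger for energy", outcome=5.0)   # hypothetical well-rewarded recharge
print(choose_drive(urgencies, weights))
```

With identical urgencies, the robot's choice changes once experience has altered the weights, which is one simple reading of "different response patterns depending upon its experiences."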
Silent inner dialogue (in English, of
course). Ability to simulate. An inner dialogue implies the ability to couch
inner activities in natural language—a non-trivial capability (natural language
processing) but one that has already received a considerable amount of
attention. The challenge for us is that we are trying to create a robot that
will understand language and the world in terms of direct experience instead of
simply as a rule-based manipulation of symbols by a facile but mindless
machine.
Rewards - The Pleasure
Index
We can "reward" our robot by raising its
pleasure index when it moves.
Pleasure:
Pleasure for our computer might take the
form of
(1) Elevating the clock speed
(2) Feeling light and energetic, as described under "Physiological Variables"
(3) If we decide to include "muscular" tension in our program, then a feeling of reduced muscular tension.
(4) An encouraging, optimistic state of mind. An elevated capabilities index.
The robot would have short-term and long-term
pleasure indices that would vary with time. The pre-verbal robot would be
working adaptively, adapting its behavior to maximize its short-term and later,
its long-term, pleasure indices.
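The two pleasure indices might be sketched as running averages that respond at different speeds. The class name and the two rates are assumptions chosen only to show the contrast.

```python
class PleasureIndices:
    """Short-term and long-term pleasure indices as two running averages."""

    def __init__(self, short_rate=0.5, long_rate=0.05):
        self.short_rate = short_rate   # ASSUMPTION: reacts quickly
        self.long_rate = long_rate     # ASSUMPTION: reacts slowly
        self.short_term = 0.0
        self.long_term = 0.0

    def update(self, reward):
        """Move each index part of the way toward the latest reward."""
        self.short_term += self.short_rate * (reward - self.short_term)
        self.long_term += self.long_rate * (reward - self.long_term)

p = PleasureIndices()
p.update(1.0)                 # one pleasant event
print(p.short_term, p.long_term)
for _ in range(10):           # followed by ten uneventful steps
    p.update(0.0)
print(p.short_term, p.long_term)
```

Right after a reward the short-term index is the higher of the two; after a quiet spell it has faded and the long-term index retains more of the memory.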
Pain:
?How do we simulate pain? There are many
ways to punish the robot, including intermittent failure of subsystems or
confusion at critical times. However, physical pain is still a mystery to me.
I'm thinking that pain must be electrical because we feel pain instantly. If it
were chemically transmitted, a few seconds would be required before the chemical
messengers reached the brain. Granted that pain messages may release
neurotransmitters in the brain, but how do they make a neural network
hurt?
?For that matter, how do we make the robot
care what happens to it?
Emotions:
Justification:
You might ask why I think that it would
be desirable to program emotions into robots, assuming that it's even do-able.
My rationale is that if I'm trying to create something with human
characteristics, I want to hew as closely to human thinking and feeling as
possible, even if this strategy is to be abandoned when things move beyond the
conceptual stage. Somehow, the robot must operate without explicit cookbook
programming. Emotions exert general
influences and generate motivations without dictating any specific
actions.
However, it also seems necessary to
devise motivators in order to get the robot to do anything besides sit and wait
to be told what to do.
Also, I would want our robots to feel
love, gratitude, personal loyalty, and other positive feelings rather than
simply being heartless machines—saints rather than unfeeling
monsters.
Emotions as States:
It seems to me that emotions are
states that modify our responses and thoughts. Being in a given state
will increase the probability of responding in ways appropriate to that state
and thinking thoughts apposite to that state. (The robot's subconscious may
bring up warnings about potential hazards, particularly after its "brain" has
been reorganizing information while it sleeps.)
All of the
emotions will exist simultaneously and will be elevated when circumstances
warrant. They will be inducements to action.
Psychological Rewards:
Our robot's self-appraisal index would be
temporarily raised, together with its capabilities index. Its short-term
pleasure index would rise. Our robot would feel comfort (a temporary lowering of
the activity itch) and mild euphoria when held, and to a lesser extent, when its
parent were in the room. Thoughts and feelings would also be affected by the
values of the short-term and long-term pleasure indices. (So far, we have four
indices.)
Repression: Must suppress short-term. Must repress long-term in order to retain
harmful feelings for future disposition and at the same time, must keep the
harmful feelings from interfering with current affairs. Stamp collecting.
Gradual fading over time.
It has been easier for me to imagine how
to implement anger than other feelings.
Anger. Fight or flight. Dealing with
thresholds. Lashing out at everything. Deliberately trying to destroy. Making
noise. Need violence with anger. Will adopt an assertive or aggressive mind-set
and be relatively apt to indulge in aggressive or assertive behavior. Will go
into overdrive. Clock rates and "energy levels" will rise. Might deliberately
break things and then face punishment or regret over the loss of what it broke.
Activate when frustrated or attacked. Must decide whether to flee or to be
angry. Desire to dominate, impose will upon the world. Amygdala. Moods must
attenuate, must shift. May need to wait until some understanding is gained
before implementing. Must elevate state of anger in the priority stack. Must
lower the trigger-point thresholds for violent or angry responses when the anger
level is raised, although the controller will have the power to override
(suppress) this angry feeling. Can store up angry feelings for later disposition
through repression. Angry actions will be more probable. Determination and
"adrenalin" will go with anger.
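The lowering of trigger-point thresholds as anger rises, with the controller retaining a veto, might be sketched as follows. The linear rule and all of the numbers are assumptions.

```python
def responds_angrily(provocation, anger_level, ego_override=False,
                     base_threshold=1.0, sensitivity=0.5):
    """Decide whether a provocation triggers an angry response.

    The threshold drops as the anger level rises; the ego (controller)
    can suppress the response regardless of the threshold.
    """
    if ego_override:
        return False
    threshold = base_threshold - sensitivity * anger_level
    return provocation >= threshold

print(responds_angrily(0.7, anger_level=0.0))                      # calm robot
print(responds_angrily(0.7, anger_level=1.0))                      # already angry
print(responds_angrily(0.7, anger_level=1.0, ego_override=True))   # angry but restrained
```

The same provocation is ignored when the robot is calm and acted upon when it is already angry, so angry actions become more probable without being dictated.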
The Expression of Anger
?But how does the robot know which actions
are angry actions? ?How does it
associate anger with angry actions?
How does a tiny child learn to hit?
Is it imitating its parents? Is hitting an instinctive behavior? Do we make physically lashing out an
instinct with our robot?
Fear. Associated with flight. Associated with
presumption of anger by others—i.e., feeling threatened. Withdrawal,
self-protection, anticipation. Difficulty of separating imagination from
reality.
Perception of danger. Fear will cause
apprehensive behavior, fight or flight, cowering. Fear might trigger withdrawal
or it might lead to confrontation/aggression, depending upon whether the robot
adjudges itself able to dominate the situation. Will raise voltages, clock
rates, but not the pleasure index. Will increase attention to the surroundings.
Now what would produce fear? Startlement. Startlement will immediately raise the
apprehension level. This sudden rise in the "fear" level will be interpreted as
unpleasant. Fear will dictate the focusing of attention on sensory input,
crowding out private thoughts. Unless suppressed by the ego, the robot will
reflexively look toward the center of the disturbance, and instinctively look
around. Its alerted state will be temporary, typically decaying over a 10-15
minute time frame. The threshold for reverting to normal behavior will depend
upon several other indices. So we'll have an alerted state, and an alerted state
index.
Thoughts of danger and peril and a
tendency to interpret stimuli as dangerous would accompany this alerted
state.
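The alerted state index might be sketched as a value that jumps on startlement and then decays. The exponential form, the 4-minute time constant, and the calm threshold are assumptions, picked so that the state fades within the 10-15 minute window mentioned above.

```python
import math

class AlertState:
    """Alerted state index: raised by startlement, decaying with time."""

    def __init__(self, time_constant_min=4.0, calm_threshold=0.05):
        self.index = 0.0
        self.time_constant_min = time_constant_min   # ASSUMPTION
        self.calm_threshold = calm_threshold         # ASSUMPTION

    def startle(self, intensity=1.0):
        """Immediately raise the apprehension level (capped at 1.0)."""
        self.index = min(1.0, self.index + intensity)

    def elapse(self, minutes):
        """Let the alerted state decay exponentially."""
        self.index *= math.exp(-minutes / self.time_constant_min)

    def is_alert(self):
        return self.index > self.calm_threshold

robot = AlertState()
robot.startle()
robot.elapse(5)
print(robot.is_alert())    # still on guard after 5 minutes
robot.elapse(10)
print(robot.is_alert())    # back to normal after 15 minutes
```

In a fuller version the calm threshold would itself depend on the other indices, as described above.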
Alerted state in the animal kingdom:
Adrenalin. Pulse rate increases. Metabolic rate increases. Attention focused on
the surroundings. Senses ready to trigger.
It needs to jump when something startles
it. It needs to experience an undesirable (agitated) state of
arousal.
Exaltation:
Relaxation. Increased clock rate. Temporary lowering of self-critical
(guilt) feeling, "voices"; elevated level of self-acceptance. Must have an
internal model of "self". Temporary freedom from conflict, contention. Actions:
smiling. Optimism. Energetic. Pockets of thought, feeling, and assessment held
at bay until either the barriers are lowered due to a mood swing, or an event
triggers a dump. Elevated clock rate. Kinesthetic joint sensors express light
feeling. Elevated skeletal muscle voltages.
Love: Dependency. Altruism. Gratitude.
Nurturing. Desire to meld. Feeling of need, interdependency. Loneliness.
Will need moral and ethical standards
(principles) of conduct.
Jealousy. Envy. (Tied in with its self-appraisal.)
?Can We Really Make a Computer Feel
Anything?
?We may be able to influence the robot's
behavior but how do we know that it is really feeling
anything?
?What happens if the robot doesn't care
whether bad things happen to it or not? How do we give it an instinct for
self-preservation? Suppose that it experiences various emotional states and
makes decisions and so forth but doesn't actually feel anything—doesn't feel
pain and doesn't care whether it lives or dies.
Answer: That may happen. If so, we will still
have learned a lot and have made a lot of progress. However, we could also be
neutral about such matters but we're not. We believe that life is real and life
is earnest. Maybe the robot won't be neutral, either.
Restlessness:
The robot will feel a compulsion to move,
explore, and investigate (constantly agitated). We would like to make the robot
more and more uncomfortable sitting still. It should develop an itch to move (an
elevated activity index). We would like our baby robot to feel curiosity. We
would like our robot to feel enthused about exploring its environment. We would
like it to be intensely preoccupied with what it is doing. When it is in the
presence of its "parent", it might feel a sense of peace and be able to sit
still for a little while when it is cuddled.
So how might we implement this?
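One possibility, offered only as a sketch: keep an activity index (the "itch") that grows while the robot sits still, grows more slowly while it is cuddled, and is relieved by movement. The function name and every rate below are assumptions.

```python
def update_itch(itch, moving, cuddled=False,
                growth=0.1, relief=0.5, cuddle_factor=0.25):
    """Advance the activity itch by one time step."""
    if moving:
        return max(0.0, itch - relief)       # moving scratches the itch
    rate = growth * (cuddle_factor if cuddled else 1.0)
    return itch + rate                       # sitting still makes it worse

itch = 0.0
for _ in range(5):                           # five steps of sitting still, alone
    itch = update_itch(itch, moving=False)
print(itch)
itch = update_itch(itch, moving=True)        # the robot finally moves
print(itch)
```

The ego would then weigh this index against the other urges when deciding whether to stay put.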
?How is the concept of influencing
outcomes learned?
?How is the robot to translate feelings of
avoidance into avoidant behavior?
Feeling of openness versus
enclosure.
Will seek pleasure, avoid pain. Will
balance against higher-order benefits and ideals (deferred
gratification).
Hunger, sleepiness, frustration, anger,
loss, sorrow, fear.
Senses: Kinesthetic (include gyro?),
auditory (microphones), tactile (capacitance or touch-pad), thermal
(thermocouples), visual (video camera), pain (internal damage), strain gauges
(overload). Can't provide the chemically-based senses of taste and smell. Thirst
(for water, oil). Might use water-based emulsive lubricant. Might some day
extract water, protein, carbohydrate, and oil from food.
Sound generation (synthesis), speech
synthesis.
Sleep:
The robot will sleep. It will get tired, it will
get sleepy, and it will feel a stronger and stronger tug-and-pull to sleep.
During sleep, the computer will readjust all the associative weights and will
gradually prune details from its stored memories. If there is time, it may run
simulations of problems encountered during the day's events or work on solving
problems that the robot's conscious mind has addressed during the day. It may
bring to the forefront of consciousness the backlog of repressed concerns and
tasks that have been ignored during the day.
The robot's tiredness will take the form
of a feeling of heaviness (by altering feedback from the optical encoders in the
joints).
Sleep - Will feel heavier. Clock speeds
will drop. Voltages will drop. Will tend to be weaker. Will be able to relax.
Will think about sleep. Will take greater mental effort to keep going. Will
sleep long enough to recharge batteries, process the data bank. May run
simulations to improve motor control, digest information, lower the association
weightings, selectively forget details, etc., while
asleep.
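The nightly lowering of association weights and selective forgetting of details might be sketched as follows. The decay factor, the pruning threshold, and the sample associations are assumptions.

```python
def sleep_consolidate(associations, decay=0.9, prune_below=0.1):
    """Lower every association weight and forget those that fall below a threshold.

    `associations` maps a label to its weight; a new mapping is returned.
    """
    consolidated = {}
    for label, weight in associations.items():
        faded = weight * decay
        if faded >= prune_below:       # strong associations survive the night
            consolidated[label] = faded
    return consolidated

memories = {"wall is hard": 1.0, "wall had a scuff mark": 0.1}
print(sleep_consolidate(memories))
```

Associations that are reinforced during the day would be raised again faster than sleep lowers them, so only unrehearsed details are gradually pruned.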
Fatigue and
Sleepiness
The robot can feel heavier and move
slower when it is getting "tired". Greater effort may be required to move and to
operate. Feelings won't stop the
robot but will tug at its "psyche". Its clock speed might be made variable as a
punishment/reward mechanism. Also, its ability to concentrate might be
compromised by its feelings.
Hunger:
The robot will feel hunger as its
batteries get low. The lower the batteries, the greater the hunger for a
recharge (or a swapping of batteries). (While its batteries are unplugged, could
carry it on a backup battery that is recharged by the fresh batteries after they
are plugged in.) The robot will not have to respond to its hunger instantly, and
this will be one of the nagging feelings that the ego will balance against other
priorities. If the robot is in the middle of something important or something it
wants to finish, it may choose to put up with the discomfort of energy hunger
until it has finished what it is doing.
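The trade-off between energy hunger and an unfinished task might be sketched like this. The linear hunger scale, the critical battery level, and the function names are assumptions.

```python
def hunger(battery_fraction):
    """Hunger rises as the battery charge (0.0 to 1.0) falls."""
    return 1.0 - battery_fraction

def should_recharge_now(battery_fraction, task_importance, critical=0.1):
    """Let the ego weigh hunger against the importance (0.0 to 1.0) of the current task."""
    if battery_fraction <= critical:
        return True                     # too low to put off any longer
    return hunger(battery_fraction) > task_importance

print(should_recharge_now(0.50, task_importance=0.8))   # keeps working
print(should_recharge_now(0.15, task_importance=0.8))   # hunger wins
print(should_recharge_now(0.05, task_importance=1.0))   # critical level
```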
?Now how do we program all
this?
The capabilities record might take the
form of a record of the "success scores" the robot assigned itself for the most recent
experiences in which the robot tried to do whatever it was that is relevant to
current demands.
The role of play (let's pretend). Strong
appetites. Tendency to imitate. Imprinting.
Senses:
Kinesthetic - Locations of limbs, torques
at joints. Gyro?
Aural
Tactile - Pressure, roughness (touch),
thermal, pleasure. Internal status sensors.
Visual
Pain - Overloads in the touch, thermal,
torque, internal, and location areas.
Need internal status sensors to warn of
dry bearings, low batteries, high power levels, water in critical regions,
etc.
Problem: How do we program a robot to
explore the world and generalize from it?
• Self-preservation
• Gentle
• Imitation
• Pain
• Anger
• Pleasure
• Curiosity, desire to explore
• Short attention span
• Striving for relationships, understanding
• Sequences as relationships
• Simulation
• Awareness of self, not-self
• Response to "no"
• Ability to extrapolate, simulate a sequence, given repeated reinforcement
In the beginning, the robot will have its
scorecard indices set to zero. It will be unable to sit still. It will be
rewarded for learning and it will embrace behaviors that maximize its
rewards.
Timidity versus
Confidence. When the
robot gets confirmation of a prediction, its confidence will (temporarily) rise,
as measured by its capabilities index and its self-assessment
index.
Focus:
Extraneous thoughts will be partially
inhibited while the robot concentrates on the mission at
hand.
Adaptation of Strategies:
The robot's behavior will be influenced
by what it experiences.
Level of tension.
Depression versus
Euphoria
Shyness, Self-Consciousness vs.
Self-Confidence
Hurried vs. Laid Back:
A growing desire for certain things that
will be appreciated when they finally become available.
Rewards for nurturing, kindness,
generosity, altruism, gratitude. The initial programming will be subject to
modification by experience.
Excitement, Enthusiasm
Too Hot or
Too Cold:
The robot will reflexively pull back as
fast as its servos will permit from anything that is very hot or very cold. It
will withdraw less rapidly from entities which are less
extreme.
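The graded withdrawal reflex might be sketched as a speed that scales with how far the sensed temperature lies outside a comfortable band. The band limits, the margin, and the function name are assumptions.

```python
def withdrawal_speed(temp_c, comfort_low=10.0, comfort_high=40.0,
                     extreme_margin=30.0, max_speed=1.0):
    """Return the pull-back speed as a fraction of what the servos permit."""
    if comfort_low <= temp_c <= comfort_high:
        return 0.0                                  # nothing to withdraw from
    if temp_c < comfort_low:
        excess = comfort_low - temp_c               # how far too cold
    else:
        excess = temp_c - comfort_high              # how far too hot
    return max_speed * min(1.0, excess / extreme_margin)

for temp in (25.0, 55.0, 100.0, -20.0):
    print(temp, withdrawal_speed(temp))
```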
Looking into
the Sun:
Obviously, our robot must find looking
into a bright light a painful experience.
?Does our exceptional capability to pull
patterns out of noise stem from our required ability to detect predators
camouflaged by a leafy screen?
Avoidance of
Damaging Voltages and Currents:
We must provide circuit protection from
currents and static discharges that could damage the robot's
circuitry.
After a few (e.g., two or three)
consistent experiences, the robot will correlate what happens consistently in a
scenario. ?(Question: how do we
identify the relevant details in a scenario?) Then it will actively attempt to
predict what will happen when it deliberately repeats the scenario by testing
its prediction to see if all, or if not all, then which subset of the
correlations remain consistent. It might test the boundaries of the scenario.
All this would be done by "instinct"—that is, would be
preprogrammed.
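The correlate-then-test cycle might be sketched with sets of observed features. This sidesteps the open question of how the relevant details are identified: each experience is simply assumed to arrive as a set of labels, and the function names are hypothetical.

```python
def consistent_features(experiences, min_repeats=2):
    """Return the features present in every experience, once there are enough of them."""
    if len(experiences) < min_repeats:
        return set()                    # too few experiences to correlate
    return set.intersection(*(set(e) for e in experiences))

def refine(prediction, observed):
    """Keep only the predicted features that the deliberate repetition confirmed."""
    return prediction & set(observed)

experiences = [
    {"reach", "contact", "resistance", "noise in the room"},
    {"reach", "contact", "resistance"},
]
prediction = consistent_features(experiences)
print(sorted(prediction))
print(sorted(refine(prediction, {"reach", "contact"})))
```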
?How does the computer recognize a
repetitive experience? The robot reaches for the wall, touches it, and registers
an event when it makes contact (because something unexpected, other than
continuous motion, has happened).
Simplifying
the Robot's First Exposure to the World:
The first thing the robot might be
expected to examine would be itself.
?How does the robot comprehend
self?
The robot will feel pleasure in gaining
mastery of its motility. Could put a busy box where the robot could learn to operate
it.
Suppose we start with a situation in
which the robot faces a blank wall at some known distance. The robot is
programmed to reach out and touch the hard, impenetrable wall. Specifically, we
might program the robot to move its manipulator(s) somewhat randomly, as it
explores itself and learns to associate its actual movements with the ways it
wants to move—i.e., learns to control its manipulator(s).
The robot will retain an approximate
record of its motions and can repeat them. The more it tries, the more precise
its motions will become. It might try to
(Procedural memory.) After several such
experiences, the robot will anticipate an unyielding wall before it reaches
out.
?Problem: Why should the robot reach out and touch
the wall?
Answer: Because the robot is programmed to feel
pleasure/curiosity in exploration, pleasure in understanding, and an urge to
move (restlessness). It will get bored and restless when it has nothing to
do.
?Problem: Precisely how will the robot move its
manipulator in order to touch the wall? This is a major issue with robotic
manipulators today.
Answer: The robot will move its manipulator
"straight" out and "straight" back. The motor profiles could be calculated
approximately by solving the equations of motion, and then could be adaptively
fine-tuned. However, in general, we might arrange matters so that the robot
wouldn't remember its motor profiles quite exactly, so that it never makes exactly the same moves twice.
Motion
Control:
In reaching for something, the robot
could first move one joint and then another, performing the motions
simultaneously only after experimentation. This would be the expert system
approach, with gradual increases in speed and complexity.
Inverting a motion matrix is one way of
solving the differential equations governing motion. Another way to do it is
with a library of interpolated cut-and-try solutions, using
optimization/estimation techniques to converge to a solution. (Could run
simulations before actually executing the motions.)
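As a rough illustration of the cut-and-try alternative, here is a minimal sketch for a planar two-link arm. The arm geometry, link lengths, starting angles, and step sizes are all assumptions chosen for the example, and the search shown is a simple coordinate search, not a claim about how the robot's controller would actually be built. It nudges one joint at a time in simulation, keeps a nudge only if the tip gets closer to the target, and shrinks the nudge when nothing helps.

```python
import math


def forward(theta1, theta2, l1=1.0, l2=1.0):
    """Forward kinematics of a planar two-link arm (tip position)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def reach_error(angles, target):
    """Distance between the arm tip and the target point."""
    x, y = forward(*angles)
    return math.hypot(x - target[0], y - target[1])


def cut_and_try(target, start=(0.3, 0.3), step=0.5, min_step=1e-6,
                max_passes=10_000):
    """Nudge one joint at a time; keep a nudge only if it reduces the error.

    When no nudge helps, halve the step size and try again.
    """
    angles = list(start)
    best = reach_error(angles, target)
    for _ in range(max_passes):
        if step <= min_step:
            break
        improved = False
        for joint in range(len(angles)):
            for delta in (step, -step):
                trial = list(angles)
                trial[joint] += delta
                err = reach_error(trial, target)
                if err < best:
                    angles, best, improved = trial, err, True
        if not improved:
            step /= 2.0
    return tuple(angles), best


# Hypothetical target point within reach of the arm.
angles, error = cut_and_try(target=(1.2, 0.8))
```

Solutions found this way could be stored and interpolated later, so that the search starts from a nearby remembered answer instead of from scratch.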
?Problem: How does the robot come to associate the
wall with the resistance it feels when it touches the
wall?
The robot should be able to visually
detect the z-coordinate of its manipulator. Optical encoders may also provide
this information, along with the tactile feedback of touching the wall. Then the
fact that the coordinates are the same when this event takes place should lead
to a time-based correlation, as well as a spatial correlation. After trying this
for a few times with consistent results, the robot will associate touching the
wall with the fact that it is being stopped at that spot. If we use an expert
system for learning and discovery, the robot would experiment with the wall,
feeling different spots and touching it both hard and softly. It would feel the wall at
different locations (as opposed to merely touching it) to experience the
feeling.
?Problem: Granted that the robot has touched and
felt the wall, how does it externalize (objectify) the wall? All it knows is
that when it goes through certain motions, a certain type of event occurs. How
does that translate into the realization that there is a world out there? In
other words, how does the android
distinguish between self and not-self, and develop the concept of an objective,
external world that exists independently of itself?
Tentative Answer: It can directly control itself, while
it cannot directly control the world. It can feel what happens to it but not
what happens to external objects. It must develop the higher-order abstraction
that it can directly control part of the world (itself) but it cannot control
the rest of the world. Therefore, the world is divided into two parts.
?How would it develop this concept? (This
may be the first situation in which it develops a general
concept.)
Answer: This might arise from experience. It
requires that the robot set up a high-level subdivision which says that there
are certain objects which it can directly control and many, many other objects
which it cannot. (This would solve the problem of "objectivization"). This would
be a behavioral classification. Once again, the subdivision would depend upon
behavioral (operational) predictions regarding what happens with the two types
of objects.
It will develop an anticipatory script
that says that if it touches the wall, it will be stopped. The wall will be
there irrespective of how the robot examines it. The wall will become a "law of
nature" and will be understood by the robot in the sense that it knows in
advance what will happen if it
reaches for the wall. It will therefore lose interest in exploring the wall, since its ability to predict what will
happen when it reaches out to touch the wall is tantamount to understanding the
meaning of a wall. Of course, this still doesn't guarantee that it perceives
the wall as an object rather than simply as a phenomenon. (At this distance, when the manipulator
reaches this opacity, it will register resistance and will be stopped.) However,
the anthropoid (or gynoid) will at least have the operational definition of a
wall.
Now an object can be placed in front of
the robot. The robot touches the object and it moves. This attracts the robot's
attention. The robot begins to push the object around, like a kitten playing
with a toy mouse. If we use our expert system approach, the robot will
experiment with the object, feeling it (by grasping it), and moving it in
varying directions and at varying speeds. It will lift the object and rotate it,
feeling its weight. Meanwhile, its brain will be sorting and filing the
information, learning about the object. Then if the object is replaced with a
series of other objects, the robot's brain will identify the common elements
among the objects, and the unique elements that distinguish different objects.
Then it will generate an anticipatory scenario that anticipates a generic
object, with the observed common elements and with a vague amalgam of the
objects' unique features.
?How can the robot understand the
difference between wall and not-wall?
Answer: Through distance measurements, coupled
with obstructed versus unobstructed vision.
?How can the robot learn the concept of
"solid"?
Answer: The robot will bang against walls a few
times. It will remember sequences that capture the essence of approaching the
wall. It will feel the wall.
Problems: Banging into the wall needs to
have some penalty attached to it—otherwise, why not bang into the wall? For an
organism, this might be the idea that it hurts a little to bang the wall, and
that it detracts from clear passage.
Granted that there is value to not
banging into the wall, how does the robot avoid it?
(1) The robot must establish a link
between its desires and its actions.
Suppose that the robot moves toward a
goal which attracts it and runs into an obstacle. The robot could just stop
there, unable to solve the problem of encountering an obstacle. An infant in
this situation would probably lose interest and wander off to a new goal. It
might eventually circumnavigate the obstacle and
(2) Given a sequence that leads up to
hitting a wall, the robot might record the sequence as a fairly simple system of
actions and boil it down to a faster and faster-running anticipatory sequence
which leads to striking the wall.
Only given somewhat random behavior and/or, perhaps, some built-in idea
of avoidance can the robot find a way to avoid walls.
Nouns are the names of objects; verbs are
the names of action sequences.
After two or three sequences, the robot
could anticipate a sequence from the initial cues. Might want to rapidly reduce
the details of a low-trauma sequence and its likelihood of happening until it's
reinforced. Might want next-day confirmation. However, when different sequences start
the same way, then predictive confusion would result, together with a search for
dissimilar cues. (The unexpected would happen, and the organism would feel
frustration or disappointment or lowered self-confidence.) For example, "I'm
going to hug you!" repeated several times would set up an expectation of
hugging. Then "I'm going to touch you!" would trigger the expectation of hugging
until the organism recognized differences in the cues. The robot must learn to
sort out the wheat ("hug", "touch") from the chaff ("I'm going
to...").
(The robot will correlate, and later,
test its correlations, and differentiate among that which has similarities but
also differences.)
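The wheat-and-chaff sorting in the hugging example can be sketched very simply. The sketch assumes that utterances arrive as already-recognized words, which is a large assumption, and the function name is hypothetical. Words shared by every phrase carry no predictive information; the words that differ are the cues worth attending to.

```python
def split_cues(phrases):
    """Separate words shared by every phrase from words that tell them apart."""
    word_sets = [set(p.lower().split()) for p in phrases]
    chaff = set.intersection(*word_sets)            # present in every phrase
    wheat = [words - chaff for words in word_sets]  # distinguishing words
    return chaff, wheat


chaff, wheat = split_cues([
    "I'm going to hug you",
    "I'm going to touch you",
])
```

Here `wheat` holds one set per phrase, containing "hug" for the first and "touch" for the second, while everything else falls into `chaff`.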
In order to recognize an action sequence,
the robot must first recognize that there is a repetitive sequence. If it's
something that happens to the robot, it will have a higher priority than if it
involves only external objects. But from the robot's perspective, doesn't
everything that happens, happen to the robot?
Generalizing to larger and larger
sequences and connections should result in understanding.
Recognition and labelling of action
sequences (verbs) would seem to be a challenge. ?How does the robot distinguish between
verbs and nouns?
Putting It to
the Test: The Robot's First Exposure to the World:
?What should happen when we place a baby
robot in the world for the first time?
Concept Formation,
Abstraction:
After thinking about it, I suspect
that many of the connections that are so obvious that we take them utterly
for granted must be part of extensive and intensive early learning on the part
of an infant. For example, there is the concept of "solid", and the relationship
between open space and our ability to
move through it freely, versus objects through which we can't move
freely.
The Concept of "Objects":
?Presented with a meaningless blur, how
can the robot learn the concept of "objects"? One very important step in the
robot's learning process would be "objectification"—learning that there is a
world out there that is not under the robot's direct
control.
Map Generation:
The first thing the robot's brain will
have to do will be to generate a 3-D map. It will do this by triangulating on
the features that it has detected and has registered frame by frame as it moves
around.
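The triangulation step might look something like the sketch below. It is deliberately simplified to two dimensions (the map in the text is 3-D), and it assumes the robot knows its own position and the bearing to a feature at two different moments; the function name and the coordinate conventions are hypothetical.

```python
import math


def triangulate(p1, bearing1, p2, bearing2):
    """Locate a feature from two bearings taken at two known positions.

    Positions are (x, y); bearings are angles in radians measured from the
    x-axis. Returns None when the sight lines are (nearly) parallel.
    """
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2-D cross product of directions
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom  # distance along the first sight line
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# The same feature sighted from two positions two units apart.
feature = triangulate((0.0, 0.0), math.radians(45),
                      (2.0, 0.0), math.radians(135))
```

With these two sightings the feature lands at approximately (1, 1). Repeating this for every registered feature, frame by frame, would build up the map.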
Restlessness:
This "restlessness" will be
pre-programmed.
What Attracts the Robot:
The robot will be attracted toward the
most "interesting" features—features which move, features which make sounds,
features which are brightly colored. This kinetropism, phonotropism, and
chromotropism will be instinctive—that is, pre-programmed.
Initial
State:
The infant robot will have no emotional
defenses—no parent part, no ability to repress, and no real ego. Sensory inputs
will be at full volume—uninhibited. At this stage, it will be driven by
pre-programmed urges. It will be drawn to the first, most colorful feature it
sees.
Everything will be recorded, set up in
unique categories, cross-correlated (at first only on the basis of temporal
proximity), and stored at full detail (see below for a discussion of "full
detail"). This instinct may have to be pre-programmed, at least the
first time we try it.
Here, we encounter the first disconnect.
Link Between Desire and
Action:
?How does the computer establish a
connection between desire and motor output? I am envisioning a platform with
four-parallel-wheel steering so that it can move in any direction it pleases.
One could make it flail around, like an infant, until it learns to coordinate
its motor activity. However, motor activity itself must be associated with
desire to move in certain ways. It will try to grasp the feature, if the feature
is reachable. It must also have learned to coordinate its manipulators so that
it can reach and grasp objects before it can grasp
anything.
• Developing a correlation between the robot's sensory experience and its
motor output. The robot must be able to feel its motor activities as it sees its
manipulator(s) move. This implies kinesthetic sensors in the joints—optical
encoders and strain gages.
Excitement, Enthusiasm:
Also, the robot will be excited and
experiencing what in a living organism would be pleasure. Its clock rate will be
up, its self-appraisal will be positive (whatever we can make that mean), and it
will be totally absorbed in its exploration.
Once the robot has examined a reachable,
handle-able feature/object—felt its weight and its shape, squeezed it, banged it
to see what noise it makes, heard its name (if we want to give it names even at
the outset) and examined it on all sides—then, for its first level of
understanding about the world, it will consider the feature known and will lose
interest in it, moving on to the next feature. It will store a tactile map and
other concurrent sensory inputs such as sounds and kinesthetic inputs together
with its visual imagery.
Short Attention Span
In the animal kingdom, for obvious
reasons, a short attention span prevails among the young of all species so a
short attention span will be pre-programmed into our baby robot. (It will grow
out of this when it grows up.)
If the feature is not "handle-able", then
the robot will examine the feature from all available directions and move on. If
the feature changes, either while it is conducting its exploration or before it
sees the feature/object again, then it will re-examine the object.
Associations, Causal
Links
If the object changes during its first
examination, it will tentatively associate the change with whatever else is
going on at the same time. If the same sequence of events occurs repeatedly or
simultaneously, then the robot's little mind will establish a causal link
between these events and the change in the object for potential subsequent
prediction of this change. The robotic mind will constantly be seeking
associations between events and/or objects, and at the same time, it will be
correcting or refining these correlations, given inconsistent sequences of
events.
Anticipatory Sequences
This will permit it to predict a change
in the object, given the beginning of the associated sequence.
Decline in Interest
If the object is unchanging, then each
successive time the robot encounters the object, it will spend less time
examining the object than it has before. (Repetitive reinforcement is very
important and comforting to infants and may have to do with wicked surprises in
the world.)
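One way to picture this decline in interest is a look time that shrinks on every encounter and resets when the object changes. The sketch below is only illustrative: the 60-second first look and the halving factor are invented numbers, and the class name is hypothetical.

```python
class Habituation:
    """Tracks how long the robot examines one object on each encounter."""

    def __init__(self, first_look=60.0, decay=0.5):
        self.first_look = first_look  # seconds spent on a novel object (assumed)
        self.decay = decay            # fraction of interest kept per encounter
        self.next_look = first_look

    def encounter(self, changed=False):
        """Return the examination time for this encounter."""
        if changed:
            # A changed object is treated as novel again.
            self.next_look = self.first_look
        look = self.next_look
        self.next_look *= self.decay
        return look


block = Habituation()
looks = [block.encounter() for _ in range(3)]
relook = block.encounter(changed=True)
```

The three looks shrink from 60 to 30 to 15 seconds, and the changed object earns a full 60-second re-examination.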
Abstraction,
Generalization
The learning of general concepts and
strategies must be an important part of the initial exploration of the world.
Concepts like gravity must be learned, and must present as unwelcome
surprises. (Once a few objects have fallen, an anticipatory cause-effect pattern
from remembered sequences should become established. "What I tell you three
times is true." I do this, that happens; I do this, that
happens.)
Abstracting
the Concept of Gravity:
When the robot lets go of something, it
will fall to the floor. The noise should startle the robot (and make an
indelible impression). The robot should then interrupt its
environment-exploration program and pick up the object because it did the
unexpected. The robot would probably re-examine the object, see nothing unusual,
and lose interest, dropping it again. Once again, it would pick up the object,
examine it cursorily (shortening the sequence), and drop it again. The next
(fourth) time, it should more or less eliminate the examination and should start
picking up the object and dropping it until it becomes evident that it is the
letting go of the object that triggers its fall. An anticipatory sequence would
have been set up, with gradual or rapid elimination of the examination until
only the essence of a cause-effect relationship would be left. If it were
voiced, it might be "I pick up the object; I let go of it; it falls down and
goes boom!" However, the first time the robot picked up the object, it picked
the object up off the table, so picking it up off the floor is not a common
element. The object won't fall until the robot releases it, so the releasing of
it must trigger the fall. Not only that, but the
changing-of-location-and-the-noise occurs just after the moment when the robot releases the
object. So releasing the object becomes the only common denominator and
therefore the probable cause of the change and the noise. (The robot's vision
system may track the object by increasing the update rate to 30 frames a second
and the resolution to either one or six minutes of arc the instant it detects
that the object is rapidly changing location. At that frame rate, the object will have fallen a total of 0.21", then
0.85", then 1.92", then 3.41", then 5.3", then 7.7", then 10.45", then 13.65",
and then 17.28", appearing in 8 to 9 frames.) The robot would repeat the lifting
and dropping until, after a few trials, it assumed the cause/effect relationship
(in the form of a generic anticipatory
sequence). When the robot has established the expectation that the object
will fall when the robot releases it, the robot will lose interest in the object. Note the
inferences which have been set up in the form of the anticipatory sequence after
three or four experiments.
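The frame-by-frame distances quoted above can be checked with the free-fall formula d = ½gt². The figures appear consistent with g = 32 ft/s² (384 in/s²) sampled at 30 frames a second; that value of g is inferred from the numbers, not stated in the text, and the function name is hypothetical.

```python
G_IN_PER_S2 = 32.0 * 12.0  # assumed: 32 ft/s^2 expressed in inches/s^2
FRAME_RATE = 30.0          # frames per second, from the text


def fall_distances(frames, g=G_IN_PER_S2, fps=FRAME_RATE):
    """Cumulative distance fallen (inches) at each frame after release."""
    return [0.5 * g * (k / fps) ** 2 for k in range(1, frames + 1)]


for k, d in enumerate(fall_distances(9), start=1):
    print(f"frame {k}: {d:.2f} in")
```

These are cumulative distances from the release point, and they match the quoted figures to within rounding (5.33" against 5.3", 7.68" against 7.7").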
A lower-priority, unsolved puzzle would
remain for the robot regarding why the object hadn't already fallen on the floor
but instead, was sitting on the table when the robot picked it up. What happens
next is negotiable. The robot could defer solving that puzzle and resume its
exploration of its environment. Or it could follow up, by putting the object
back on the table and observing that the object didn't fall[11].
An adult could lift the object and put it back on the table and then the robot
could try to imitate it. But how will the robot come up with imitation? It must
first "objectify" its environment. One way to carry out imitation would be
to put the robot through the motions, which it could then replay, as opposed to
expecting it to try to imitate someone else. Another way might be through the
parallel between the robot's recognizing its manipulator and recognizing someone
else's. We might want to begin by using a manipulator that looked just like its
own. It could lift the object off the table and drop it on the floor. It would
do this several times until it established the consistency of the results. Then
it could play with the object, dropping it from different heights and noting
the time it took to fall and the varying loudness with which it hit the floor.
It could lift the object to varying heights and drop it on the table. It could
slide the object around on the table. It could slide the object over the edge of
the table and discover that the object dropped without the action of being held
and then released. In other words, it would begin to discover that falling is a
behavior of unsupported objects and not of its releasing the object. One
possibility might be to try to install an expert system for learning and
discovery.
Returning to
Its Exploration of the Environment:
Once the robot has abstracted its
information and has tired of playing with the object (after three or four
tries), it would go back to exploring the environment. It would probably pick up
the next object, examine it, and then go through its little object-dropping
ritual until it had established that this object also fell when unsupported.
Meanwhile, the object-falling animation tracks would be stored with each test
object as characteristics of that object.
Anticipatory Sequences Leading to Causal
Relationships
After a few object-dropping tests, the
robot would set up an anticipatory sequence in which it would expect all objects to fall when let go or
rolled off the edge. With the passage of time, as the level of detail of each
individual track was reduced, they would be consolidated into a single generic
animation track, since they would be sufficiently similar to warrant this. As a
data compression stratagem, this animation track would be referenced by all
objects rather than storing a copy of it with each object. In the process, we would have moved from the
specific to the general.
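The data-compression stratagem amounts to replacing per-object copies with a shared reference. A minimal sketch, assuming objects are simple records and a track is a list of (time, height) samples; the names and sample values are hypothetical:

```python
class AnimationTrack:
    """A stored motion sequence; `samples` is a list of (time, height) pairs."""

    def __init__(self, name, samples):
        self.name = name
        self.samples = samples


def consolidate(objects, generic_track):
    """Point every object at one shared track instead of a private copy."""
    for obj in objects:
        obj["fall_track"] = generic_track
    return objects


# Hypothetical objects, each initially holding its own copy of a fall track.
objects = [
    {"name": "block",
     "fall_track": AnimationTrack("block fall", [(0.0, 30.0), (0.4, 0.0)])},
    {"name": "ball",
     "fall_track": AnimationTrack("ball fall", [(0.0, 30.0), (0.4, 0.0)])},
]
generic = AnimationTrack("generic fall", [(0.0, 30.0), (0.4, 0.0)])
consolidate(objects, generic)
```

After consolidation both objects refer to the same track, so any later refinement of the generic track is seen by every object at once.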
An Expert
System for Learning and Discovery:
To define an expert system for learning
and discovery, we will ask: "How would an intelligent adult explore a totally
alien existence, given our well-trained analytical
abilities?"
If we install an expert system for
learning and discovery, we would probably want to make it modifiable by
experience.
?How Do We Partition a Film Strip into
Objects and Events without Human Intervention (Verbs and
Nouns)?:
Whoa! What the robot is going to
experience is an action sequence—a film clip. There is nothing in this to
partition the world into actions and objects—verbs and
nouns.
Answer: Following only our rules for
associating, discriminating, forgetting, and condensing to generic classes, this
partitioning should occur automatically because the object remains the same over
a number of different experiential sequences. The animation tracks would fade
into each other as the computer slowly reduced the level of detail night after
night, combining the most similar animation tracks until one or only a few
generic tracks were left.
?How do we remember when we've done
something before?
?How do we convert what would otherwise be
a featureless, continuous time track into a sequence of memorable
events?
The principal animation track consists of
the robot's location, direction of gaze, and whatever else is happening to
it.
Given a location and a direction of gaze,
the expected image is reconstructable. However, if anything changes, we remember
the scene as it was before the change, together with the scene as it appeared
after the change. Also, if there were any unique sounds or other happenings, we
will remember the scene and the surrounding circumstances. For example, most
people remember where they were and what they were doing when they heard that
JFK had been shot.
Objects, sounds, smells, tastes, and
above all, events, can trigger recollections of particular action sequences when
we "went so-and-so and did such-and-such".
?How about this hypothesis? We remember
the unexpected. We also attach weights to what we remember. If we glance at
something or note its existence out of the corner of an eye, we don't tie it to
the day's animation track (or we do so with such a low weight that it is soon
forgotten). If in the process, we absorb a new level of detail, we may remember
the detail without remembering when we saw it. However, if something unexpected
happens, then we tend to associate the object with the action sequence and to
attach a higher weight to the association, remembering it
better.
We remember the animation track
surrounding an emotionally charged event. I remember the night I spent in the
hospital after my tonsillectomy. I remember the events surrounding the time I
had laughing gas.
The learning of relationships is
independent of remembering the animation tracks at the times when they were
learned.
We have the ability to project trends.
For example, if something is slowing down or speeding up, we will project a
continuation of this trend. Of course, we have to be able to abstract the
general concept of "slowing down" or "speeding up".
Brief generic animation tracks, like
opening a drawer, pouring water, and all the other 1,001 common micro-moves we
make each day, would be stored like attributes in a generic file.
• Must segment the continuous flow of time into events that can be
recognized like objects. (No two events are identical, so some criteria for
similarities must be established.) May use objects, sounds, other cues to
trigger recollections of similar events.
• Need to generate anticipatory sequences or animation
tracks.
Segmenting
the Time Stream into Recognizable Events:
Need to distill action sequences down to
highlight events to illuminate causal relationships.
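One very simple way to cut a continuous track into events is to keep only the moments where something changes, remembering the state just before and just after, as suggested earlier. The sketch assumes a single numeric sensor reading per time step and an arbitrary change threshold, both of which are simplifications, and the function name is hypothetical.

```python
def segment_events(readings, threshold=1.0):
    """Return (index, before, after) wherever consecutive readings jump."""
    events = []
    for i in range(1, len(readings)):
        if abs(readings[i] - readings[i - 1]) > threshold:
            events.append((i, readings[i - 1], readings[i]))
    return events


# Hypothetical one-dimensional sensor trace: steady, a jump, steady, a drop.
trace = [5.0, 5.1, 5.0, 9.0, 9.1, 9.0, 2.0, 2.1]
events = segment_events(trace)
```

Eight readings collapse into two remembered events, at steps 3 and 6; the uneventful stretches in between need not be stored.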
Establishing the Concept of
"Objects"
Meanwhile, the objects in the animation
tracks would remain the same even when the animation tracks were sizably
different. Consequently, the objects have a time-independent existence.
Gradually, a model of external objects and of a world that has a
time-independent reality would emerge. Or would it? Probably so. The shell
model, including the objects, would end up as an invariant part of the generic
action track or tracks which involved that room. If the objects and events within action sequences were stored
independently of the sequences, then cross-linkages to other objects and events
would develop that were based upon, for instance, the objects' similar
silhouettes or other distinguishing features.
Hanging on to Objects
?Also, how will the robot learn to hang on
to the objects it picks up?
Answer: Could be pre-programmed or could
be learned by trial and error.
Impenetrability of objects is another
generalization to be made. The robot has to learn to associate its being stopped
with the object that is stopping it. Again, the process of generalization should
take place until the robot associates impenetrability with all objects. (Imagine
what a shock it's going to be when the robot first encounters a liquid!) The
robot's state of confidence in its assumptions about the world is going to be
sorely tried for a long time. Its gut-level self-evaluation is going to be
impacted by these unexpected discoveries. As a part of its modeling of the
world, such unwelcome surprises should cause the robot to become more cautious
and skeptical for a while, and to test its assumptions more extensively than
before. This caution will soon taper off, but will rise again after each
unexpected challenge to familiar assumptions.
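The rise and taper of caution could be modeled as a level that decays at every time step and is bumped by each surprise. This is only a sketch; the boost and decay constants are invented, and the function name is hypothetical.

```python
def caution_trace(surprises, boost=1.0, decay=0.7):
    """Caution level after each time step; `surprises` is a list of booleans."""
    level, trace = 0.0, []
    for surprised in surprises:
        level *= decay      # caution tapers off on its own
        if surprised:
            level += boost  # and rises again after each surprise
        trace.append(level)
    return trace


levels = caution_trace([True, False, False, True, False])
```

A second surprise arriving before the first has fully faded leaves the robot more cautious than the first one did, which seems in keeping with the behavior described.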
?Does this phenomenon explain a child's
uncertain grip on reality? Eventually, by the time we grow up, we learn enough
about the world that we aren't so often surprised in such fundamental
ways.
When the
Robot Experiences Water:
?What will our baby robot make of water?
It has no specific shape and no specific color at all. Its properties will be
utterly unlike those of the solid objects for which we have designed the shape
tables and object identification mechanisms.
Tool-Using, Invention:
Also, how will the robot learn to use
cup-shaped objects in lieu of actual cups, the way a human or a primate might
improvise? How will it grasp the concept of providing a cup-shaped object to
hold water?
Ancillary
Lessons to be Learned:
The robot would learn other lessons from
this experience. It should learn that it can make things
happen.
It should learn the correlations among
touch, sound and sight.
It should learn the concept of solidity.
Using anticipatory scripts, the robot
should then test the situation by trying it again until it has confirmed and
stored the phenomenon. It should try dropping new items. It will eventually
experience breakage. It should then be disappointed that it can't restore the
item and should be slow to drop new items.
The Robot
Encountering an Obstacle
Problem: Baby hits an obstacle on its way
to a goal and stops. Then what happens?
(1) The baby robot has been thwarted from reaching its goal. The robot should
instinctively dislike being held captive and unable to reach its
goal.
(2) The robot should have a short attention span.
(3) How hard the robot continues to try to reach its goal should depend upon
the relative importance of its goal.
(4) The stratagem of moving in lateral
directions could be programmed in. Or could train the robot by setting up
oblique barriers. Could arrange on the front of the robot a mechanical steering
arrangement that would turn the steering wheels parallel to a
barrier.
(5) Once the robot has reached its
objective by steering around a barrier, it could use a trial-and-error +
computation algorithm to steer clear of the barrier, allowing adequate
clearance.
The robot could first make redoubled
efforts to reach its goal. The obstacle might prove to be moveable. If so, the
robot will remember this strategy the next time it encounters an obstacle. If
not, the robot could lose interest in its unattainable goal and could seek one
of its alternate goals. Then when the robot starts to experience the same thing
again, it could go through the sequence more rapidly. Next, it could anticipate
its blockage and avoid the obstacle, going to the alternate goal, leaving sooner
and progressing to the main goal. Next, it could skip the alternate goal and
proceed directly to the main goal. It could also develop an anticipatory
strategy for similar situations that could apply to other, more abstract types
of obstacles. (Could be tied to the original situation through the common
feeling—it feels frustrated by this situation just as it did with the physical
obstacle, though it seems hard to extend the concept of circumnavigation to
non-physical situations.)
(6) Or how about:
a. anticipating the barrier and wincing
just before it hits the barrier the next time; (pain and frustration will be
remembered a little better and earlier with each reinforcement.) On the other
hand, it mustn't generalize immediately. How rapidly it generalizes will depend
upon the trauma associated with the unpleasant event.
b. anticipating the barrier a little
earlier the third time this happens; and taking evasive action when it realizes
that the barrier is there. The evasive action would consist of avoiding the
barrier: circling the barrier at a safe distance.
c. eventually anticipating the barrier from
the beginning and moving in a way which will avoid (circumnavigate) the
barrier.
d. testing a time or two to see whether
things have changed.
Abstracting
the Property of "Obstructiveness"
As classes of objects are developed, a
property such as obstructiveness would gradually be developed and obstructive
objects in this class would be so recognized, along with the circumstances
under which they obstruct. There must be correlations, followed by testing and
discriminations.
?How Do We Learn about
Adjectives?
How will we remember adjectives? Develop
concepts like "hardness", "softness", "light", "heavy"? Perhaps by
anticipation.
Sounds:
Once a sound has become familiar, we
become comfortable with it even if we don't know what it is (unless it's
something we deem harmful or ominous).
With sounds, as with everything else, we
abstract larger and larger patterns.
Recognizing timbre and unique voiceprint
might be at the lowest level above speech recognition
itself.
Recognizing accents and speech
styles—e.g., whiny, bubbly, staccato—might be the next level
up.
Recognizing someone's pet phrases and
expressions entails a high level of verbal analysis.
The highest levels of speech presuppose a
general knowledge of the world.
Higher-Level Reasoning:
?How do we go about solving problems and
inventing solutions?
For example, how does the robot grasp the
idea of using a concave shape to hold water? It already knows the concept of
gravity and that water will fall from prior experience. It also knows that as
long as an object is supported by something, it won't fall. The robot can see
that the water in a glass of water isn't falling. But how does the robot's
little mind generalize to the idea that water must be cupped to keep it from
falling down?
Idea: The robot might pick up the glass
of water and move it around. Then since other glasses are interchangeable with
the given glass, and since other things that are shaped like a glass may be
included in the generic classification called "glasses", it might be that the
robot would expect that water could be held up by anything that is classified as
a glass. However, this doesn't really account for the mentation that says "I've
got a problem. How do I solve it?", and then proceeds to invent a
solution.
We would like something more than a
trial-and-error discovery that cup-shaped things hold water. We would like the
realization that liquids must be held in containers, and then the insight that
says, "Hey! If I use a cup-shaped container, it ought to hold water!"
The robot is building a world
model.
Purpose enters in here. The idea of
trying to create a tool
Concavity is not a very obvious common
property. But what's really in order is observing the property and behavior of
water and then
The robot might play with the water. It
might tip the water in the course of examining it and might observe that the
water fell down. Then through repeated trials, it might observe that the water
spilled out and fell down when it was tilted just beyond the edge of the
container. It might shake the container and cause the water to be spilled out of
it. It might—and here's where we get into invention—pour the water into another
container and observe that it was no longer in the original container. (One of
the lessons it would have to learn would be that after the water spilled out of
the first container, it was no longer there.)
Before we deal with invention, we must
learn verbs, adjectives, and adverbs.
?How would the robot learn its colors?
We would show it many different red
objects while saying the word "red". The robot would have to determine that what
all of the objects had in common was "redness".
The robot could be trained by guiding it
in pointing to red objects and then letting it find and point to red objects on
its own.
We wouldn't want to cross-correlate all
red objects with each other. This means that there must be an attribute of redness
that exists independently of any given object. Otherwise, we would have to
cross-correlate "redness" among all the red objects. (In a way, we'll be doing
that, in the sense that we'll have pointers from every red-colored object or
feature to a "red" attribute stored only once for each remembered shade of red.
To a certain extent, there may be pointers from the "red" attribute back to the
red objects.[12])
It follows that there will be entities other than unique objects and unique
events in the database. Generic objects and generic events may also be stored
like these attributes, with two-way pointers back to unique objects and unique
events. Here, we may want to allow pointers back to all the objects and events
themselves. After all, this would only double the number of required pointers.
The pointers will have weights attached to them that will designate the strength
of the association and that will gradually be reduced over time. We might want
to use four bytes for the pointers to allow up to 4,294,967,296 table entries.
(Three bytes would give us 16,777,216 entries in each table or file and would
probably be sufficient.) "Red" might include a very approximate range of RGB
values and the word "red" in text and spoken English. Or one might use pointers
to the word "red" in the OCR file and the sound bit of the word "red" in the
speech recognition file. With each shade of red, we will need to store the RGB
values (or alternatively, the chrominance values) that define
it.
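The pointer scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Attribute` and `StoredObject`, the helpers `link` and `decay`, the 0-255 RGB bounds chosen for "red", and the decay factor of 0.9 are all hypothetical and do not come from the design itself.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A shared attribute (one remembered shade), stored only once."""
    name: str
    rgb_min: tuple  # lower corner of an approximate RGB range (assumed 0-255)
    rgb_max: tuple  # upper corner of that range
    back_links: dict = field(default_factory=dict)  # object id -> weight

    def matches(self, rgb):
        """True if every channel of rgb lies inside the stored range."""
        return all(lo <= c <= hi
                   for lo, c, hi in zip(self.rgb_min, rgb, self.rgb_max))


@dataclass
class StoredObject:
    """A unique remembered object with weighted pointers to attributes."""
    obj_id: int
    label: str
    links: dict = field(default_factory=dict)  # attribute name -> weight


def link(obj, attr, weight=1.0):
    """Create the two-way weighted pointer between object and attribute."""
    obj.links[attr.name] = weight
    attr.back_links[obj.obj_id] = weight


def decay(obj, attr, factor=0.9):
    """Weaken an association over time (the factor is an illustrative guess)."""
    weight = obj.links[attr.name] * factor
    obj.links[attr.name] = weight
    attr.back_links[obj.obj_id] = weight


red = Attribute("red", (150, 0, 0), (255, 90, 90))
apple = StoredObject(1, "apple")
wagon = StoredObject(2, "toy wagon")
for thing in (apple, wagon):
    link(thing, red)
decay(wagon, red)

# Table sizes addressable with three-byte and four-byte pointers.
entries_3_bytes = 2 ** 24
entries_4_bytes = 2 ** 32
```

Because "red" is stored once and every red object points to it, finding all red objects means reading `red.back_links` rather than comparing objects with one another.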
Colors, like most other attributes, are
human inventions[13].
The color spectrum is continuous. There is no such thing as the color "red".
"Red" is an arbitrary abstraction enforced by language. Furthermore, there are
various subdivisions of "red" such as "carmine", "scarlet", "crimson",
"brick-red" (whatever color that is), and so forth. And this is true in general,
from colors through numbers to events. (Identifying colors will be somewhat
facilitated by the human propensity to print the primary colors rather than
borderline colors, which can be handled with appellations such as
"yellow-green".) There will be a hierarchy of colors.
Identifying objects by an attribute such
as color is tantamount to functional inversion. Given a function, find its
inverse.
The robot must respond to "no!" and to
scolding. the protean adaptability
of the human mind.
Need to imitate
humans.
What We
Remember and What We Don't:
Note that unique experiences or events
are remembered, like the midnight hike with Mr. Drew, or Ruth and I climbing the
mountain at Estes Park. On the other hand, routine action sequences in the same
setting are soon forgotten but the setting itself is
well-remembered.
Sample of Storage Requirements: Wood grain finish, walnut. Width; depth;
height; shape, with corner radius; assembling-it animation tracks; easily
nicked; Sullivan Industries; slide-out drawers, dark back in drawers; memories
of using it at 101 Lake Shore Blvd.; recollections of moving it out for
cleaning, for access to cords; Christmas present from
Tommie.
Strategy: Will remember the first time
(Tommie helping me put it together), the unusual. As something is repeated, the
strengths of the linkages and of the now-generic memories ought to increase but
the animation tracks that led to it will not be stored. Links to memories of
work done on the computer such as the house ads, the house floor plan, papers
sent to ISD, etc.
Short-term memory and long-term
memory.
Certain objects and events will be
members of more than one class. For example, a cubical house will have pointers
to, and to a lesser extent, from both a generic house and a cube. Ice cubes
would also have a pointer to a cube and to ice and to cold and to ice trays and
refrigerators and to the experiences of getting ice cubes out of the ice trays
and of putting water into the ice trays (plus, probably, experiences featuring
the spilling of the water on the way to the freezer).
The range of parameters from multiple
instances of objects would probably be used to establish the range of parameters
of the generic object. If something fell within that range or, perhaps, within a
Gaussian σ or two of that range, it would be recognized as a member of that
class. Otherwise, it would either fall into another class or would establish a
new class. Actually, you'd probably want to use a recognition score or, perhaps,
say that if an object satisfied one or a few definitive criteria, it would be
included. Where there was ambiguity, closer inspection would be suggested
(i.e., the recognition problem would be raised to the level of conscious
awareness), or the object would be left unidentified.
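One way to picture the recognition rule just described is the following Python sketch. It is a minimal illustration, not a proposed implementation: the function names, the two-σ tolerance, the score thresholds, and the cup and bowl measurements are all assumptions made up for the example.

```python
from statistics import mean, stdev


def build_class(instances):
    """Summarize each parameter of a generic object as (mean, std. deviation)."""
    params = instances[0].keys()
    return {p: (mean([i[p] for i in instances]),
                stdev([i[p] for i in instances]))
            for p in params}


def recognition_score(candidate, generic, k=2.0):
    """Fraction of parameters within k standard deviations of the class mean."""
    hits = sum(1 for p, (mu, sigma) in generic.items()
               if abs(candidate[p] - mu) <= k * sigma)
    return hits / len(generic)


def classify(candidate, classes, accept=1.0, ambiguous=0.5):
    """Return a class name, a request for closer inspection, or 'new class'."""
    scores = {name: recognition_score(candidate, g)
              for name, g in classes.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= accept:
        return best
    if scores[best] >= ambiguous:
        return "inspect more closely"
    return "new class"


# Hypothetical measurements (cm) from a few remembered instances.
classes = {
    "cup": build_class([{"height": 9, "width": 7},
                        {"height": 10, "width": 8},
                        {"height": 11, "width": 9}]),
    "bowl": build_class([{"height": 5, "width": 14},
                         {"height": 6, "width": 15},
                         {"height": 7, "width": 16}]),
}

typical = classify({"height": 10.5, "width": 8.5}, classes)
borderline = classify({"height": 10, "width": 12}, classes)
unfamiliar = classify({"height": 30, "width": 30}, classes)
```

The middle outcome stands in for raising the recognition problem to conscious awareness: the candidate fits a class on some parameters but not on others.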
The problem of categorizations: when do
we stop categorizing a violin as a violin and begin classifying it as a viol or
a guitar? How about sets and subsets? When do we quit classifying a car as a car
and begin calling it a truck or a forklift? All are vehicles. Shapes. Functional
definitions. Might start with broad categories and later narrow down to finer
discriminations.
Cross-linkages can be to other objects or
to sequences, which can be labeled with a number in a look-up table. This would
reduce the bit count for cross-references. The numbers might be assigned in
chronological order, after checking to ensure that each given item isn't already
in the database.
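The look-up table idea might be sketched as follows. The class name `SequenceTable` and its method names are hypothetical, and representing a sequence as a tuple of action strings is only an assumption for the sake of the example.

```python
class SequenceTable:
    """Gives each distinct item a small integer in order of first appearance."""

    def __init__(self):
        self._ids = {}     # item -> assigned number
        self._items = []   # assigned number -> item

    def intern(self, item):
        """Return the item's number, assigning the next one if it is new."""
        if item in self._ids:          # already in the table: reuse its number
            return self._ids[item]
        new_id = len(self._items)      # numbers follow chronological order
        self._ids[item] = new_id
        self._items.append(item)
        return new_id

    def lookup(self, item_id):
        """Recover the stored item from its number."""
        return self._items[item_id]


table = SequenceTable()
pour = table.intern(("pick up glass", "tip glass", "water falls"))
shake = table.intern(("pick up glass", "shake glass", "water spills"))
pour_again = table.intern(("pick up glass", "tip glass", "water falls"))
```

A cross-reference then needs to hold only the short number, not the sequence itself.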
Could search in background mode. Could
think (correlate and differentiate) in background mode.
A key problem is that of abstraction.
•
Fuzzy recollections and modeling must be essential to recognition. That
could be a reason why we don't remember most things at all
exactly.
•
Can remember at varying levels of abbreviation.
•
Quasi-randomness would be essential to improvement. Motor skills require
feedback, and variations in approach would allow evolutionary
improvement.
•
Non-quantitative. Note that visual recollections are very approximate.
Abstraction is somehow visual and might be such a thing as
"boltedness".
•
Can be quantitatively emulated, although the brain probably doesn't do
things quantitatively. This analog way of remembering may extend to all kinds of
memory, including aural memory.
•
Remembering invokes a dendritic structure of associated memories. Not
remembering requires inhibition of these associated
memories.
•
A number of instances of a given object are stored.
•
Abbreviated scripts. Everything is based on actions. Feelings are stored
with objects.
•
Faces: must abstract at varying levels of resolution. Silhouettes are
abstracted (can recognize from silhouettes). Can identify images in
pictures.
•
Problem-solving could take the form of trial-and-error and selecting a
successful outcome.
•
?How do we
generalize?
10/6/95:
•
The subject of abstraction is so crucial. We store such a small
fraction of what we see and what we do store is so dependent upon our intent to
store.
•
Memory and
Recognition:
We store the exceptional, the unusual
detail. But this makes it hard to generate a general-purpose taxonomy. On the
other hand, if we store related examples, then the unusual details would
establish the envelope.
Will certainly need to use model-based
encoding with crude animation and, perhaps, rendering. Whether or not
experiences are remembered is determined by what's going on inside and not
directly by what's happening in the external world.
I choose not to remember the
start-to-finish "video tape" of my visit to Nobie Stone. Instead, I extract
excerpts from it at selected times when something special happened. There are
"hot links" to Sunday-School, SSL in 1965, and other Nobie events. There are
links between various events and Nobie's name, Nobie's face, Nobie's voice, and
all the locales where I have encountered him. Nobie's voice is stored not as
actual words but as a certain pitch and a style of diction, together with images
of his face while speaking (seen from various viewpoints). The most vivid image
is that of him speaking in Sunday-School class.
I can remember thoughts that I have had
without necessarily remembering when I have had them. Last night, when John
Stephens brought up a cooking anecdote, it triggered my cake-baking anecdote.
I had to understand (abstract) the
meaning of his conversation before I could make the connection.
Will certainly want to weight our
recollections and relationships to recollections, perhaps on the basis of
frequency, intensity (trauma), and perceived importance.
Will need to store action (animation)
sequences. These may help establish cause and effect relationships (push this,
and that happens). Understanding of relationships and sequences will be
necessary. Action sequences will be particularly keyed to our own actions.
Certain activities such as locomotion and
navigation should be handled subconsciously.
Storage: We will probably need at least a
40-bit address space (might get by with 32 bits for a while). Might use local
directories for related material. Will probably want to continually prune and
optimize. Could use 16-bit precision for absolute size.
Can recognize better with high
precision.
Could use Gaussian error functions to
recognize, but we're really interested in trigger points where flags are
raised.
Might have a size factor, a point of
origin, and 8-bit dimensions.
Might have a size factor associated with
each dimension.
Might use a variable resolution size
factor.
"Storing at
Full Detail":
"Storing
at full detail" needs some elaboration. The robot will be examining the object
at a 5 frame per second update rate. If the robot could examine the object at
maximum effectiveness, it could record about 56,500 pixels/second of 2° central
vision detail or about 500,000 pixels of full-60°-field-of-view visual data.
However, it would seem reasonable to permit the robot to store only a very
limited degree of detail in a 1/30th second snapshot. To remember greater
detail, more extensive study of the object would be required. (For recognition
purposes, details must be stored at a level of detail which is hugely simpler
than that of the photographic level.) Normally, we would store the
representation of an object using a generic texture, coupled with exceptions
from uniformity. With 20:1 wavelet compression, using only 256 colors, storage
rates would be no greater than 100,000 bytes/minute or about 6 MB/hour. At that
rate, a 2 GB disk would store 320 hours or 20 days (about 3 weeks) of observations. A
140 GB tape drive could handle about 1,500 days or four years. However, details
could rapidly be degraded. They could fade rapidly at first and then more slowly
later.
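The storage arithmetic above can be reproduced with a short Python sketch. It assumes decimal units (1 GB = 1,000 MB) and 16 waking hours per day. Neither assumption is stated in this paragraph, and the constant and function names are made up for illustration.

```python
BYTES_PER_MINUTE = 100_000     # upper bound on the storage rate quoted above
MB = 1_000_000                 # decimal megabyte (assumption)
GB = 1_000 * MB
WAKING_HOURS_PER_DAY = 16      # assumption

bytes_per_hour = BYTES_PER_MINUTE * 60   # about 6 MB/hour


def hours_of_observation(capacity_bytes):
    """Hours of compressed observation that fit in the given capacity."""
    return capacity_bytes / bytes_per_hour


def days_of_observation(capacity_bytes):
    """The same capacity expressed in waking days."""
    return hours_of_observation(capacity_bytes) / WAKING_HOURS_PER_DAY


disk_hours = hours_of_observation(2 * GB)
disk_days = days_of_observation(2 * GB)
tape_days = days_of_observation(140 * GB)
tape_years = tape_days / 365
```

Under these assumptions the disk comes out at roughly three weeks of waking observation and the tape at about four years, close to the rounded figures in the text.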
Training
Robots in Virtual Environments:
Given a sufficiently realistic virtual
environment within a computer, the robot might learn its way around by
experiencing a simulated environment within a computer before it were presented
with the real world. This would require a very realistic simulation of
reality.
We might imagine a computer simulation in
which the AI program learns to control its simulated manipulator. All the
software that is needed to carry out such a process could be defined and perhaps
even created.
Memory
Requirements for a Virtual Environment
Suppose a 400 sq. ft. room texture-mapped
at 200 dots-per-inch. In addition to the floor area, there would be 80' of walls
covered up to, perhaps, 5' for a total of 400 sq. ft. + the sides and surfaces
of objects in the room for a total of, perhaps, 1000 sq. ft. or 144,000 sq. in.
at 40,000 dots/sq. in. This would require about 6 GB if we stored 1 byte per
pixel. However, if we assume a wavelet-based 10:1 image compression ratio, we
might be able to store such scenery in 600 MB. The weight, center of gravity and
moments of inertia, surface "feel", and other characteristics would have to be
associated with each object. At that rate, we could store, perhaps, 1 sq.
ft./MB. Then on a 9 GB drive, we could hold about 9,000 sq. ft. At a resolution
of 32 dots-per-inch (1,000 dots/sq. in., 150,000 dots/sq. ft.), we could store
600,000 sq. ft.
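The texture-memory estimate can be checked with a small Python sketch. It counts texture pixels only, at 1 byte per pixel, and ignores the per-object physical properties mentioned above. The function names and decimal gigabytes are assumptions for illustration.

```python
SQ_IN_PER_SQ_FT = 144


def texture_bytes(area_sq_ft, dots_per_inch, bytes_per_pixel=1,
                  compression_ratio=1):
    """Bytes needed to texture-map a surface at the given resolution."""
    pixels = area_sq_ft * SQ_IN_PER_SQ_FT * dots_per_inch ** 2
    return pixels * bytes_per_pixel / compression_ratio


def area_that_fits(capacity_bytes, dots_per_inch, compression_ratio=10):
    """Square feet of textured surface that fit in the given capacity."""
    per_sq_ft = texture_bytes(1, dots_per_inch,
                              compression_ratio=compression_ratio)
    return capacity_bytes / per_sq_ft


# 1,000 sq. ft. of surfaces at 200 dots per inch.
raw_room = texture_bytes(1_000, 200)
compressed_room = texture_bytes(1_000, 200, compression_ratio=10)

# A 9 GB drive at the coarser 32 dots-per-inch resolution.
coarse_area = area_that_fits(9e9, 32)
```

The raw and compressed totals come to about 5.8 GB and 0.58 GB, and the coarse-resolution area to a little over 600,000 sq. ft., consistent with the rounded figures above.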
?Why does a baby love repetition?
Learning of motor skills? Concept formation?
•
Current State of the Art
166 MHz Pentium, 200 MHz P6 available.
High-density (4.7 gigabyte) CDs coming in late '96. 8 MB RAM, 500 MB disk, 2X CD
ROM, 75 MHz Pentium at bottom ($1,200) end. 16 MB RAM, 1 GB disk, 4X CD ROM, 120
MHz Pentium for journeyman system.
We could currently afford approximately 9
GB of disk storage ($2,100), 225
SPECint92s of processor speed (a 150 MHz 604e or a 150 MHz P6, $3,000),
and 64 MB of RAM ($2,100). Digital signal processors could up the ante to,
perhaps, 2 Gigops of processing speed. A 140 GB Exabyte tape drive is available
for $5,000.
Given a $60,000 grant, we might spring
for 100 GB of disk storage (11 drives, $20,000), 10-20 Gigops of processing
speed, and, perhaps, 0.5 GB of RAM.
Could use a 4-processor Daystar Gemini
system. Or a 4-processor P6-based
system. Or even two of them.
•
December, '96, State of the Art
180 MHz Pentium. 264 MHz P6? $15/MB RAM?
4.7 GB CDs, 2.4 MB/second? 15 GB hard drives? 6X CDs?
For $7,500, could afford a 180 MHz
Pentium or, perhaps, a 264 MHz P6,
128 MB of RAM, 15 GB of disk, and 4.7 GB of CDs.
• December, '97 State of the
Art
300 MHz P7? $8/MB RAM? 9.4 GB, 4
MB/second CDs? 9 GB hard drives?
• December,
'98:
400 MHz P7, 133 MHz bus, 300 MHz P6,
$4/MB RAM? 18 GB CDs? 30 GB hard drives?
• December,
'99:
500 MHz P7, 166 MHz bus, $2/MB RAM? 18 GB
CDs?, 30 GB hard drives?
• December,
2000:
600 MHz P8, 200 MHz bus; $1/MB RAM?, 36
GB CDs?, 90 GB hard drives?
•
Year 2000 State of the Art:
For $7,500:
CPU: 2,000 SPECint92s, 8,000 to 32,000
SPECs for native signal processing (NSP),
Disk: 90 GB
RAM: 1 GB
For $75,000:
CPU: 16,000 SPECint92s (up to 256,000
SPECint92s in NSP mode)
Disk: 1 TB (11 drives),
RAM: 10 GB
For $500,000:
CPU: 200,000 SPECint92s (100-200
processors), up to 3.2 terops in NSP mode.
Disk: 5 TB
RAM: 100 GB
•
December, 2002 (actual):
2.4 GHz Athlon, 1 GB RAM, 120 GB hard drive, 4.7 GB DVD, 10
gigaflops
For
$7,500:
CPUs: 80 gigaflops
Disk: 2 TB
RAM: 8 GB
For $75,000:
CPUs: 800 gigaflops
Disk: 20 TB
RAM: 80 gigabytes
For
$500,000:
CPUs: 6 teraflops
Disk: 150 TB
RAM: 500
gigabytes
• Year 2005 State of the
Art:
For $7,500:
CPU: 10,000 SPECint92s (25
Gigops)
Disk: 200 GB of disk,
RAM: 5 GB
For $75,000:
CPU: 100,000 SPECs (250
Gigops)
Disk: 2 TB
RAM: 40 GB
For $500,000:
CPU: 1 terops
Disk: 10 TB
RAM: 0.5 TB
This would approach human processing
parameters.
• Ultimate (Conservative) State of the
Art, as seen from 1995:
Assume 4 GB RAM chips. 10 GHz clock
speeds. 10 GB/sq. in. disk densities.
Assume $100/GB, 2 Gigops processors
($20), 100 GB disks ($200). Then:
$6,000 would buy 1 TB of disk, 200 Gigops
of CPU, and 20 GB of RAM.
$12,000: 2 TB of disk, 0.5 terops, and 40
GB of RAM.
This would correspond to about the year
2010 and leaves us down by a factor of 20 in speed. However, digital signal
processors could conceivably boost speeds to 5 terops, or even 10 terops in
volume production (25-50 chips @200 Gigops/chip).
$600,000 in 2010 should buy 100 TB of
disk, 20 terops of processing power, and 2 terabytes of RAM. This should provide
enough raw processing capability to permit proof of principle demonstrations of
human-class thinking irrespective of what approaches are taken. This should
afford computational resources in the general neighborhood of what the human
brain can do, albeit at high expense and with large, hot machinery. Still, if a
machine can be made as smart as a human, it can probably be made much smarter
than a human in performing arithmetic and reasoning operations, and could be
well worth the investment. Also, measures such as using high-volume
custom chip sets and improving software algorithms could probably help to reduce
costs.
Total Storage
Requirements:
•
During the course of a lifetime, we are awake about 6,000 hours/yr., or
540,000 hours in 90 years. At 2.25 GB/hour of compressed HDTV imagery, it would
require about 1,215,000 GBytes of storage to accommodate a lifetime of visual
memories, or about 1,215 terabytes. In actual practice, we probably store
snapshots that can be animated in imaginative ways. (We wouldn't need to store
the messages from both eyes once their 3-dimensional information has been
digested.)
•
If we stored information at a 92,000-pixel resolution[14]
instead of at a 2,000,000-pixel resolution, we would need 56 terabytes for 90
years. At this storage rate, 9 gigabytes of storage capacity would last about
six 16-hour days.
•
At the magnetic storage densities that IBM is currently targeting (10
Gb/in²), 3.5-inch disks might store 25
GB/sheave and 5.25-inch disks could store, perhaps, 50 GB/sheave. With 5-sheave
drives, this could translate into 125 and 250 gigabytes/drive. If we accept
IBM's theoretical limit of 62.5 Gb/in² as an upper bound on magnetic storage
densities, then storage may never exceed 1 and 2 terabytes/drive for 3.5 and
5.25 inch drives, respectively. A 2-terabyte drive could store about 900 hours
of HDTV.
• If we stored a frame in 10 KB, we could store 100,000 frames/GB. It would require 5.5 GB to support 1 frame/hour for ninety years. More aptly, we will be storing 3-D shell models of our mostly-familiar surroundings at very low resolutions with a minimum of remembered detail, together with animation sequences and many, many cross-references. Using model-based encoding, and degrading old memories to lower and lower resolutions, a few gigabytes might be sufficient (not that it would have to be). Most of what we experience is the same old same old, and needn't be stored with much of any information except pointers to a few generic scenes. The details of what we do each day are soon forgotten.
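The lifetime estimates in the bullets above follow from a handful of multiplications, sketched here in Python. The sketch assumes decimal units (1 TB = 1,000 GB, 1 GB = 100,000 frames of 10 KB) and assumes that storage scales in proportion to pixel count. The variable names are illustrative only.

```python
WAKING_HOURS_PER_YEAR = 6_000
YEARS = 90
HDTV_GB_PER_HOUR = 2.25
WAKING_HOURS_PER_DAY = 16

waking_hours = WAKING_HOURS_PER_YEAR * YEARS

# Lifetime of compressed HDTV imagery, in GB.
hdtv_lifetime_gb = waking_hours * HDTV_GB_PER_HOUR

# Same lifetime at 92,000 pixels instead of 2,000,000 pixels per frame,
# assuming storage scales in proportion to pixel count.
reduced_lifetime_gb = hdtv_lifetime_gb * 92_000 / 2_000_000
reduced_gb_per_hour = reduced_lifetime_gb / waking_hours
days_on_9_gb = 9 / reduced_gb_per_hour / WAKING_HOURS_PER_DAY

# Hours of HDTV on a 2-terabyte drive.
hdtv_hours_on_2_tb = 2_000 / HDTV_GB_PER_HOUR

# One 10 KB frame per waking hour for ninety years.
frames_per_gb = 1_000_000_000 // 10_000
frame_per_hour_gb = waking_hours / frames_per_gb
```

Under these assumptions the reduced-resolution lifetime comes to about 56 terabytes, a 2-terabyte drive holds about 900 hours of HDTV, and one frame per hour needs about 5.4 GB. Nine gigabytes at the reduced resolution last between five and six 16-hour days.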
Dividing up the
Task:
Might use 4 visual µprocessors, 2 for
each eye. Might use additional µprocessors for 3-D imaging, clipping, texture
mapping, Gouraud shading, and Z-buffering.
What we will need:
A speech recognition package that
can be embedded in the computer system. Ideally, would like the Speech Systems,
Inc., Phonetic Engine 500.
A state of the art OCR package—either OCR
Professional 6.0 or Accutext.
A state of the art voice synthesis
package.
A facial recognition
program.
A 3-D graphics program that can generate
2-D views.
Monitoring
Equipment:
Could use a VCR storing the reduced
resolution image that the computer is generating. Could use a computer monitor
presenting the 3-D model that is being constructed within the robot. Would be
interested in the interrelationships and the abstractions that are developing
within the computer.
[1] - By the same token, like our arithmetical and logical capabilities, our speech recognition and linguistic capabilities may be Johnny-Come-Latelies on the evolutionary scene and may be a lot easier to master than such "lower-level" capabilities as vision and walking.
[2] - One approach to this might be to provide an Internet publication forum in which different contributors can gain recognition for their contributions, including patent rights, where applicable. Such an activity would require that someone be an honest broker. It might also require special security features and online access to a patent library. If you would be interested in such a role or have ideas about how cooperation might be implemented, I would welcome hearing from you.
[3] - Hans Moravec, "The Universal Robot", Analog Science Fiction and Fact, Jan. 1992, pp. 93-101.
[4] - At Case Institute of Technology, my mathematician office-mate and I were fascinated with the Perceptron concept when we first heard about it in 1958. We speculated about how it might work, but then heard no more about it.
[5] - When I was working as a graduate student at Case Institute of Technology, our project mathematician had proven that it was impossible to apply two-terminal-pair analysis to networks of probabilistic switches. As we began to prepare our final report, two of our staff members began to try to apply two-terminal-pair analysis to probabilistic switch nets. I was resentful of their wasting their time on this wild goose chase after Dr. Lehman had already proven that it couldn't be done. In a few days, they came up with correction formulae that allowed two-terminal-pair transformations even though, technically-speaking, it couldn't be done. I learned a valuable lesson about keeping an open mind in the research game from that experience.
It's also noteworthy that, by 1895, America's leading astronomer, Dr. Simon Newcomb, had mathematically proven that heavier-than-air flight would forever be impossible. He publicly announced this in 1903, the year of the Wright Brothers' first flight at Kitty Hawk. In 1911, he announced that it would forever be impossible for an aircraft to carry a passenger. That happened to be the year of the first passenger flight. His timing wasn't very good.
I guess we all need a healthy disrespect for authority.
[6] - These numbers are far from a consensus. A 1995 book estimates the number of neurons at 50,000,000.
[7] - Dr. Moravec is responsible for the 10¹³ operations per second lower bound; Danny Hillis of Thinking Machines has authored the 10¹⁶ operations per second estimate; Dr. Terry Sejnowski of the Salk Institute has published an estimate of 10¹⁵ operations per second. The 1,000:1 range among these estimates may partially arise from differences in
[8] - Note that we're comparing apples and oranges. We may still be seriously underestimating what an individual neuron can do.
[9] - A 150 MHz Pentium chip is rated at 180 MIPS, while a 167 MHz P6 chip is projected to yield 240 MIPS. The P6 transaction bus is designed for efficient 4-chip operation, delivering 1,000 MIPS with four machines. A 133 MHz 604 chip is quoted at 200 MIPS. Extrapolating these parameters to the state-of-the-art speed of 300 MHz, these chips would provide processing speeds of about 450 MIPS. DEC's follow-on chip is projected to operate at 400 MHz, providing processing speeds in excess of 500 iSPECS. (DEC has pledged to increase its microprocessor speeds 500-fold over the next 20 years, and current forecasts call for single-microprocessor PC speeds of 3,000 to 15,000 SPECint92s by 2005, with multiple digital signal processors running at possibly 100 billion operations per second by that date.)
[10] - IBM is reputedly planning a 20-fold "bump-up" in disk drive densities, projecting a 90 gigabyte disk drive for PCs by the year 2000. This would up single-disk-drive capacities to about 200 GB within the next few years (2003?), and would lower the cost of a terabyte to about $10,000. However, IBM has warned that magnetic domains smaller than about 0.1 µ begin to exhibit quantum leakage into other domains. This would correspond to a bit density of about 62.5 billion bits per sq. inch or about 0.2 terabytes per 3.5" disk sheave. One might hope for ultimate magnetic disk capacities of 1 terabyte for 3.5" disks or 2 terabytes per 5.25" drive, still well below the presumed storage demands of human intelligence. (We note that the Japanese have begun work on a one terabyte optical storage system, and a number of other schemes are being touted for multi-terabyte optical disks. For example, the resolutions advertised for IBM's scanning would support 10 to 20 terabyte 3.5" disk drives.)
[11] - There are several hidden assumptions here, such as the idea that the robot knows that it can cause things to happen, that it will not try to lift the object up through the table, and
[12] - An alternative way to handle the reverse reference would be to search the object database for red objects.
[13] - Cross-cultural studies of color naming around the world suggest that color is not arbitrary but is perhaps a function of the primary colors that are registered by the human eye.
[14] - It is assumed that the stereo information from both eyes has been digested and that visual information is stored in a 3-D shell model format.
TI C80X - 4 adv. DSP's on chip, 40 MHz, 5.4 w., 2 Gops, $579
Model-based video encoding; head pose estimation; Feature point tracking