This effort represents an attempt to create a human surrogate, including emotions, drives, self-awareness, a set of moral/ethical principles, and a sense of humor. Although this model will require extensive pre-programming, it is not to be programmed to do anything specific except to learn and then to live. It is to be driven by "urges", moderated by the interplay of conflicting influences within its developing "ego" and "personality".
The Magnitude of the Task
These are very ambitious—even grandiose—goals. The reason for beginning with a human rather than an animal model of intelligence is that
(1) it is difficult to know what traits are essential and what traits are discretionary, and
(2) I can try to delineate the mechanisms that allow human beings to do what we do but I can't guess comparably well at how (simpler) animal minds operate.
Reasons for Optimism
I am heartened by the progress being made in emulating on desktop PCs such uniquely human capabilities as speech recognition, cursive handwriting recognition, optical character recognition, speech synthesis, machine vision, bipedal locomotion, and natural language processing. We can't match human capabilities in these areas yet but we're getting closer. I am also heartened by the progress which has been made in defining neural functions and in conceptually devising ways of performing them on computers. Of course, the proof of the pudding will be in the eating.
This paper contains a few ideas about what the mind appears to do, coupled with ideas about how the same or similar functions might be emulated in silicon.
A preliminary analysis of mental processes over the last few months has led me to the idea that the mind appears to be exceedingly complex. It appears to me that there is a rich abundance of mechanisms that make us what we are, as opposed to one or a few underlying principles.
Emulating the human mind is a "grand challenge" and a fascinating topic. It may be that we will not achieve the thousandth part of what we are attempting. It may be decades or centuries before we are successful at artificial intelligence. However, the process might yield a lot of useful information, including, if nothing else, a measure of just how difficult this problem is. Even if we are only able to achieve insectile intelligence in a computer, there should be many commercial applications.
Some of the best minds on six continents are working on this problem or upon aspects of it, and I have begun to draw upon these efforts. It is my hope that it will be possible to identify an interest group to pursue these topics and to integrate what is already available or potentially available. It should be a highly interesting project. (If you would be interested in one or more aspects of this project, or in being kept up-to-date on its progress, I would welcome your inputs or interest.)
A Brief Review of Artificial Intelligence Research
The seeds of artificial intelligence research were sown during World War II when such electronic computers as the ENIAC, the EDVAC, the Aiken Mark I, the Differential Analyzer, and a number of analog shipboard fire control computers made their debut. It's not hard to see how these new "electronic brains", which could perform arithmetic and logical calculations orders of magnitude faster than human "computresses", would fire the imaginations of engineers and mathematicians. Could we develop thinking machines that could outperform humans in the mental arena, as labor saving machinery had already done in the physical domain? As Dr. Hans Moravec put it,
"One line of research, called Cybernetics, used analog circuitry to produce machines that could recognize simple patterns, and turtle-like robots that found their way to lighted recharging hutches. An entirely different approach, named Artificial Intelligence (AI) attempted to duplicate rational human thought in large computers."
The brain was regarded as a digital computer but, perhaps, with analog/digital circuits to accommodate control functions. In the latter 40's and early 50's, there was great fanfare regarding these prospects, together with concerns over what "technological unemployment", "automation", and the new science of cybernetics would do to humanity in a robot-run world. What would happen when we humans were no longer the smartest people we knew? An MIT mathematical prodigy, Dr. Norbert Wiener, wrote books called "Cybernetics" and "The Human Use of Human Beings", calling for responsible application of this revolutionary technology. The science fiction films of the era ("The Day the Earth Stood Still") had the robot as the master and a specially created human emissary as its loyal servant. (Remember Jack Williamson's "The Humanoids"?) The mind was considered to be a program that ran in the brain, and it was thought to be only a matter of a few short years before intelligent machines were running our factories. This was the era of the first-generation vacuum tube computers such as the UNIVAC I, and the IBM 650 and 701. It was also the era of patch boards and analog computers. Feedback systems and servo theory were very much in vogue.
Not to be ignored throughout the whole period from the forties to the nineties were the continuing studies of the brain by neurology researchers. These tended to proceed largely, though not completely, independently of artificial intelligence research.
The Fifties and Sixties:
This rampant optimism persisted throughout the 50's and well into the 60's. In 1959, a Cornell Aeronautical Laboratories psychologist by the name of Dr. Frank Rosenblatt developed the first artificial neuron-based computer, called "The Perceptron". It was a 500-neuron, single-layer neural network and was attached to a 400-photocell optical array. Another major milestone occurred that year when Simon and Newell developed a theorem-proving program called "Logic Theorist" which was able to prove a number of mathematical theorems. Checkers playing programs, algebraic manipulation programs (including symbolic integration and differential equation solving), language translation, natural language processing, and () were all under development during this time. OCR-A and OCR-B typing balls were offered for IBM Selectric typewriters, and optical character recognition systems were available to read text printed in those fonts. Simple wire-following robots that any radio amateur could build were devised and described in Scientific American. As one writer has put it, this was a period of "initial intoxication with cognitive science". (As we shall see in the Section below concerning the capabilities of the brain, the computers of the 50's were ludicrously slow and small, by a factor of at least 1,000,000 and perhaps closer to 1,000,000,000,000, for the implementation of human-caliber intelligence.)
In the early sixties, the U.S. Postal Service mounted a major effort to develop optical character recognition hardware and software. (The program was oversold at the time but by now, it has led to advanced optical character recognition equipment that is in daily use by the Postal Service.) Also, in the early sixties, Simon and Newell created the General Problem Solver (GPS) as a generalized theorem proving system. Throughout the sixties, there was a ferment of activities in all areas of artificial intelligence. (Analog-to-digital converters probably weren't fast enough in the sixties to do much with machine vision.)
However, by the end of the decade, the Postal Service had discovered how difficult it was to build a machine that could read addresses on letters. IBM had thrown in the towel on their Russian language translation program when it became apparent that a computer couldn't translate language without understanding it. And computers were too slow by many orders of magnitude for machine vision, virtual reality, and speech and handwriting recognition. While they could perform arithmetic and logical manipulations with great proficiency, they were light-years away from posing their own problems or understanding the real world, let alone handling the subtle nuances of interpersonal relationships.
In 1969, Drs. Marvin Minsky and Seymour Papert of MIT published a book entitled "Perceptrons" in which they proved that single layer perceptron networks were, among other limitations, inherently incapable of performing the exclusive OR function, and were a dead end. One would think that their arguments would have been insupportable. After all, the human brain is a neural network of incredible complexity, containing tens of billions of neurons and hundreds of trillions of synapses. But for some reason, they were sufficient to derail neural network research for 15 years. (The authors would later explain that neural networks were competitors for research money.) Such is the power of scientific snobbery.
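The exclusive OR limitation is easy to reproduce. The following is a minimal sketch, not taken from "Perceptrons" itself: it trains a single threshold unit with the classic perceptron learning rule on three truth tables. The function name, the epoch limit, and the zero starting weights are illustrative assumptions.

```python
def train_perceptron(samples, max_epochs=1000):
    """Train one threshold unit with the classic perceptron rule.

    Returns the learned (w1, w2, bias) if an epoch finishes with no
    errors, or None if max_epochs pass without that happening.
    """
    w1 = w2 = bias = 0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            delta = target - output
            if delta != 0:
                # Nudge the weights toward the desired output.
                w1 += delta * x1
                w2 += delta * x2
                bias += delta
                errors += 1
        if errors == 0:
            return (w1, w2, bias)
    return None


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = list(zip(INPUTS, [0, 0, 0, 1]))
OR = list(zip(INPUTS, [0, 1, 1, 1]))
XOR = list(zip(INPUTS, [0, 1, 1, 0]))

for name, table in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    result = train_perceptron(table)
    print(name, "learned" if result is not None else "not learned")
```

AND and OR can be separated by a single straight line in the input plane, so the unit settles on weights that reproduce them. No such line exists for XOR, so a single-layer unit never finishes an error-free pass however long it trains. This is the kind of limitation that the multi-layer networks of the 80's circumvented.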
In the early 70's, researchers at Stanford and MIT began mounting TV cameras and manipulators on wheeled robotic carts and turning them loose in real-world environments. To quote Dr. Moravec again,
"What a shock! While the pure reasoning programs did their jobs about as well and about as fast as a college freshman, the best robot control programs took hours to find and pick up a few blocks on a table, and often failed completely, a performance much worse than a six-month old child. This disparity, between programs that reason and programs that perceive and act holds to this day. At Carnegie Mellon University there are two desk-sized computers that can play chess at grandmaster level, within the top 100 players in the world, when given their moves on a keyboard. But present-day robotics could produce only a complex and unreliable machine for finding and moving normal chess pieces.
"In hindsight, it seems that, in an absolute sense, reasoning is much easier than perceiving and acting—a position not hard to rationalize in evolutionary terms. The survival of human beings and their ancestors has depended for hundreds of millions of years on seeing and moving in the physical world, and in that competition, large parts of their brains have become efficiently organized for the task, but we didn't appreciate this monumental skill because it is shared by every human being, and most animals—i. e., it is commonplace. On the other hand, rational thinking, as in chess, is a newly acquired skill, perhaps less than one hundred thousand years old. The parts of our brains devoted to it are not well organized, and, in an absolute sense, we're not very good at it. But until recently, we had no competition to show us up."1
Image enhancement was a popular topic in the 70's in support of DoD and NASA satellite image analysis and JPL's successes with Voyager photographs. Intel introduced its first microprocessor chips: the 4-bit 4004, followed by the 8-bit 8008.
Another False Start for AI
In the mid-80's, artificial intelligence enjoyed another false dawn. This time, it was rule-based expert systems, tree searches, and Symbolics Computers. Expert systems proved hostage to the intuition that so often guides human beings and that depends upon an overall understanding of the world. Also, it took too long to enter all the rules into a computer program. Expert systems still exist but they don't replace experts. Symbolics Computer Systems soon declared bankruptcy.
Slow Patient Progress Behind the Scenes
In the meantime, slow, patient progress was underway. Machine vision systems began to be used for assembly line inspections. Unimation's "Puma" robotic arms were installed to carry out repetitive assembly line functions. Cheap embedded microprocessor chips were becoming faster and faster. The rapidly rising capabilities of personal computers permitted rapid programming of sophisticated software. Caere's Omnipage Professional became an increasingly robust optical character recognition program. Video games became ever more realistic. Though initially very expensive, trail-blazing speech recognition systems were developed by Bell Labs, and by many universities and small companies.
The Resurrection of Neural Networks and Fuzzy Logic
During the 80's, a few "keepers of the flame" had devised multi-layer neural networks that circumvented the limitations described by Minsky and Papert. Fuzzy logic and genetic programming were added to neural networks, which were embraced with great enthusiasm by the Japanese. Various kinds of multi-layer neural networks with back-propagation and sometimes, fuzzy logic, are proving to possess fascinating and highly useful capabilities in the areas of pattern recognition and control. The latest release (6.0) of the Omnipage optical character recognition package incorporates a neural network to help recognize printed text. There is a great ferment of activity in this now-highly-fashionable area of research.
Neural networks and fuzzy logic are hot!!
Genetic programming seems to be receiving less attention.
Speech Recognition in 1990:
Computer control by voice command became available in the early 90's: Dragon Systems for IBM-compatibles and Voice Navigator for the Macintosh. These early speech recognition systems were speaker-dependent and had vocabularies of a few hundred words, spoken one at a time. (In 1991, AT&T had a laboratory system capable of recognizing continuous speech, but it required 16 parallel, 32-bit digital signal processors.)
Speech Recognition in 1995:
In 1995, IBM began offering a voice dictation package with their latest PCs. Dragon Systems recently introduced a 120,000-word, discrete-word speech recognition system called DragonDictate, while Apple Computer Company is bundling a speech recognition program called Voiceprint with its high-end 8500 and 9500 computers. IBM's "VoiceType" is the most accurate of the speech dictation systems: it is context sensitive and can distinguish among homonyms. A small company called Speech Systems, Inc., began offering the first continuous speech, speaker independent, voice-dictation system for personal computer owners in 1995. These systems aren't yet the kind of Smith Corona "Voicewriter" that you'll be able to buy from Service Merchandise sometime within the next ten or fifteen years, but they'll get there.
Other 1995 Capabilities:
Optical character recognition (OCR) has improved steadily with the Omnipage Professional series of OCR packages, coupled with 600 dot-per-inch and higher resolution scanners. Handwriting recognition has improved rapidly since Apple Computer introduced the first Newton Personal Digital Assistant in 1992. Voice synthesis systems are also improving steadily at AT&T. Facial recognition systems are under development, together with usable fingerprint identification packages. Machine vision and industrial robotics systems should be entering their heyday with cheap multi-gigops processors such as the Texas Instruments TMS320C80 entering the marketplace.
These are uniquely human capabilities that are not even shared with the rest of the animal kingdom. Interestingly enough, they are being realized using conventional computers running conventional software. And these programs will only improve. The introduction of MMX processing on 80x86 processors in early 1997, coupled with ever-increasing clock rates, wider data paths, and Intel's 1998 P7 processor, should afford a 5-to-40-fold jump in implementing these higher human functions.
However, it is important to distinguish between systems that perform functions upon command and self-organizing systems that give commands. What is missing from this picture are the self-aware, self-organizing, motivated characteristics of the animal kingdom, so perhaps it is in this arena that we might gainfully concentrate our efforts.
Two of the most striking areas of computer progress thus far in the 1990s are the Internet and the advances that are being made in computer graphics.
It is clear that early AI researchers hugely underestimated the computational requirements of artificial intelligence.
AI research has been hampered by "Big-Endian" and "Little-Endian" arguments about whether to concentrate on connectionist (neural net) or purely cognitive (e.g., theorem proving) approaches to achieving artificial intelligence. In reality, the two approaches will probably turn out to be complementary. It is never a good idea in research to put all one's eggs in one basket.
Sir Charles Sherrington's "Great Gray Ravelled Knot":
Incredible Complexity and Storage Capacity:
It is hard not to wax eloquent when describing the construction and the capabilities of the human brain. From a purely computational point of view, your brain may be one to two orders of magnitude faster and more complex than the upcoming 1 teraflops Cray T3E or Intel Touchstone supercomputers, or perhaps 100,000 to 1,000,000 times more elaborate than the new 200 MHz Intel P6 personal computer. Your brain contains about 50 billion to 100 billion neurons (nobody knows how many for sure), each of which interfaces with 1,000 to 100,000 other neurons through 100 trillion (10^14) to 10,000 trillion (10^16) synaptic junctions. Each synapse possesses a variable firing threshold that is reduced if the neuron is repeatedly activated. If we assume that the firing threshold at each synapse can assume 256 distinguishable levels (one byte per synapse), and if we suppose that there are 20,000 shared synapses per neuron (10,000 per neuron, since each synapse is shared between two neurons), then the total information storage capacity of the synapses in the cortex would be of the order of 500 to 1,000 terabytes (up to 10^15 bytes). (Of course, if the brain's storage of information takes place at a molecular level, then I would be afraid to hazard a guess regarding how many bytes can be stored in the brain. One estimate has placed it at about 3.6 X 10^19 bytes.)
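The storage estimate above can be checked with a few lines of arithmetic. This is only a sketch under the paragraph's own assumptions (256 levels per synapse, 10,000 synapses per neuron, 50 to 100 billion neurons). The function name and the use of decimal terabytes are illustrative choices.

```python
import math

BYTES_PER_TERABYTE = 10**12  # decimal terabytes, an assumption of this sketch


def synaptic_storage_terabytes(neurons, synapses_per_neuron=10_000, levels=256):
    """Estimate storage if each synapse holds one of `levels` threshold values."""
    bits_per_synapse = math.log2(levels)  # 256 levels -> 8 bits
    total_bytes = neurons * synapses_per_neuron * bits_per_synapse / 8
    return total_bytes / BYTES_PER_TERABYTE


low = synaptic_storage_terabytes(50 * 10**9)    # 50 billion neurons
high = synaptic_storage_terabytes(100 * 10**9)  # 100 billion neurons
print(f"{low:,.0f} to {high:,.0f} terabytes")
```

The result scales linearly with every input, so a tenfold uncertainty in the synapse count moves the estimate by the same factor.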
Not bad for a 3-pound gob of pink goo!
Because of the neural-net organization of the brain and the high degree of redundancy that appears to characterize neural-net based memories, the effective storage capacity of the brain may be much less than 500 terabytes of computer memory. My considerations of our memory capacity suggest that its computer-equivalent storage size may lie closer to 500 gigabytes than to 500 terabytes. (The brain's storage capacity may primarily be used for other purposes than the retention of facts.)
Accuracy and Redundancy:
A considerable degree of redundancy in cranial memory storage may be needed to compensate for the quantum unreliability of the brain's nanocircuitry. (The synaptic junctions are characterized by separations that are less than 100 Å.)
Complexity of Cerebral Functions
The English philosopher John Locke thought that a newborn was a "tabula rasa"—a blank slate—upon which the world wrote whatever it wrote. As recently as fifty years ago, it was thought that the cerebral cortex was a structure-less, pink pudding of identical neurons that somehow simply and magically produced human thought. The underlying cerebral hemispheres were known to have certain specialized functions—the left temporal lobe mediated speech while the occipital lobe specialized in vision—but memories seemed to be distributed throughout the brain. There was speculation concerning why 90% of all brain tissue was never used, together with the idea that some day, we might be able to learn to harness it. Today, we understand that the brain is highly structured and highly specialized. A great many functions are "wired in", compared to a digital computer which is truly a blank slate. It is these "wired in" functions that make us get out of bed in the morning rather than spend the day estivating. The 90% of brain tissue that was thought to be unused probably is used. Unlike many man-made machines which either work or don't work, certain brain functions degrade gracefully rather than abruptly as brain tissue is destroyed.
Experience with biological systems in general shows that they are exceedingly complicated, with multiple backup systems. My consideration of the functions of the mind suggests that it is also extremely complicated. The brain apparently contains a multitude of very complex and highly-specialized areas which we probably haven't yet fully mapped out or understood.
Neurons require about a millisecond to discharge, followed by a 4 millisecond refractory period. At a maximum of 200 firings per second across as many as 10^16 synapses, that could amount to as many as 2 X 10^18 connection updates per second. In practice, the firing rate and the synaptic count probably aren't that high. There are 40 Hertz firing waves that sweep the entire brain from back to front. The 6,000,000,000 neurons in the visual cortex also fire about 40 times a second to give us our 20-frame-per-second visual update rate, so we might be looking at perhaps 240 billion firings per second in the visual cortex. Each neuron connects to a number of other neurons through dendrites and an axon (an average of 15,000 interconnections per neuron in the visual cortex), so we might be dealing with about 50 trillion (5 X 10^13) synaptic junctions in the visual cortex. At 40 firings a second, the visual cortex should be able to perform about 2 quadrillion synaptic activations per second: 2 X 10^15 connection updates per second, or 2,000,000 Gcups (giga-connection-updates-per-second), compared to 10 Gcups for current neural networks. For the brain as a whole, assuming 10,000 interconnections per neuron, the number might be about 10 times this amount, or 20,000,000 Gcups. (A recent 3/17/96 article in Parade magazine places the total synaptic count at 1.5 quadrillion and the connection update rate at 10,000,000 Gcups.)
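The connection-update figures follow from multiplying a synapse count by a firing rate. The sketch below reruns that arithmetic with the numbers quoted in the paragraph. The variable and function names are illustrative, and the 10^16 synapse and 200 Hz values used for the upper bound are the maxima given elsewhere in this paper.

```python
GIGA = 10**9


def connection_updates_per_second(synapses, firing_rate_hz):
    """Each synapse is assumed to be updated once per firing cycle."""
    return synapses * firing_rate_hz


# Visual cortex: 6 billion neurons, about 50 trillion synapses, 40 Hz.
visual_firings = 6 * 10**9 * 40
visual_cups = connection_updates_per_second(5 * 10**13, 40)

# Whole brain: taken as roughly ten times the visual cortex figure.
whole_brain_cups = 10 * visual_cups

# Upper bound: 10^16 synapses, each firing at the 200 Hz maximum.
upper_bound_cups = connection_updates_per_second(10**16, 200)

print(f"visual cortex firings/s: {visual_firings:,}")
print(f"visual cortex:           {visual_cups // GIGA:,} Gcups")
print(f"whole brain:             {whole_brain_cups // GIGA:,} Gcups")
print(f"upper bound:             {upper_bound_cups:.0e} updates/s")
```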
It has been estimated that computational speeds of 10^9 calculations per second (1 Gigops) would be required to match the edge and motion detection capabilities of the first four layers of the human retina, and 10^13 operations per second (10,000 Gigops) to 10^16 operations per second (10,000,000 Gigops) would be necessary to emulate what is done in the brain overall.
The Brain Needs So Little Power
Still another impressive parameter regarding the brain is how little power it dissipates (of the order of 100 watts). By comparison, a 500-MIPS (500-Million Instructions Per Second) DEC Alpha chip dissipates about 50 watts. Using 20,000 DEC Alpha chips, we would need a 1,000 kW behemoth to achieve the lower threshold of 10,000 Gigops, or about 10,000 times as much power.
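The power comparison can be worked through explicitly. The sketch below uses only the figures quoted above (500 MIPS and 50 watts per chip, a 10,000 Gigops target, a 100 watt brain). It treats one instruction as one operation, which is a simplifying assumption. Dividing the resulting 1,000 kW by the 100 watt figure gives a ratio of 10,000.

```python
CHIP_OPS_PER_SECOND = 500 * 10**6  # 500-MIPS DEC Alpha, one op per instruction
CHIP_POWER_WATTS = 50
TARGET_OPS_PER_SECOND = 10**13     # 10,000 Gigops, the lower threshold
BRAIN_POWER_WATTS = 100            # the order-of-magnitude figure quoted above

chips_needed = TARGET_OPS_PER_SECOND // CHIP_OPS_PER_SECOND
total_power_watts = chips_needed * CHIP_POWER_WATTS
power_ratio = total_power_watts // BRAIN_POWER_WATTS

print(f"chips needed: {chips_needed:,}")
print(f"total power:  {total_power_watts // 1000:,} kW")
print(f"ratio to the brain: {power_ratio:,} to 1")
```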
Slow Neural Transmission Speed Forces Massive Parallelism
Because of the electrochemical nature of the neuronal discharge, nervous impulses travel at no more than 25 meters per second (1 inch per millisecond) in the gray matter of the central nervous system, and at about 100 meters per second down the long, myelin-sheathed peripheral nerves. Furthermore, neurons exhibit a 1 millisecond switching time, followed by a 4 millisecond refractory period during which the neuron discharges only in the presence of a strong stimulus. This means that neurons could not fire at rates greater than 200 times a second (a 5 millisecond cycle time). In practice, their firing rates are probably significantly less than 200 Hz.
One of the implications of these numbers is that it would require 5 milliseconds for a signal to go from the eye to the occipital lobe where visual interpretation takes place. It would then require a minimum of another 10 or 15 milliseconds for a motor signal to go from the brain to an arm or leg muscle. At least 1 millisecond would be added to the signal processing time for each neuron in the processing chain between the sensory input and the motor output. Our minimum response times to stimuli range from, perhaps, 25 milliseconds for an eye-blink to 150 milliseconds to step on a brake pedal. This means that there can't be very many neurons in the chains between the inputs and the outputs. This, in turn, forces biological nervous systems to operate almost entirely in parallel. Everything must happen at once. Visual data must be analyzed, edges detected, features extracted, objects identified, and appropriate suites of motor commands issued all in an instant, that is, within a few "clock cycles".
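The timing argument can be turned into a rough upper bound on the length of a serial neuron chain. The sketch below subtracts the transit times from the response time and allows 1 millisecond per neuron, as in the paragraph above. The function name is illustrative, and so is the pairing of transit times with each response: 10 ms of motor transit is assumed for the eye-blink and 15 ms for the brake pedal.

```python
def max_neurons_in_chain(response_ms, sensory_transit_ms, motor_transit_ms,
                         per_neuron_ms=1.0):
    """Upper bound on serial processing steps that fit in a response time."""
    budget_ms = response_ms - sensory_transit_ms - motor_transit_ms
    return max(0, int(budget_ms // per_neuron_ms))


eye_blink = max_neurons_in_chain(25, sensory_transit_ms=5, motor_transit_ms=10)
brake_pedal = max_neurons_in_chain(150, sensory_transit_ms=5, motor_transit_ms=15)

print(f"eye-blink:   at most {eye_blink} neurons in series")
print(f"brake pedal: at most {brake_pedal} neurons in series")
```

These are generous bounds, since each neuron is charged only its 1 millisecond switching time. Even so, the chains come out tens of steps long at most. A conventional program, by comparison, would spend millions of sequential instructions on the same visual task.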
Generally, when a system becomes this massively parallel, it becomes computationally inefficient. A lot of resources have to be dedicated to data transfer. Also, many useful functions, such as numerical integration, are inherently serial and don't lend themselves to parallel processing.
In addition, neurons are quite noisy as well as quite unreliable, perhaps because they are so miniaturized that quantum mechanical fluctuations permit spurious firings and misfiring.
By contrast, electricity travels down properly terminated copper wires at about 85% of the speed of light, or about 250,000,000 meters per second (compared to 25 meters per second for unsheathed nerve fibers). The fastest current computers run at a 300 MHz clock rate, with still higher clock speeds on the way. This compares with the <200 Hz cycle times of neurons. Although comparisons are dangerous, it may be that a current silicon-based computer can be considered to be potentially 1,000,000 to 10,000,000 times faster than a single biological counterpart.
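The "1,000,000 to 10,000,000 times" range matches two ratios that can be formed from the figures in this paragraph, one for switching rate and one for signal propagation. The sketch below computes both. Reading the range this way is an interpretation, and the constant names are illustrative.

```python
COPPER_SIGNAL_SPEED_M_S = 250_000_000  # about 85% of the speed of light
NERVE_SIGNAL_SPEED_M_S = 25            # unsheathed nerve fibers
CLOCK_RATE_HZ = 300_000_000            # fastest current computers
MAX_NEURON_RATE_HZ = 200               # 5 millisecond cycle time

switching_ratio = CLOCK_RATE_HZ // MAX_NEURON_RATE_HZ
propagation_ratio = COPPER_SIGNAL_SPEED_M_S // NERVE_SIGNAL_SPEED_M_S

print(f"switching:   {switching_ratio:,} to 1")
print(f"propagation: {propagation_ratio:,} to 1")
```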
Relationship to Man-Made Neural Nets
As they are presently implemented, neural networks are only distant cousins of biological nervous systems. Biological neural nets are complex in various ways, exhibiting time dependent as well as spatially dependent behavior, as well as thousands of interconnections. Artificial neurons and synapses are simple approximations to biological neurons, and the jury is still out regarding how faithfully man-made neural nets replicate the behavior of even the simplest biological nervous systems. This question will be examined further in the section below on artificial neurons.
The Role of Artificial Neural Nets
Man-made neural networks shine as preprocessors interfacing with an analog world. Digital computers, on the other hand, are in their element when manipulating discrete symbols. In between is the analog-to-digital classification stage.
Progress is being made rapidly in this technological area, and it is finally receiving the strong support and close attention that it richly deserves.
Advantages and Possibilities Offered by Neural Nets
• Neural networks are trainable and self-organizing.
• Like humans, neural networks are ill-suited to logical or arithmetic operations but are very effective at pattern recognition.
• Neural nets have the ability to generalize from experience and to measure "goodness of fit" in a solution space.
• Neural nets can team up with fuzzy logic to provide the best of both worlds.
• Neural network nodes are inherently analog devices, as are transistors. By utilizing transistors in an analog mode, it is claimed (by Dr. Carver Mead, Caltech professor and president of Synaptics) that 100:1 improvements in functional densities and 10,000:1 gains in power consumption can be realized over all-digital approaches. (Of course, one or a few transistors are still required to emulate a neural network node or synapse.)
• Neural networks can perform some pattern recognition functions much faster and more cost-effectively than digital computers.
Modelling the Brain with Neural Nets and Artificial Neurons:
The situation today with neural networks is similar to that with conventional computers in that we can use neural networks to perform specialized recognition or control functions but we don't yet know how to organize them into a highly complex and specialized "network of networks" like the brain (any more than we know how to emulate the brain on a von Neumann type of computer.)
One of our problems is that of size: we are a long way from fabricating 500 trillion or even 500 billion synapse nets. In addition, the brain uses 500,000,000,000,000 nerve fibers to interconnect the synapses, and grows new nerve fibers where desirable. Talk about a plumber's nightmare! However, we may be in a position to attempt to model very simple nervous systems, and certainly, neural networks are being used for practical applications. In emulating biological systems, one would probably take advantage of silicon switching speeds by using a bus structure to multiplex data to and from the artificial synapses and neurons. However, even if it were possible to multiplex data 10,000,000 times faster than the brain, 50,000,000 high-speed busses would still be needed to perform the task. (The higher reliability of silicon might permit a further factor-of-ten reduction, permitting us to get by with only 5,000,000 busses.)
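The bus count above is a straightforward division. The sketch below reproduces it from the paragraph's figures: 500 trillion nerve fibers, a 10,000,000-fold multiplexing advantage, and an optional factor-of-ten saving from silicon's higher reliability. The function and parameter names are illustrative.

```python
def buses_needed(nerve_fibers, multiplex_factor, reliability_factor=1):
    """Number of shared buses if each one stands in for many nerve fibers."""
    return nerve_fibers // (multiplex_factor * reliability_factor)


NERVE_FIBERS = 500 * 10**12  # 500 trillion
MULTIPLEX_FACTOR = 10**7     # silicon assumed 10,000,000 times faster

print(f"{buses_needed(NERVE_FIBERS, MULTIPLEX_FACTOR):,} buses")
print(f"{buses_needed(NERVE_FIBERS, MULTIPLEX_FACTOR, 10):,} buses with less redundancy")
```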
Another current limitation is that artificial neural nets can't be scaled up to several thousand synapses per neuron; they tend to become "tangled up".
Data must be sloshing back and forth in the brain at a rate of up to 20,000,000,000,000,000 (20 quadrillion) pulses a second. Data transfer rates of 2,000,000,000 (2 billion) bytes per second might lie a little beyond the current state of the art, leaving us slower by a factor of 10,000,000:1. (Computer bus speeds of 200,000,000 Hz are planned for the year 2000. We will probably achieve year 2000 data transfer rates of 3,200,000,000 bytes per second with them by using 128-bit wide buses, or about 25 billion bits per second, roughly 1/1,000,000th of the brain's data rates.)
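The data-rate comparison can be reproduced as follows. Like the text, this sketch sets neural pulses per second directly against bytes or bits per second, which is a loose equivalence rather than a measured one. The constant names are illustrative.

```python
BRAIN_PULSES_PER_SECOND = 2 * 10**16  # 20 quadrillion

# Current state of the art: about 2 billion bytes per second.
current_bytes_per_second = 2 * 10**9
current_shortfall = BRAIN_PULSES_PER_SECOND // current_bytes_per_second

# Planned for the year 2000: a 200 MHz bus that is 128 bits wide.
planned_bits_per_second = 200_000_000 * 128
planned_bytes_per_second = planned_bits_per_second // 8
planned_shortfall = BRAIN_PULSES_PER_SECOND / planned_bits_per_second

print(f"current shortfall: {current_shortfall:,} to 1")
print(f"planned bus: {planned_bytes_per_second:,} bytes/s")
print(f"planned shortfall: about {planned_shortfall:,.0f} to 1")
```

The planned bus comes out at 25.6 billion bits per second, so the shortfall is a little under a million to one.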
Since nerve fibers are self-energizing, signals can fan out everywhere without requiring amplification. Silicon-based systems would require intermediate amplification to achieve the kind of parallel distribution exhibited by the brain.
Which Type of Neural Network Should We Use?
By now, there are many variations on the basic neural network pattern (viz., Hopfield, Hamming, back-propagation, Kohonen self-organizing feature maps, and frequency-sensitive competitive learning). Also, all of them involve various simplifications and compromises compared to biological neurons. For example, the transmission rates of neural fibers increase as their diameter increases. Transmission at the synapses depends upon the concentrations of a variety of neurotransmitters, and we probably don't yet understand why there are different kinds of neurotransmitters and how all of them play together. There are different kinds of neurons, and here again, we may not yet have identified all of the different types. Biological nervous systems are highly complex even at the cellular level. How closely do we need to model actual neurons to catch the essential features of biological nervous systems? As we will see below, Messrs. Deyong and Findley believe that the time-based, analog character of biological neurons is essential to the proper modelling of a nervous system, and that conventional "connection machines", with their limitation of time-independent synaptic weights, may throw out the baby with the bath water.
Other Limitations of Current Neural Networks in Replicating the Brain
Although they are based upon nothing but relationships among the neurons, neural networks cannot presently model relationships. Neural networks don't lend themselves well to performing arithmetic and logical functions. A partnership of neural networks, perhaps incorporating time-based analog neural nets, with conventional computers may be what we'll see develop over time.
Biological nervous systems are not arbitrarily organized but are largely pre-wired.
"It is not clear whether a typical neuron can excite one neuron while inhibiting another", the way they can in artificial networks.
"Finally, it is unlikely whether the 'teacher' used in backward error propagation is a plausible model for learning. Indeed, the deeper into the system processing occurs, the more layers of subsystems are interposed between perceptual simulation and motor output—and hence the more difficult it would be to use the input and output of the overall system to adjust processing."
Artificial Neurons - Hybrid Temporal Processing Elements (HTPE's)
A small New Mexico State University spin-off called Intelligent Reasoning Systems, Inc., in Austin, Texas, has developed what the company claims is the closest analog to an artificial neuron yet devised. It uses a hybrid analog/discrete representation which is said to more readily tackle such time-dependent tasks as speech recognition or motion detection without the complications of digital encoding. (The sampling of speech at regular intervals fails to encode the natural rhythms of speech, which continually varies its cadence.) The use of hybrid temporal processing elements (HTPE's) that are driven asynchronously by their inputs and that encode information temporally has shown that speech segmentation occurs naturally at a certain stage in the chain of signal processing. The developers, Mark Deyong and Randall Findley, state that silicon-based circuitry is much more reliable than biological neurons, which are error-prone and have a high failure rate. In natural circuitry, there is no guarantee that a given neuron will fire, and nature makes up for that with a high degree of redundancy, which is unnecessary for silicon-based circuits. In addition, the 1,000,000-to-1 speed advantage which silicon circuits enjoy over biological circuits can be used to further reduce the neuron count. The HTPE approach, because it more faithfully simulates the behavior of actual neurons, is said to be well suited to handling time varying, as well as spatially-varying patterns, in contrast to conventional neural networks which utilize static (though alterable) weighting functions. The behavior of, and programming for, biological-neuron analogs (HTPE's) are fundamentally different, leading to networks which are very sparse compared to those of conventional neural networks. The HTPE approach requires 7 FET's (Field Effect Transistors) per neuron and 5 FET's per synapse.
Problems with Using HTPEs to Emulate the Human Brain
The problem that I foresee in trying to build an analog of the human brain using HTPE's or any other collection of neural networks in the near future lies in the number of synaptic weighting factors that would have to be represented. Even if HTPE's can remove the redundancies of natural neural networks and the 1,000,000-to-1 speed advantage of silicon circuitry could lower the neuron count to 10,000 or 100,000, the synaptic transistor count (of the order of 100 trillion?) would seem to be far beyond present-day technology. Of course, in the future, given chip counts with trillions of transistor-like processing elements, such networks might be feasible. It is even possible that supercomputer-class systems containing 100 trillion transistors might be constructed in the 2000-2005 time frame. (The planned Cray T3E would house up to 4 trillion transistors.) Such an investment might be well worth its cost as a proof-of-principle demonstrator.
Several other organizations are also attempting to create time-based, analog neuronal devices.
Conventional Computer Capabilities:
For all practical purposes, digital electronic computers are 100% reliable. (The unreliability of biological computers may be an unavoidable consequence of their molecular level miniaturization. Their circuit elements are so small that quantum-mechanical fluctuations may cause unpredictability as an outgrowth of the Heisenberg Uncertainty Principle. Silicon-based systems may be subject to unreliabilities and to a need for redundancy when circuit design rules decline below about 1,000 Å.) Silicon-based computer technology has focused upon fast uniprocessors rather than upon multiple processor systems in part because fast silicon-based uniprocessors have been technically feasible. Nature has no choice, but we do. Thus, silicon-based computers lie at the other end of the architectural spectrum from biological computers. Neural networks for biological computers may be an inevitable implication of the almost total parallelism dictated by the slow speed of electrochemical signal propagation.
For these reasons, it may be possible to implement AI functions in von Neumann type computers using far fewer circuit elements than are required by the brain, by processing serially what the brain would have to process in parallel. Only now are microprocessors appearing that offer the promise of reaching speeds and storage capacities at the lower end of this envelope. Intel is funded to develop a 1.8 terops (1.8 x 10^12 operations per second) supercomputer using 9,000 P6 chips, and has announced plans to develop a 10 terops (10^13 operations per second) parallel processor by the end of this decade. Cray Research is developing the T3E parallel processor which, using 2,048 DEC processors, should yield up to 1.2 terops, with up to 4 terabytes of RAM. The new $550 Texas Instruments TMS80C digital signal processor (DSP) can deliver up to 2 gigops. The Sega Genesis Saturn game set utilizes video chips that operate at almost 1 gigops. With 9 gigabyte disk drives available for a retail price of about $2,000 each, a terabyte of on-line disk storage would be achievable today for less than $200,000. With disk drive capacities 20-folding every 10 years, a $2,000 1-terabyte disk drive ought to be available on desktop PCs (or in robots) by the year 2010, or perhaps a little earlier, together with 1 terops of 16-bit integer processing speed and 32 gigabytes of RAM. Alternatively, $200,000 should buy 100 terabytes of disk storage 15 years hence (in 2010). Another $200,000 might purchase 100 terops of 16-bit integer processing power and for an additional $200,000, 3.2 terabytes of RAM. In a way, it may be appropriate to compare the number of neurons in the brain with the number of calculations per second which can be performed by a computer, since at least one neuron in the brain must be dedicated for every operation that is to be performed in parallel.
In that case, given the 1,000,000-fold speed advantage of silicon-based computers, coupled with their higher reliability, one could imagine that 10,000 parallel microprocessors might afford the same order of computational speed as the brain. This is approximately the number (9,000) of P6 chips that Intel has contracted to incorporate in their 1.8 terops Touchstone supercomputer which they are to deliver to DARPA next year or the 10 terops parallel processor that is projected for the year 2000. By the year 2010, a 1 teraflops computer capable of multi-terops integer processing might be available for $10,000 to $20,000, with supercomputers possibly approaching the petops range.
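The disk-storage extrapolation given earlier (9-gigabyte drives at about $2,000, capacities 20-folding every 10 years) can be checked with a few lines of arithmetic. The sketch below is only an illustration. It assumes that the growth is smoothly exponential and that the price per drive stays fixed, neither of which is guaranteed, and the function names are invented for this example.

```python
import math

DRIVE_GB_TODAY = 9.0       # capacity of one drive (figure from the text)
DRIVE_PRICE = 2000.0       # dollars per drive (figure from the text)
GROWTH_PER_DECADE = 20.0   # "20-folding every 10 years" (figure from the text)


def growth_factor(years):
    """Capacity multiplier after `years`, assuming smooth exponential growth."""
    return GROWTH_PER_DECADE ** (years / 10.0)


def drives_needed(target_gb, drive_gb=DRIVE_GB_TODAY):
    """Whole drives required to hold target_gb of data."""
    return math.ceil(target_gb / drive_gb)


def cost_of_storage(target_gb, drive_gb=DRIVE_GB_TODAY, price=DRIVE_PRICE):
    """Dollar cost of target_gb, assuming a fixed price per drive."""
    return drives_needed(target_gb, drive_gb) * price


def future_drive_gb(years, drive_gb=DRIVE_GB_TODAY):
    """Capacity of a same-priced drive `years` from now."""
    return drive_gb * growth_factor(years)
```

Under these assumptions, a terabyte today takes 112 drives, the 15-year growth factor is about 89, and a $2,000 drive in 2010 would hold roughly 800 gigabytes, which is on the order of the 1 terabyte estimated above.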
We shouldn't be surprised if this were to come to pass. Human substitutes for the physical capabilities of the animal kingdom, while still unspeakably unsophisticated compared to biological systems, have given us the supersonic transport, the diesel locomotive, and a great array of ultra-fast, ultra-powerful, ultra-precise machines which far outperform their animal counterparts. It would not be too startling if this were eventually to occur with the brain.
Use of the Developing Technology for Speech, Facial, and Handwriting Recognition, Etc.
As such uniquely human capabilities as
• speech recognition,
• handwriting recognition,
• optical character recognition
• recognition of faces,
• pattern (image) recognition,
• natural language processing, and
• voice synthesis
become part and parcel of the workaday computer world, one can foresee a time in 10 to 15 years, as digital signal processing speeds increase by a factor of at least 100, when conventional computers begin to surpass human-level competence in these areas, using conventional programming techniques and near-term computer hardware. (We may utilize a blend of neural networks, fuzzy logic chips, and conventional microprocessors to achieve optimum results at minimum cost: whatever works best.) Eventually, we'll probably surpass human capabilities in these departments, at least in some, if not in all, respects.
What Can Be Done on a 1996 Personal Computer?
It may be blatant hubris to contemplate the implementation of a significant subset of human-caliber intelligence on a desktop computer. However, I'm not altogether convinced that it will require the kinds of speed and storage capacity that are attributed to the brain to approach human levels of intellectual capability in various relevant areas. Progress is being made too rapidly in developing the uniquely human functions described above. Given another 100-fold (10-year) improvement in desktop computer speeds and storage, and another 10 years of software development, technology should at least be within shouting distance of the kinds of speeds and storage capacities needed to approximate some higher-level human mental functions. It isn't necessary that we exactly emulate the human brain. There may be other ways to achieve similar goals using conventional computers. For one thing, our robot need not be concerned about survival in the wild and the evasion of predators. If we can achieve even a tiny fraction of what the brain can accomplish, we might still have produced a very useful commercial product. If it turns out that we need a supercomputer to simulate the human brain in real-time, then it might be possible to develop human-caliber AI hardware and software on computers that run it at glacial speeds, followed by a brief test or tests on a supercomputer. Ten years from now, supercomputers should be operating in the 10's-of-terops range, with a 1 terops DSP cluster available for, perhaps, <$10,000.
The advantages of personal computers are not difficult to define. They are available to everyone, and are available for hobby work in most homes.
In any case, let's see how far we can go. In the process, we may learn a lot about functional requirements for the brain.
As has been mentioned above, the major challenge in developing a human surrogate is probably not that it be able to reason or even to recognize speech, but that it be self-aware and self-organizing. In this vein, it is imperative that our computer be motivated by emotions, feelings, and drives, and that it be guided by general principles rather than by specific commands. It should respond to and choose among competing thoughts and motivations, exhibiting the semblance of free will. Otherwise, we are in the position of having to program every little detail as we would any other computer, thereby losing flexibility and sentience.
Its behavior should be:
(1) no more predictable than a person's but
(2) not any more random than is a person's behavior (i.e., purposeful).
Emotions, Drives, and Self-Awareness
Can we create a self-aware computer motivated by drives and emotions? That's a very good question. This section discusses the ideas I've had for accomplishing this. I would welcome anyone else's suggestions concerning how these mechanisms might be programmed.
So far, this section is restricted to ideas which bear upon conceptual design rather than specific algorithms. However, program design, coding, and testing should be feasible given a little more time.
Among the possible mechanisms for emotion that I've been able to imagine is the ability to raise and lower computer clock speeds, physical power (strength) levels, and response rates. The robotic system has to hold some resources in reserve for emergency situations.
To slow the computer's clock, since computers are not designed with variable clock frequencies, we could steal cycles with software. We might normally run the computer's clock at 67% of maximum by stealing every third cycle, and then raise the clock rate to perhaps 80% when circumstances warrant.
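As a rough illustration of cycle stealing, the sketch below withholds one cycle out of every N in software. The class and function names are invented for this example, and a real implementation would presumably live in the scheduler or an interrupt handler rather than in application code.

```python
class CycleThrottle:
    """Software throttle that withholds ("steals") one cycle out of every `period`."""

    def __init__(self, period):
        if period < 2:
            raise ValueError("period must be at least 2")
        self.period = period
        self.tick = 0

    def allow(self):
        """Return True if this cycle may do work, False if it is stolen."""
        self.tick += 1
        return self.tick % self.period != 0

    def effective_rate(self):
        """Fraction of cycles that do useful work."""
        return (self.period - 1) / self.period


def run(throttle, cycles):
    """Count how many of `cycles` clock ticks actually perform work."""
    return sum(1 for _ in range(cycles) if throttle.allow())
```

Stealing every third cycle (period 3) leaves two cycles in three for work, or about 67%. Stealing every fifth cycle (period 5) gives 80%.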
To adjust physical power levels, we might lower and raise the power levels that operate the robot's servo-motor "muscles" by using the Novak electronic speed controls that are used for RC models. We could also adjust the feedback voltages coming from the weight and inertial strain gages and digital shaft encoders in its joints to make the robot feel relatively heavier or lighter.
Examples of emotional states in which physical power level would probably be appropriately reduced are relaxation, depression, hunger (for the recharging of batteries), and tiredness. (It might be profitable if the robot were to need to sleep in order to rearrange its data base and to process information that couldn't be handled during the busy day.) Examples of elevated-power and clock states would be elation, arousal (through fear, anger, or passion), and excitement/enthusiasm/absorption.
Generation of Ego
The ego (control unit) will balance competing urges/drives, and make choices, and should accumulate different response patterns (personality) depending upon its experiences. The ego or controller will evaluate situations and will try to predict the effects of alternate courses of action upon its welfare. It will also have a model of the self. Its self-image will contain a capabilities catalog of what the robot can and cannot do (or has been able to do and has not been able to do; i.e., a record of its successes and failures). This would actually be implemented using the file of abbreviated animation tracks (discussed below) of prior experiences. The controller will constantly evaluate its performance and will feel good when it decides that it has done well and bad when it concludes that it has done poorly; that is, it will develop a critical parent, a subprogram that integrates prior experience and brings it to the forefront of consciousness. Whenever it succeeds in doing something new or challenging, it will reward itself with good feelings. This will take the form of improving the robot's self-approval index, based upon this most recent entry into its capabilities catalog, and creating a temporary, small-to-medium increase in voltages and clock speeds, with a reduction in apparent heaviness and internal critique. Thus, the capabilities catalog will represent a cumulative factual record of the robot's demonstrated abilities, although details will be forgotten and perhaps repressed over time. The self-approval index will vary fairly rapidly, declining in a matter of hours to days as new events overlay the old. For example, gaining an insight will improve the robot's opinion of itself. The parent part will be an agent which draws upon the capabilities catalog and prior experience as a built-in advisor to guide the robot's actions. The robot might be motivated to work in ways that improve some internal box score, providing feedback at an abstract level.
In other words, improving its self-image box score may be an important motivator for the robot.
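One way the self-approval index might be sketched is as a number that jumps when the robot scores a success or failure and then relaxes back toward a baseline. The exponential decay and the 12-hour half-life below are my assumptions, chosen only to match the "hours to days" time scale mentioned above, and the class name is hypothetical.

```python
class SelfApprovalIndex:
    """Index that jumps on a success or failure and relaxes toward a baseline."""

    def __init__(self, baseline=0.0, half_life_hours=12.0):
        self.baseline = baseline
        self.half_life_hours = half_life_hours  # assumed value
        self.value = baseline

    def record_outcome(self, score):
        """Add a positive score for a success, a negative score for a failure."""
        self.value += score

    def advance(self, hours):
        """Let the index decay toward the baseline as time passes."""
        decay = 0.5 ** (hours / self.half_life_hours)
        self.value = self.baseline + (self.value - self.baseline) * decay
```

With these settings, a boost of 4 units falls to 2 after twelve hours and to 1 after a day, so that new events gradually overlay the old.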
(The tensions between the parent parts, the ego, the child parts, and all the drives, emotions, and evaluation processes are a crucial part of what makes us human.) The efficacy of the advice provided by the parent part will also be evaluated over time.
The parental part would probably first be characterized by tallying and making available statistics regarding prior outcomes. Anticipations of unpleasant outcomes generated by abbreviated reruns of prior, painful experiences could come from the "parental part" program subsection.
From a programming standpoint, how the robot responds will depend upon choices that reflect all of the various influences which emanate from different parts (subprograms) within its psyche.
From the standpoint of actual programming, there will indeed be a multitude of "agents" which will act upon the ego.
There will be competing urges (drives).
Desire to master (power over the environment).
Desire to imitate. (This may need to be an instinctive, built-in drive.)
Desire to please. Desire for attention.
Hunger for energy, when the batteries run low.
Desire for repairs when needed.
Desire for pleasure.
Desire for accomplishment, improved self-image.
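The balancing of competing urges listed above could be sketched, very crudely, as a weighted contest that the ego may veto. The weighted-maximum rule, the `Drive` fields, and the numeric values in the test are all assumptions made for illustration. The real arbitration would presumably be far richer.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    urgency: float        # current strength of the urge, 0.0 to 1.0
    weight: float = 1.0   # how much the ego has learned to heed this drive


def choose_drive(drives, suppressed=()):
    """Return the drive with the highest weighted urgency that the ego
    has not suppressed, or None if no drive is left to choose from."""
    candidates = [d for d in drives if d.name not in suppressed]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.urgency * d.weight)
```

The `suppressed` argument stands in for the ego's power to override an urge, for example putting off energy hunger until a task is finished.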
Silent inner dialogue (in English, of course). Ability to simulate. An inner dialogue implies the ability to couch inner activities in natural language—a non-trivial capability (natural language processing) but one that has already received a considerable amount of attention. The challenge for us is that we are trying to create a robot that will understand language and the world in terms of direct experience instead of simply as a rule-based manipulation of symbols by a facile but mindless machine.
Rewards - The Pleasure Index
We can "reward" our robot by raising its pleasure index when it moves.
Pleasure for our computer might take the form of
(1) Elevating the clock speed
(2) Feeling light and energetic, as described under "Physiological Variables"
(3) If we decide to include "muscular" tension in our program, then a feeling of reduced muscular tension.
(4) An encouraging, optimistic state of mind. An elevated capabilities index.
The robot would have short-term and long-term pleasure indices that would vary with time. The pre-verbal robot would work adaptively, adjusting its behavior to maximize its short-term and, later, its long-term pleasure indices.
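A minimal sketch of the two pleasure indices follows, assuming that each is a running average of recent rewards and that the two differ only in how quickly they respond. The update rule and the rate constants are my assumptions, not a settled design.

```python
class PleasureIndices:
    """Short-term and long-term pleasure indices as running averages."""

    def __init__(self, short_rate=0.5, long_rate=0.05):
        self.short_rate = short_rate   # assumed: responds within a few events
        self.long_rate = long_rate     # assumed: responds over many events
        self.short_term = 0.0
        self.long_term = 0.0

    def update(self, reward):
        """Blend a new reward (positive or negative) into both indices."""
        self.short_term += self.short_rate * (reward - self.short_term)
        self.long_term += self.long_rate * (reward - self.long_term)
```

After two rewards of 1.0, the short-term index has already climbed to 0.75 while the long-term index is still below 0.1, so a brief pleasure lifts the robot's mood long before it changes its overall outlook.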
?How do we simulate pain? There are many ways to punish the robot, including intermittent failure of subsystems or confusion at critical times. However, physical pain is still a mystery to me. I'm thinking that pain must be electrical because we feel pain instantly. If it were chemically transmitted, a few seconds would be required before the chemical messengers reached the brain. Granted, pain messages may release neurotransmitters in the brain, but how do they make a neural network hurt?
?For that matter, how do we make the robot care what happens to it?
You might ask why I think that it would be desirable to program emotions into robots, assuming that it's even do-able. My rationale is that if I'm trying to create something with human characteristics, I want to hew as closely to human thinking and feeling as possible, even if this strategy is to be abandoned when things move beyond the conceptual stage. Somehow, the robot must operate without explicit cookbook programming. Emotions exert general influences and generate motivations without dictating any specific actions.
However, it also seems necessary to devise motivators in order to get the robot to do anything besides sit and wait to be told what to do.
Also, I would want our robots to feel love, gratitude, personal loyalty, and other positive feelings rather than simply being heartless machines—saints rather than unfeeling monsters.
Emotions as States:
It seems to me that emotions are states that modify our responses and thoughts. Being in a given state will increase the probability of responding in ways appropriate to that state and thinking thoughts apposite to that state. (The robot's subconscious may bring up warnings about potential hazards, particularly after its "brain" has been reorganizing information while it sleeps.)
All of the emotions will exist simultaneously and will be elevated when circumstances warrant. They will be inducements to action.
Our robot's self-appraisal index would be temporarily raised, together with its capabilities index. Its short-term pleasure index would rise. Our robot would feel comfort (a temporary lowering of the activity itch) and mild euphoria when held and, to a lesser extent, when its parent is in the room. Thoughts and feelings would also be affected by the values of the short-term and long-term pleasure indices. (So far, we have four indices.)
Repression: Must suppress short-term. Must repress long-term in order to retain harmful feelings for future disposition and at the same time, must keep the harmful feelings from interfering with current affairs. Stamp collecting. Gradual fading over time.
It has been easier for me to imagine how to implement anger than other feelings.
Anger. Fight or flight. Dealing with thresholds. Lashing out at everything. Deliberately trying to destroy. Making noise. Need violence with anger. Will adopt an assertive or aggressive mind-set and be relatively apt to indulge in aggressive or assertive behavior. Will go into overdrive. Clock rates and "energy levels" will rise. Might deliberately break things and then face punishment or regret over the loss of what it broke. Activate when frustrated or attacked. Must decide whether to flee or to be angry. Desire to dominate, impose will upon the world. Amygdala. Moods must attenuate, must shift. May need to wait until some understanding is gained before implementing. Must elevate state of anger in the priority stack. Must lower the trigger-point thresholds for violent or angry responses when the anger level is raised, although the controller will have the power to override (suppress) this angry feeling. Can store up angry feelings for later disposition through repression. Angry actions will be more probable. Determination and "adrenalin" will go with anger.
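The idea of lowering trigger-point thresholds as the anger level rises, with the controller able to override, might look something like the sketch below. The linear relation between anger and threshold, and every numeric constant, are assumptions made for illustration only.

```python
class AngerState:
    """Anger level that lowers the trigger threshold for aggressive responses."""

    def __init__(self, base_threshold=0.8, sensitivity=0.5):
        self.base_threshold = base_threshold  # provocation needed when calm (assumed)
        self.sensitivity = sensitivity        # how far full anger lowers it (assumed)
        self.level = 0.0                      # 0.0 = calm, 1.0 = enraged

    def provoke(self, amount):
        """Frustration or attack raises the anger level, capped at 1.0."""
        self.level = min(1.0, self.level + amount)

    def attenuate(self, factor=0.9):
        """Moods must fade: shrink the anger level by `factor` each step."""
        self.level *= factor

    def threshold(self):
        """Current trigger point; it falls as anger rises."""
        return self.base_threshold - self.sensitivity * self.level

    def responds_angrily(self, provocation, ego_override=False):
        """True if the provocation exceeds the threshold and the ego allows it."""
        if ego_override:
            return False
        return provocation > self.threshold()
```

A provocation that a calm robot would shrug off can thus trigger an angry response once the anger level is up, unless the ego suppresses it.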
The Expression of Anger
?But how does the robot know which actions are angry actions? ?How does it associate anger with angry actions? How does a tiny child learn to hit? Is it imitating its parents? Is hitting an instinctive behavior? Do we make physically lashing out an instinct with our robot?
Fear. Associated with flight. Associated with presumption of anger by others—i.e., feeling threatened. Withdrawal, self-protection, anticipation. Difficulty of separating imagination from reality.
Perception of danger. Fear will cause apprehensive behavior, fight or flight, cowering. Fear might trigger withdrawal or it might lead to confrontation/aggression, depending upon whether the robot adjudges itself able to dominate the situation. Will raise voltages, clock rates, but not the pleasure index. Will increase attention to the surroundings. Now what would produce fear? Startlement. Startlement will immediately raise the apprehension level. This sudden rise in the "fear" level will be interpreted as unpleasant. Fear will dictate the focusing of attention on sensory input, crowding out private thoughts. Unless suppressed by the ego, the robot will reflexively look toward the center of the disturbance, and instinctively look around. Its alerted state will be temporary, typically decaying over a 10-15 minute time frame. The threshold for reverting to normal behavior will depend upon several other indices. So we'll have an alerted state, and an alerted state index.
Thoughts of danger and peril and a tendency to interpret stimuli as dangerous would accompany this alerted state.
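The alerted state index could be sketched as a value that jumps on startlement and then decays until it crosses a threshold for reverting to normal behavior. The exponential form, the 5-minute time constant, and the fixed threshold are assumptions. They were picked so that a full startle wears off in roughly 11 or 12 minutes, inside the 10-15 minute range given above. In the design described here the threshold would depend on several other indices.

```python
import math


class AlertedState:
    """Alerted-state index: jumps on startlement, then decays over minutes."""

    def __init__(self, time_constant_min=5.0, normal_threshold=0.1):
        self.time_constant_min = time_constant_min  # assumed decay constant
        self.normal_threshold = normal_threshold    # assumed cutoff for "calm"
        self.index = 0.0

    def startle(self, intensity=1.0):
        """A sudden stimulus immediately raises the index (capped at 1.0)."""
        self.index = min(1.0, self.index + intensity)

    def advance(self, minutes):
        """Exponential decay of the index with elapsed time."""
        self.index *= math.exp(-minutes / self.time_constant_min)

    def is_alerted(self):
        """True while the index is still above the threshold for normal behavior."""
        return self.index > self.normal_threshold
```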
Alerted state in the animal kingdom: Adrenalin. Pulse rate increases. Metabolic rate increases. Attention focused on the surroundings. Senses ready to trigger.
It needs to jump when something startles it. It needs to experience an undesirable (agitated) state of arousal.
Exaltation: Relaxation. Increased clock rate. Temporary lowering of self-critical (guilt) feeling, "voices"; elevated level of self-acceptance. Must have an internal model of "self". Temporary freedom from conflict, contention. Actions: smiling. Optimism. Energetic. Pockets of thought, feeling, and assessment held at bay until either the barriers are lowered due to a mood swing, or an event triggers a dump. Elevated clock rate. Kinesthetic joint sensors express light feeling. Elevated skeletal muscle voltages.
Love: Dependency. Altruism. Gratitude. Nurturing. Desire to meld. Feeling of need, interdependency. Loneliness.
Will need moral and ethical standards (principles) of conduct.
Jealousy. Envy. (Tied in with its self-appraisal.)
?Can We Really Make a Computer Feel Anything?
?We may be able to influence the robot's behavior but how do we know that it is really feeling anything?
?What happens if the robot doesn't care whether bad things happen to it or not? How do we give it an instinct for self-preservation? Suppose that it experiences various emotional states and makes decisions and so forth but doesn't actually feel anything—doesn't feel pain and doesn't care whether it lives or dies.
Answer: That may happen. If so, we will still have learned a lot and have made a lot of progress. However, we could also be neutral about such matters but we're not. We believe that life is real and life is earnest. Maybe the robot won't be neutral, either.
The robot will feel a compulsion to move, explore, and investigate (constantly agitated). We would like to make the robot more and more uncomfortable sitting still. It should develop an itch to move (an elevated activity index). We would like our baby robot to feel curiosity. We would like our robot to feel enthused about exploring its environment. We would like it to be intensely preoccupied with what it is doing. When it is in the presence of its "parent", it might feel a sense of peace and be able to sit still for a little while when it is cuddled.
So how might we implement this?
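One tentative answer for the itch to move is an activity index that climbs while the robot sits still, falls when it moves, and climbs more slowly when it is cuddled. The rates below are arbitrary assumptions and the class is hypothetical. It is offered only as a starting point.

```python
class ActivityItch:
    """Activity index that grows while the robot sits still."""

    def __init__(self, growth_per_sec=0.01, relief_per_sec=0.05, cuddle_damping=0.2):
        self.growth_per_sec = growth_per_sec    # assumed build-up rate when idle
        self.relief_per_sec = relief_per_sec    # assumed relief rate when moving
        self.cuddle_damping = cuddle_damping    # assumed: itch grows 5x slower when held
        self.index = 0.0

    def step(self, seconds, moving, cuddled=False):
        """Advance the index by `seconds` of stillness or movement."""
        if moving:
            self.index -= self.relief_per_sec * seconds
        else:
            rate = self.growth_per_sec
            if cuddled:
                rate *= self.cuddle_damping
            self.index += rate * seconds
        # Keep the index within the range 0.0 to 1.0.
        self.index = max(0.0, min(1.0, self.index))

    def wants_to_move(self, threshold=0.5):
        """True once the itch is strong enough to demand action."""
        return self.index >= threshold
```

With these numbers the robot becomes restless after about 50 seconds of sitting still on its own, but can sit for several minutes while it is held.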
?How is the concept of influencing outcomes learned?
?How is the robot to translate feelings of avoidance into avoidant behavior?
Feeling of openness versus enclosure.
Will seek pleasure, avoid pain. Will balance against higher-order benefits and ideals (deferred gratification).
Hunger, sleepiness, frustration, anger, loss, sorrow, fear.
Senses: Kinesthetic (include gyro?), auditory (microphones), tactile (capacitance or touch-pad), thermal (thermocouples), visual (video camera), pain (internal damage), strain gauges (overload). Can't provide the chemically-based senses of taste and smell. Thirst (for water, oil). Might use water-based emulsive lubricant. Might some day extract water, protein, carbohydrate, and oil from food.
Sound generation (synthesis), speech synthesis.
The robot will sleep. It will get tired, it will get sleepy, and it will feel a stronger and stronger tug-and-pull to sleep. During sleep, the computer will readjust all the associative weights and will gradually prune details from its stored memories. If there is time, it may run simulations of problems encountered during the day's events or work on solving problems that the robot's conscious mind has addressed during the day. It may bring to the forefront of consciousness the backlog of repressed concerns and tasks that have been ignored during the day.
The robot's tiredness will take the form of a feeling of heaviness (by altering feedback from the optical encoders in the joints).
Sleep - Will feel heavier. Clock speeds will drop. Voltages will drop. Will tend to be weaker. Will be able to relax. Will think about sleep. Will take greater mental effort to keep going. Will sleep long enough to recharge batteries, process the data bank. May run simulations to improve motor control, digest information, lower the association weightings, selectively forget details, etc., while asleep.
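The lowering of association weightings and selective forgetting during sleep might be sketched as a single pass over a table of associations. The table layout, the uniform decay factor, and the pruning cutoff are all assumptions. Nothing here says how the weights were learned in the first place.

```python
def consolidate(associations, decay=0.9, prune_below=0.05):
    """Return a new association table after one "night" of sleep.

    associations maps a (cue, outcome) pair to a weight between 0 and 1.
    Every weight is scaled by `decay`; entries that fall below
    `prune_below` are forgotten. The input table is left unchanged.
    """
    kept = {}
    for pair, weight in associations.items():
        new_weight = weight * decay
        if new_weight >= prune_below:
            kept[pair] = new_weight
    return kept
```

Strong associations survive many nights, while a weak one that is not reinforced during the day soon drops out of the table.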
Fatigue and Sleepiness
The robot can feel heavier and move slower when it is getting "tired". Greater effort may be required to move and to operate. Feelings won't stop the robot but will tug at its "psyche". Its clock speed might be made variable as a punishment/reward mechanism. Also, its ability to concentrate might be compromised by its feelings.
The robot will feel hunger as its batteries get low. The lower the batteries, the greater the hunger for a recharge (or a swapping of batteries). (While its batteries are unplugged, could carry it on a backup battery that is recharged by the fresh batteries after they are plugged in.) The robot will not have to respond to its hunger instantly, and this will be one of the nagging feelings that the ego will balance against other priorities. If the robot is in the middle of something important or something it wants to finish, it may choose to put up with the discomfort of energy hunger until it has finished what it is doing.
?Now how do we program all this?
The capabilities record might take the form of a record of the "success scores" the robot assigned itself for the most recent experiences in which it tried to do whatever is relevant to current demands.
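A capabilities record of this kind could be sketched as a short history of success scores per task. The class name, the 0-to-1 scoring scale, and the choice to remember only the last five attempts are assumptions made for the example.

```python
from collections import defaultdict, deque


class CapabilitiesRecord:
    """Keeps the most recent self-assigned success scores for each task."""

    def __init__(self, history=5):
        # history: how many recent attempts to remember per task (assumed)
        self.scores = defaultdict(lambda: deque(maxlen=history))

    def record(self, task, score):
        """Store a success score between 0.0 (failure) and 1.0 (success)."""
        self.scores[task].append(score)

    def expectation(self, task):
        """Average recent score for the task, or None if never attempted."""
        recent = self.scores.get(task)
        if not recent:
            return None
        return sum(recent) / len(recent)
```

Because old scores fall off the end of each history, an early failure stops counting against the robot once it has succeeded often enough since.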
The role of play (let's pretend). Strong appetites. Tendency to imitate. Imprinting.
Kinesthetic - Locations of limbs, torques at joints. Gyro?
Tactile - Pressure, roughness (touch), thermal, pleasure. Internal status sensors.
Pain - Overloads in the touch, thermal, torque, internal, and location areas.
Need internal status sensors to warn of dry bearings, low batteries, high power levels, water in critical regions, etc.
Problem: How do we program a robot to explore the world and generalize from it?
• Curiosity, desire to explore
• Short attention span
• Striving for relationships, understanding
• Sequences as relationships
• Awareness of self, not-self
• Response to "no"
• Ability to extrapolate, simulate a sequence, given repeated reinforcement
In the beginning, the robot will have its scorecard indices set to zero. It will be unable to sit still. It will be rewarded for learning and it will embrace behaviors that maximize its rewards.
Timidity versus Confidence. When the robot gets confirmation of a prediction, its confidence will (temporarily) rise, as measured by its capabilities index and its self-assessment index.
Extraneous thoughts will be partially inhibited while the robot concentrates on the mission at hand.
Adaptation of Strategies:
The robot's behavior will be influenced by what it experiences.
Level of tension.
Depression versus Euphoria
Shyness, Self-Consciousness vs. Self-Confidence
Hurried vs Laid Back:
A growing desire for certain things that will be appreciated when they finally become available.
Rewards for nurturing, kindness, generosity, altruism, gratitude. The initial programming will be subject to modification by experience.
Too Hot or Too Cold:
The robot will reflexively pull back as fast as its servos will permit from anything that is very hot or very cold. It will withdraw less rapidly from entities which are less extreme.
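The graded withdrawal reflex might be sketched as a function from surface temperature to a fraction of full servo speed. The comfort band, the 30-degree margin, and the linear ramp are assumptions chosen for illustration. Real limits would depend on what the robot's materials can tolerate.

```python
def withdrawal_speed(temp_c, comfort_low=10.0, comfort_high=40.0,
                     extreme_margin=30.0, max_speed=1.0):
    """Fraction of maximum servo speed used to pull away from a surface.

    Inside the comfort band the speed is zero. Outside it, speed rises
    linearly with distance from the band and saturates at max_speed once
    the temperature is `extreme_margin` degrees beyond either edge.
    All thresholds are illustrative assumptions.
    """
    if temp_c < comfort_low:
        excess = comfort_low - temp_c
    elif temp_c > comfort_high:
        excess = temp_c - comfort_high
    else:
        return 0.0
    return max_speed * min(1.0, excess / extreme_margin)
```

A surface at 55 degrees C produces a half-speed withdrawal under these settings, while anything at 70 degrees C or hotter is dropped as fast as the servos permit.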
Looking into the Sun:
Obviously, our robot must find looking into a bright light a painful experience.
?Does our exceptional capability to pull patterns out of noise stem from our required ability to detect predators camouflaged by a leafy screen?
Avoidance of Damaging Voltages and Currents:
We must provide circuit protection from currents and static discharges that could damage the robot's circuitry.
After a few (e.g., two or three) consistent experiences, the robot will correlate what happens consistently in a scenario. ?(Question: how do we identify the relevant details in a scenario?) Then it will actively attempt to predict what will happen when it deliberately repeats the scenario by testing its prediction to see if all, or if not all, then which subset of the correlations remain consistent. It might test the boundaries of the scenario. All this would be done by "instinct"—that is, would be preprogrammed.
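One simple way to correlate what happens consistently in a scenario is to keep only those details that have been identical in every experience so far, and to start predicting once enough experiences have accumulated. This is a sketch under strong assumptions: each experience arrives as a ready-made set of named details, which sidesteps the open question of how the relevant details are identified. The class and detail names are hypothetical.

```python
class ScenarioLearner:
    """Keeps only the details that have been the same in every experience."""

    def __init__(self, min_experiences=3):
        self.min_experiences = min_experiences  # "two or three" in the text
        self.count = 0
        self.consistent = None  # details shared by all experiences so far

    def observe(self, details):
        """Record one experience, given as a dict of detail -> observed value."""
        if self.consistent is None:
            self.consistent = dict(details)
        else:
            self.consistent = {
                key: value
                for key, value in self.consistent.items()
                if key in details and details[key] == value
            }
        self.count += 1

    def prediction(self):
        """Predicted details for the next repetition, or None if too few trials."""
        if self.count < self.min_experiences:
            return None
        return dict(self.consistent)
```

Each deliberate repetition is just another call to `observe`, so any correlation that fails the test drops out and the surviving subset is what the robot continues to predict.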
?How does the computer recognize a repetitive experience? The robot reaches for the wall, touches it, and registers an event when it makes contact (because something unexpected, other than continuous motion, has happened).
Simplifying the Robot's First Exposure to the World:
The first thing the robot might be expected to examine would be itself.
?How does the robot comprehend self?
The robot will feel pleasure in gaining mastery of its motility. Could put a busy box where the robot could learn to operate it.
Suppose we start with a situation in which the robot faces a blank wall at some known distance. The robot is programmed to reach out and touch the hard, impenetrable wall. Specifically, we might program the robot to move its manipulator(s) somewhat randomly, as it explores itself and learns to associate its actual movements with the ways it wants to move—i.e., learns to control its manipulator(s).
The robot will retain an approximate record of its motions and can repeat them. The more it tries, the more precise its motions will become. It might try to
(Procedural memory.) After several such experiences, the robot will anticipate an unyielding wall before it reaches out.
?Problem: Why should the robot reach out and touch the wall?
Answer: Because the robot is programmed to feel pleasure/curiosity in exploration, pleasure in understanding, and an urge to move (restlessness). It will get bored and restless when it has nothing to do.
?Problem: Precisely how will the robot move its manipulator in order to touch the wall? This is a major issue with robotic manipulators today.
Answer: The robot will move its manipulator "straight" out and "straight" back. The motor profiles could be calculated approximately by solving the equations of motion, and then could be adaptively fine-tuned. However, in general, we might arrange matters so that the robot wouldn't remember its motor profiles quite exactly, so that it never makes exactly the same moves twice.
In reaching for something, the robot could first move one joint and then another, performing the motions simultaneously only after experimentation. This would be the expert system approach, with gradual increases in speed and complexity.
Inverting a motion matrix is one way of solving the differential equations governing motion. Another way to do it is with a library of interpolated cut-and-try solutions, using optimization/estimation techniques to converge to a solution. (Could run simulations before actually executing the motions.)
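The library of interpolated cut-and-try solutions can be illustrated in one dimension: the robot stores the distance reached for each motor command it has tried, and interpolates between the two nearest trials to estimate the command for a new target. A real manipulator has several joints and would need a multidimensional version. The single scalar command, the class name, and the sample values in the test are all simplifying assumptions.

```python
import bisect


class ReachLibrary:
    """Library of tried (motor command, distance reached) pairs."""

    def __init__(self):
        self.trials = []  # list of (distance_cm, command), kept sorted by distance

    def add_trial(self, command, distance_cm):
        """Store the outcome of one cut-and-try motion."""
        bisect.insort(self.trials, (distance_cm, command))

    def command_for(self, target_cm):
        """Estimate the command for target_cm by linear interpolation
        between the two nearest stored trials; None if out of range."""
        if len(self.trials) < 2:
            return None
        distances = [d for d, _ in self.trials]
        if target_cm < distances[0] or target_cm > distances[-1]:
            return None
        hi = bisect.bisect_left(distances, target_cm)
        if distances[hi] == target_cm:
            return self.trials[hi][1]
        d0, c0 = self.trials[hi - 1]
        d1, c1 = self.trials[hi]
        return c0 + (c1 - c0) * (target_cm - d0) / (d1 - d0)
```

Each executed motion can be added back into the library, so the estimates should sharpen as the robot practices. Returning None for targets outside the tried range is a cue to explore further rather than extrapolate.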
?Problem: How does the robot come to associate the wall with the resistance it feels when it touches the wall?
The robot should be able to visually detect the z-coordinate of its manipulator. Optical encoders may also provide this information, along with the tactile feedback of touching the wall. Then the fact that the coordinates are the same when this event takes place should lead to a time-based correlation, as well as a spatial correlation. After trying this a few times with consistent results, the robot will associate touching the wall with the fact that it is being stopped at that spot. If we use an expert system for learning and discovery, the robot would experiment with the wall, feeling different spots and touching it firmly and softly. It would feel the wall at different locations (as opposed to merely touching it) to experience the feeling.
?Problem: Granted that the robot has touched and felt the wall, how does it externalize (objectify) the wall? All it knows is that when it goes through certain motions, a certain type of event occurs. How does that translate into the realization that there is a world out there? In other words, how does the android distinguish between self and not-self, and develop the concept of an objective, external world that exists independently of itself?
Tentative Answer: It can directly control itself, while it cannot directly control the world. It can feel what happens to it but not what happens to external objects. It must develop the higher-order abstraction that it can directly control part of the world (itself) but it cannot control the rest of the world. Therefore, the world is divided into two parts.
?How would it develop this concept? (This may be the first situation in which it develops a general concept.)
Answer: This might arise from experience. It requires that the robot set up a high-level subdivision which says that there are certain objects which it can directly control and many, many other objects which it cannot. (This would solve the problem of "objectivization"). This would be a behavioral classification. Once again, the subdivision would depend upon behavioral (operational) predictions regarding what happens with the two types of objects.
It will develop an anticipatory script that says that if it touches the wall, it will be stopped. The wall will be there irrespective of how the robot examines it. The wall will become a "law of nature" and will be understood by the robot in the sense that it knows in advance what will happen if it reaches for the wall. It will therefore lose interest in exploring the wall, since its ability to predict what will happen when it reaches out to touch the wall is tantamount to understanding the meaning of a wall. Of course, this still doesn't guarantee that it perceives the wall as an object rather than simply as a phenomenon. (At this distance, when the manipulator reaches this opacity, it will register resistance and will be stopped.) However, the anthropoid (or gynoid) will at least have the operational definition of a wall.
Now an object can be placed in front of the robot. The robot touches the object and it moves. This attracts the robot's attention. The robot begins to push the object around, like a kitten playing with a toy mouse. If we use our expert system approach, the robot will experiment with the object, feeling it (by grasping it), and moving it in varying directions and at varying speeds. It will lift the object and rotate it, feeling its weight. Meanwhile, its brain will be sorting and filing the information, learning about the object. Then if the object is replaced with a series of other objects, the robot's brain will identify the common elements among the objects, and the unique elements that distinguish different objects. Then it will generate an anticipatory scenario that anticipates a generic object, with the observed common elements and with a vague amalgam of the objects' unique features.
?How can the robot understand the difference between wall and not-wall?
Answer: Through distance measurements, coupled with obstructed versus unobstructed vision.
?How can the robot learn the concept of "solid"?
Answer: The robot will bang against walls a few times. It will remember sequences that capture the essence of approaching the wall. It will feel the wall.
Problems: Banging into the wall needs to have some penalty attached to it—otherwise, why not bang into the wall? For an organism, this might be the idea that it hurts a little to bang the wall, and that it detracts from clear passage.
Granted that there is value to not banging into the wall, how does the robot avoid it?
(1) The robot must establish a link between its desires and its actions.
Suppose that the robot moves toward a goal which attracts it and runs into an obstacle. The robot could just stop there, unable to solve the problem of encountering an obstacle. An infant in this situation would probably lose interest and wander off to a new goal. It might eventually circumnavigate the obstacle and
(2) Given a sequence that leads up to hitting a wall, the robot might record the sequence as a fairly simple system of actions and boil it down to a faster and faster-running anticipatory sequence which leads to striking the wall. Only given somewhat random behavior and/or, perhaps, some built-in idea of avoidance can the robot find a way to avoid walls.
Nouns are the names of objects; verbs are the names of action sequences.
After two or three sequences, the robot could anticipate a sequence from the initial cues. Might want to rapidly reduce the details of a low-trauma sequence and its likelihood of happening until it's reinforced. Might want next-day confirmation. However, when different sequences start the same way, then predictive confusion would result, together with a search for dissimilar cues. (The unexpected would happen, and the organism would feel frustration or disappointment or lowered self-confidence.) For example, "I'm going to hug you!" repeated several times would set up an expectation of hugging. Then "I'm going to touch you!" would trigger the expectation of hugging until the organism recognized differences in the cues. The robot must learn to sort out the wheat ("hug", "touch") from the chaff ("I'm going to...").
(The robot will correlate, and later, test its correlations, and differentiate among that which has similarities but also differences.)
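One way to picture sorting the wheat from the chaff is to score each word by how consistently it goes with a single outcome. The following is only a minimal sketch: the function name `cue_weights`, the word-level tokenizing, and the scoring rule are all assumptions made for illustration, not part of the design above.

```python
from collections import defaultdict

def cue_weights(episodes):
    """Score each word by how reliably it goes with one outcome.

    episodes: list of (utterance, outcome) pairs (hypothetical format).
    A score of 1.0 means the word has only ever been seen with a single
    outcome; lower scores mean the word shows up across outcomes.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for utterance, outcome in episodes:
        words = set(utterance.lower().replace("!", "").split())
        for word in words:
            counts[word][outcome] += 1
    scores = {}
    for word, by_outcome in counts.items():
        total = sum(by_outcome.values())
        scores[word] = max(by_outcome.values()) / total
    return scores

episodes = [
    ("I'm going to hug you!", "hug"),
    ("I'm going to hug you!", "hug"),
    ("I'm going to hug you!", "hug"),
    ("I'm going to touch you!", "touch"),
    ("I'm going to touch you!", "touch"),
]
scores = cue_weights(episodes)
wheat = sorted(w for w, s in scores.items() if s == 1.0)
chaff = sorted(w for w, s in scores.items() if s < 1.0)
print("wheat:", wheat)
print("chaff:", chaff)
```

With these episodes, "hug" and "touch" each predict one outcome, while the words shared by both phrases do not discriminate between outcomes and so score lower.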
In order to recognize an action sequence, the robot must first recognize that there is a repetitive sequence. If it's something that happens to the robot, it will have a higher priority than if it involves only external objects. But from the robot's perspective, doesn't everything that happens, happen to the robot?
Generalizing to larger and larger sequences and connections should result in understanding.
Recognition and labelling of action sequences (verbs) would seem to be a challenge. ?How does the robot distinguish between verbs and nouns?
Putting It to the Test: The Robot's First Exposure to the World:
?What should happen when we place a baby robot in the world for the first time?
Concept Formation, Abstraction:
After thinking about it, I suspect that many of the connections that are so obvious that we take them utterly for granted must be part of extensive and intensive early learning on the part of an infant. For example, there is the concept of "solid", and the relationship between open space and our ability to move through it freely, versus objects through which we can't move freely.
The Concept of "Objects":
?Presented with a meaningless blur, how can the robot learn the concept of "objects"? One very important step in the robot's learning process would be "objectification"—learning that there is a world out there that is not under the robot's direct control.
The first thing the robot's brain will have to do will be to generate a 3-D map. It will do this by triangulating on the features that it has detected and has registered frame by frame as it moves around.
This "restlessness" will be pre-programmed.
What Attracts the Robot:
The robot will be attracted toward the most "interesting" features—features which move, features which make sounds, features which are brightly colored. This kinetropism, phonotropism, and chromotropism will be instinctive—that is, pre-programmed.
The infant robot will have no emotional defenses—no parent part, no ability to repress, and no real ego. Sensory inputs will be at full volume—uninhibited. At this stage, it will be driven by pre-programmed urges. It will be drawn to the first, most colorful feature it sees.
Everything will be recorded, set up in unique categories, cross-correlated (at first only on the basis of temporal proximity), and stored at full detail (see below for a discussion of "full detail"). This instinct may have to be pre-programmed, at least the first time we try it.
Here, we encounter the first disconnect.
Link Between Desire and Action:
?How does the computer establish a connection between desire and motor output? I am envisioning a platform with four-parallel-wheel steering so that it can move in any direction it pleases. One could make it flail around, like an infant, until it learns to coordinate its motor activity. However, motor activity itself must be associated with desire to move in certain ways. It will try to grasp the feature, if the feature is reachable. It must also have learned to coordinate its manipulators so that it can reach and grasp objects before it can grasp anything.
• Developing a correlation between the robot's sensory experience and its motor output. The robot must be able to feel its motor activities as it sees its manipulator(s) move. This implies kinesthetic sensors in the joints—optical encoders and strain gages.
Also, the robot will be excited and experiencing what in a living organism would be pleasure. Its clock rate will be up, its self-appraisal will be positive (whatever we can make that mean), and it will be totally absorbed in its exploration.
Once the robot has examined a reachable, handle-able feature/object—felt its weight and its shape, squeezed it, banged it to see what noise it makes, heard its name, (if we want to give it names even at the outset) and examined it on all sides—then, for its first level of understanding about the world, it will consider the feature known and will lose interest in it, moving on to the next feature. It will store a tactile map and other concurrent sensory inputs such as sounds and kinesthetic inputs together with its visual imagery.
Short Attention Span
In the animal kingdom, for obvious reasons, a short attention span prevails among the young of all species so a short attention span will be pre-programmed into our baby robot. (It will grow out of this when it grows up.)
If the feature is not "handle-able", then the robot will examine the feature from all available directions and move on. If the feature changes, either while it is conducting its exploration or before it sees the feature/object again, then it will re-examine the object.
Associations, Causal Links
If the object changes during its first examination, it will tentatively associate the change with whatever else is going on at the same time. If the same sequence of events occurs repeatedly or simultaneously, then the robot's little mind will establish a causal link between these events and the change in the object for potential subsequent prediction of this change. The robotic mind will constantly be seeking associations between events and/or objects, and at the same time, it will be correcting or refining these correlations, given inconsistent sequences of events.
This will permit it to predict a change in the object, given the beginning of the associated sequence.
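The tentative-association idea can be sketched as a simple co-occurrence counter. This is only an illustration under stated assumptions: the class name `AssociationMemory`, the threshold of three repetitions, and the rule that a single inconsistent observation withholds the link are all hypothetical choices, not something the design above fixes.

```python
from collections import Counter

LINK_THRESHOLD = 3  # assumed number of consistent repetitions needed

class AssociationMemory:
    """Counts how often an event coincides with a change in an object."""

    def __init__(self):
        self.together = Counter()    # (event, change) -> times seen together
        self.event_seen = Counter()  # event -> times seen at all

    def observe(self, events, change=None):
        """Record the events going on now and the change (if any) seen with them."""
        for event in set(events):
            self.event_seen[event] += 1
            if change is not None:
                self.together[(event, change)] += 1

    def predicts(self, event, change):
        """True once the pairing is both repeated and so far consistent."""
        hits = self.together[(event, change)]
        return hits >= LINK_THRESHOLD and hits == self.event_seen[event]

memory = AssociationMemory()
for _ in range(3):
    memory.observe(["bell rings", "light on"], change="door opens")
memory.observe(["light on"])  # the light is on, but nothing changes

print(memory.predicts("bell rings", "door opens"))
print(memory.predicts("light on", "door opens"))
```

In this run the bell is linked to the door opening, while the light is not, because the light was once seen without the change. A softer rule (a weight that rises and falls) would fit the "correcting or refining" language better than this all-or-nothing test.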
Decline in Interest
If the object is unchanging, then each successive time the robot encounters the object, it will spend less time examining the object than it has before. (Repetitive reinforcement is very important and comforting to infants and may have to do with wicked surprises in the world.)
The learning of general concepts and strategies must be an important part of the initial exploration of the world. Concepts like gravity must be learned, and must present as unwelcome surprises. (Once a few objects have fallen, an anticipatory cause-effect pattern from remembered sequences should become established. "What I tell you three times is true." I do this, that happens; I do this, that happens.)
Abstracting the Concept of Gravity:
When the robot lets go of something, it will fall to the floor. The noise should startle the robot (and make an indelible impression). The robot should then interrupt its environment-exploration program and pick up the object because it did the unexpected. The robot would probably re-examine the object, see nothing unusual, and lose interest, dropping it again. Once again, it would pick up the object, examine it cursorily (shortening the sequence), and drop it again. The next (fourth) time, it should more or less eliminate the examination and should start picking up the object and dropping it until it becomes evident that it is the letting go of the object that triggers its fall. An anticipatory sequence would have been set up, with gradual or rapid elimination of the examination until only the essence of a cause-effect relationship would be left. If it were voiced, it might be "I pick up the object; I let go of it; it falls down and goes boom!" However, the first time the robot picked up the object, it picked the object up off the table, so picking it up off the floor is not a common element. The object won't fall until the robot releases it, so the releasing of it must trigger the fall. Not only that, but the change of location and the noise occur just after the moment when the robot releases the object. So releasing the object becomes the only common denominator and therefore the probable cause of the change and the noise. (The robot's vision system may track the object by increasing the update rate to 30 frames a second and the resolution to either one or six minutes of arc the instant it detects that the object is rapidly changing location. At successive frames, the object will have fallen a cumulative 0.21", then 0.85", then 1.92", then 3.41", then 5.3", then 7.7", then 10.45", then 13.65", and then 17.28", appearing in 8 to 9 frames.) The robot would repeat the lifting and dropping until, after a few trials, it assumed the cause/effect relationship (in the form of a generic anticipatory sequence).
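The frame-by-frame distances quoted above appear to be cumulative free-fall distances, d = (1/2) g t^2, sampled at 30 frames a second. The short check below reproduces them under the assumption that g was taken as 32 ft/s^2 (384 in/s^2) and that air resistance is ignored; the function and constant names are illustrative only.

```python
G_IN_PER_S2 = 32.0 * 12.0  # assumed: g = 32 ft/s^2, expressed in inches/s^2
FPS = 30                   # vision update rate given in the text

def fall_distances(n_frames, fps=FPS, g=G_IN_PER_S2):
    """Cumulative distance fallen (inches) at each of the first n_frames frames."""
    return [0.5 * g * (k / fps) ** 2 for k in range(1, n_frames + 1)]

distances = fall_distances(9)
for k, d in enumerate(distances, start=1):
    print(f"frame {k}: {d:.2f} in")
```

On this reading, a drop of roughly 17 inches is over in nine frames, which is why so few images of the falling object would be available for the robot to work with.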
When the robot has established the expectation that the object will fall when the robot releases it, the robot will lose interest in the object. Note the inferences which have been set up in the form of the anticipatory sequence after three or four experiments.
A lower-priority, unsolved puzzle would remain for the robot regarding why the object hadn't already fallen on the floor but was instead sitting on the table when the robot picked it up. What happens next is negotiable. The robot could defer solving that puzzle and resume its exploration of its environment. Or it could follow up, by putting the object back on the table and observing that the object didn't fall. An adult could lift the object and put it back on the table and then the robot could try to imitate the adult. But how will the robot come up with imitation? It must first "objectify" its environment. One way to carry out imitation would be to put the robot through the motions, which it could then replay, as opposed to expecting it to try to imitate someone else. Another way might be through the parallel between the robot's recognizing its manipulator and recognizing someone else's. We might want to begin by using a manipulator that looked just like its own. It could lift the object off the table and drop it on the floor. It would do this several times until it established the consistency of the results. Then it could play with the object, dropping it from different heights and noting the time it took to fall and the varying loudness with which it hit the floor. It could lift the object to varying heights and drop it on the table. It could slide the object around on the table. It could slide the object over the edge of the table and discover that the object dropped without the action of being held and then released. In other words, it would begin to discover that falling is a behavior of unsupported objects and not of its releasing the object. One possibility might be to try to install an expert system for learning and discovery.
Returning to Its Exploration of the Environment:
Once the robot has abstracted its information and has tired of playing with the object (after three or four tries), it would go back to exploring the environment. It would probably pick up the next object, examine it, and then go through its little object-dropping ritual until it had established that this object also fell when unsupported. Meanwhile, the object-falling animation tracks would be stored with each test object as characteristics of that object.
Anticipatory Sequences Leading to Causal Relationships
After a few object-dropping tests, the robot would set up an anticipatory sequence in which it would expect all objects to fall when let go or rolled off the edge. With the passage of time, as the level of detail of each individual track was reduced, the tracks would be consolidated into a single generic animation track, since they would be sufficiently similar to warrant this. As a data compression stratagem, this animation track would be referenced by all objects rather than storing a copy of it with each object. In the process, we would have moved from the specific to the general.
An Expert System for Learning and Discovery:
To define an expert system for learning and discovery, we will ask: "How would an intelligent adult explore a totally alien existence, given our well-trained analytical abilities?"
If we install an expert system for learning and discovery, we would probably want to make it modifiable by experience.
?How Do We Partition a Film Strip into Objects and Events without Human Intervention (Verbs and Nouns)?:
Whoa! What the robot is going to experience is an action sequence—a film clip. There is nothing in this to partition the world into actions and objects—verbs and nouns.
Answer: Following only our rules for associating, discriminating, forgetting, and condensing to generic classes, this partitioning should occur automatically because the object remains the same over a number of different experiential sequences. The animation tracks would fade into each other as the computer slowly reduced the level of detail night after night, combining the most similar animation tracks until one or only a few generic tracks were left.
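The nightly combining of the most similar animation tracks resembles greedy agglomerative merging. The sketch below is a toy version under heavy assumptions: each track is reduced to a short list of numbers (a hypothetical feature vector), similarity is plain Euclidean distance, and the `merge_below` threshold is arbitrary. None of these choices comes from the design above.

```python
import math
from itertools import combinations

def consolidate(tracks, merge_below):
    """Repeatedly merge the two most similar tracks until none are close enough.

    tracks: list of equal-length numeric lists (hypothetical track features).
    Returns a list of (generic_track, member_count) pairs.
    """
    clusters = [(list(t), 1) for t in tracks]
    while len(clusters) > 1:
        # Find the closest pair of current (possibly already merged) tracks.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: math.dist(clusters[p[0]][0], clusters[p[1]][0]),
        )
        if math.dist(clusters[i][0], clusters[j][0]) >= merge_below:
            break  # nothing left that is similar enough to combine
        (a, na), (b, nb) = clusters[i], clusters[j]
        # The generic track is the member-weighted average of the two.
        merged = [(x * na + y * nb) / (na + nb) for x, y in zip(a, b)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, na + nb))
    return clusters

tracks = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
generic = consolidate(tracks, merge_below=1.0)
print(len(generic), "generic tracks remain")
```

Here four tracks collapse into two generic ones. Whether real animation tracks can be compared this simply is an open question; the point is only that the merging rule itself is mechanical.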
?How do we remember when we've done something before?
?How do we convert what would otherwise be a featureless, continuous time track into a sequence of memorable events?
The principal animation track consists of the robot's location, direction of gaze, and whatever else is happening to it.
Given a location and a direction of gaze, the expected image is reconstructable. However, if anything changes, we remember the scene as it was before the change, together with the scene as it appeared after the change. Also, if there were any unique sounds or other happenings, we will remember the scene and the surrounding circumstances. For example, most people remember where they were and what they were doing when they heard that JFK had been shot.
Objects, sounds, smells, tastes, and above all, events, can trigger recollections of particular action sequences when we "went so-and-so and did such-and-such".
?How about this hypothesis? We remember the unexpected. We also attach weights to what we remember. If we glance at something or note its existence out of the corner of an eye, we don't tie it to the day's animation track (or we do so with such a low weight that it is soon forgotten). If in the process, we absorb a new level of detail, we may remember the detail without remembering when we saw it. However, if something unexpected happens, then we tend to associate the object with the action sequence and to attach a higher weight to the association, remembering it better.
We remember the animation track surrounding an emotionally charged event. I remember the night I spent in the hospital after my tonsillectomy. I remember the events surrounding the time I had laughing gas.
The learning of relationships is independent of remembering the animation tracks at the times when they were learned.
We have the ability to project trends. For example, if something is slowing down or speeding up, we will project a continuation of this trend. Of course, we have to be able to abstract the general concept of "slowing down" or "speeding up".
Brief generic animation tracks, like opening a drawer, pouring water, and all the other 1,001 common micro-moves we make each day, would be stored like attributes in a generic file.
• Must segment the continuous flow of time into events that can be recognized like objects. (No two events are identical, so some criteria for similarities must be established.) May use objects, sounds, other cues to trigger recollections of similar events.
• Need to generate anticipatory sequences or animation tracks.
Segmenting the Time Stream into Recognizable Events:
Need to distill action sequences down to highlight events to illuminate causal relationships.
Establishing the Concept of "Objects"
Meanwhile, the objects in the animation tracks would remain the same even when the animation tracks were sizably different. Consequently, the objects have a time-independent existence. Gradually, a model of external objects and of a world that has a time-independent reality would emerge. Or would it? Probably so. The shell model, including the objects, would end up as an invariant part of the generic action track or tracks which involved that room. If the objects and events within action sequences were stored independently of the sequences, then cross-linkages to other objects and events would develop that were based upon, for instance, the objects' similar silhouettes or other distinguishing features.
Hanging on to Objects
?Also, how will the robot learn to hang on to the objects it picks up?
Answer: Could be pre-programmed or could be learned by trial and error.
Impenetrability of objects is another generalization to be made. The robot has to learn to associate its being stopped with the object that is stopping it. Again, the process of generalization should take place until the robot associates impenetrability with all objects. (Imagine what a shock it's going to be when the robot first encounters a liquid!) The robot's state of confidence in its assumptions about the world is going to be sorely tried for a long time. Its gut-level self-evaluation is going to be impacted by these unexpected discoveries. As a part of its modeling of the world, such unwelcome surprises should cause the robot to become more cautious and skeptical for a while, and to test its assumptions more extensively than before. This caution will soon taper off, but will rise again after each unexpected challenge to familiar assumptions.
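The rise and taper of caution described here could be modeled, very roughly, as a level that decays at every step and jumps after each surprise. The function name and the `bump` and `decay` values below are assumptions picked only to make the shape visible.

```python
def update_caution(caution, surprised, bump=0.5, decay=0.8):
    """Return the next caution level, kept within [0, 1]."""
    caution = caution * decay           # caution tapers off on its own
    if surprised:
        caution = min(1.0, caution + bump)  # and rises after a surprise
    return caution

caution = 0.0
history = []
for surprised in [False, True, False, False, False, True, False]:
    caution = update_caution(caution, surprised)
    history.append(round(caution, 3))
print(history)
```

The caution level could then scale how many times the robot re-tests an assumption before trusting it again.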
?Does this phenomenon explain a child's uncertain grip on reality? Eventually, by the time we grow up, we learn enough about the world that we aren't so often surprised in such fundamental ways.
When the Robot Experiences Water:
?What will our baby robot make of water? It has no specific shape and no specific color at all. Its properties will be utterly unlike those of the solid objects for which we have designed the shape tables and object identification mechanisms.
Also, how will the robot learn to use cup-shaped objects in lieu of actual cups, the way a human or a primate might improvise? How will it grasp the concept of providing a cup-shaped object to hold water?
Ancillary Lessons to be Learned:
The robot would learn other lessons from this experience. It should learn that it can make things happen.
It should learn the correlations among touch, sound and sight.
It should learn the concept of solidity.
Using anticipatory scripts, the robot should then test the situation by trying it again until it has confirmed and stored the phenomenon. It should try dropping new items. It will eventually experience breakage. It should then be disappointed that it can't restore the item and should be slow to drop new items.
The Robot Encountering an Obstacle
Problem: Baby hits an obstacle on its way to a goal and stops. Then what happens?
(1) The baby robot has been thwarted from reaching its goal. The robot should instinctively dislike being held captive and unable to reach its goal.
(2) The robot should have a short attention span.
(3) How hard the robot continues to try to reach its goal should depend upon the relative importance of its goal.
(4) The stratagem of moving in lateral directions could be programmed in. Or could train the robot by setting up oblique barriers. Could arrange on the front of the robot a mechanical steering arrangement that would turn the steering wheels parallel to a barrier.
(5) Once the robot has reached its objective by steering around a barrier, it could use a trial-and-error + computation algorithm to steer clear of the barrier, allowing adequate clearance.
The robot could first make redoubled efforts to reach its goal. The obstacle might prove to be moveable. If so, the robot will remember this strategy the next time it encounters an obstacle. If not, the robot could lose interest in its unattainable goal and could seek one of its alternate goals. Then when the robot starts to experience the same thing again, it could go through the sequence more rapidly. Next, it could anticipate its blockage and avoid the obstacle, going to the alternate goal, leaving sooner and progressing to the main goal. Next, it could skip the alternate goal and proceed directly to the main goal. It could also develop an anticipatory strategy for similar situations that could apply to other, more abstract types of obstacles. (Could be tied to the original situation through the common feeling—it feels frustrated by this situation just as it did with the physical obstacle, though it seems hard to extend the concept of circumnavigation to non-physical situations.)
(6) Or how about:
a. anticipating the barrier and wincing just before it hits the barrier the next time; (pain and frustration will be remembered a little better and earlier with each reinforcement.) On the other hand, it mustn't generalize immediately. How rapidly it generalizes will depend upon the trauma associated with the unpleasant event.
b. anticipating the barrier a little earlier the third time this happens, and taking evasive action when it realizes that the barrier is there. The evasive action would consist of avoiding the barrier by circling it at a safe distance.
c. eventually anticipating the barrier from the beginning and moving in a way which will avoid (circumnavigate) the barrier.
d. testing a time or two to see whether things have changed.
Abstracting the Property of "Obstructiveness"
As classes of objects are developed, a property such as obstructiveness would gradually be developed and obstructive objects in this class would be so recognized, along with the circumstances under which they obstruct. There must be correlations, followed by testing and discriminations.
?How Do We Learn about Adjectives?
How will we remember adjectives? Develop concepts like "hardness", "softness", "light", "heavy"? Perhaps by anticipation.
Once a sound has become familiar, we become comfortable with it even if we don't know what it is (unless it's something we deem harmful or ominous).
With sounds, as with everything else, we abstract larger and larger patterns.
Recognizing timbre and unique voiceprint might be at the lowest level above speech recognition itself.
Recognizing accents and speech styles—e.g., whiny, bubbly, staccato—might be the next level up.
Recognizing someone's pet phrases and expressions entails a high level of verbal analysis.
The highest levels of speech presuppose a general knowledge of the world.
?How do we go about solving problems and inventing solutions?
For example, how does the robot grasp the idea of using a concave shape to hold water? It already knows the concept of gravity and that water will fall from prior experience. It also knows that as long as an object is supported by something, it won't fall. The robot can see that the water in a glass of water isn't falling. But how does the robot's little mind generalize to the idea that water must be cupped to keep it from falling down?
Idea: The robot might pick up the glass of water and move it around. Then since other glasses are interchangeable with the given glass, and since other things that are shaped like a glass may be included in the generic classification called "glasses", it might be that the robot would expect that water could be held up by anything that is classified as a glass. However, this doesn't really account for the mentation that says "I've got a problem. How do I solve it?", and then proceeds to invent a solution.
We would like something more than a trial-and-error discovery that cup-shaped things hold water. We would like the realization that liquids must be held in containers, and then the insight that says, "Hey! If I use a cup-shaped container, it ought to hold water!"
The robot is building a world model.
Purpose enters in here. The idea of trying to create a tool
Concavity is not a very obvious common property. But what's really in order is observing the property and behavior of water and then
The robot might play with the water. It might tip the water in the course of examining it and might observe that the water fell down. Then through repeated trials, it might observe that the water spilled out and fell down when it was tilted just beyond the edge of the container. It might shake the container and cause the water to be spilled out of it. It might—and here's where we get into invention—pour the water into another container and observe that it was no longer in the original container. (One of the lessons it would have to learn would be that after the water spilled out of the first container, it was no longer there.)
Before we deal with invention, we must learn verbs, adjectives, and adverbs.
?How would the robot learn its colors?
We would show it many different red objects while saying the word "red". The robot would have to determine that what all of the objects had in common was "redness".
The robot could be trained by guiding it in pointing to red objects and then letting it find and point to red objects on its own.
We wouldn't want to cross-correlate all red objects with each other. This means that there must be an attribute of redness that exists independently of any given object. Otherwise, we would have to cross-correlate "redness" among all the red objects. (In a way, we'll be doing that, in the sense that we'll have pointers from every red-colored object or feature to a "red" attribute stored only once for each remembered shade of red. To a certain extent, there may be pointers from the "red" attribute back to the red objects.) It follows that there will be entities other than unique objects and unique events in the database. Generic objects and generic events may also be stored like these attributes, with two-way pointers back to unique objects and unique events. Here, we may want to allow pointers back to all the objects and events themselves. After all, this would only double the number of required pointers. The pointers will have weights attached to them that will designate the strength of the association and that will gradually be reduced over time. We might want to use four bytes for the pointers to allow up to 4,294,967,296 table entries. (Three bytes would give us 16,777,216 entries in each table or file and would probably be sufficient.) "Red" might include a very approximate range of RGB values and the word "red" in text and spoken English. Or one might use pointers to the word "red" in the OCR file and the sound bit of the word "red" in the speech recognition file. With each shade of red, we will need to store the RGB values (or alternatively, the chrominance values) that define it.
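The pointer scheme can be sketched with two dictionaries standing in for the tables: one from attributes to objects and one from objects back to attributes, each link carrying a weight that fades. The class name `AttributeStore`, the decay factor, and the cutoff below which a link is dropped are assumptions for illustration. The last lines simply confirm the table sizes quoted for three-byte and four-byte pointers.

```python
class AttributeStore:
    """Attributes stored once, with weighted two-way links to objects."""

    def __init__(self):
        self.attr_to_objects = {}  # attribute -> {object_id: weight}
        self.object_to_attrs = {}  # object_id -> {attribute: weight}

    def link(self, object_id, attribute, weight=1.0):
        """Create (or refresh) the pointer pair between an object and an attribute."""
        self.attr_to_objects.setdefault(attribute, {})[object_id] = weight
        self.object_to_attrs.setdefault(object_id, {})[attribute] = weight

    def decay(self, factor=0.9, floor=0.05):
        """Weaken every link; forget links that fall below the floor."""
        for table in (self.attr_to_objects, self.object_to_attrs):
            for key, links in list(table.items()):
                table[key] = {
                    k: w * factor for k, w in links.items() if w * factor >= floor
                }

store = AttributeStore()
store.link("fire truck", "red")
store.link("apple", "red", weight=0.06)  # a barely noticed association
store.decay()
store.decay()
print(sorted(store.attr_to_objects["red"]))

# Number of table entries addressable with 3-byte and 4-byte pointers.
print(256 ** 3, 256 ** 4)
```

After two rounds of decay the weak "apple" link has been forgotten while the "fire truck" link survives, and the object list for "red" is found without comparing red objects with each other.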
Colors, like most other attributes, are human inventions. The color spectrum is continuous. There is no such thing as the color "red". "Red" is an arbitrary abstraction enforced by language. Furthermore, there are various subdivisions of "red" such as "carmine", "scarlet", "crimson", "brick-red" (whatever color that is), and so forth. And this is true in general, from colors through numbers to events. (Identifying colors will be somewhat facilitated by the human propensity to print the primary colors rather than borderline colors, which can be handled with appellations such as "yellow-green".) There will be a hierarchy of colors.
Identifying objects by an attribute such as color is tantamount to functional inversion. Given a function, find its inverse.
The robot must respond to "no!" and to scolding. The protean adaptability of the human mind.
Need to imitate humans.
What We Remember and What We Don't:
Note that unique experiences or events are remembered, like the midnight hike with Mr. Drew, or Ruth and I climbing the mountain at Estes Park. On the other hand, routine action sequences in the same setting are soon forgotten but the setting itself is well-remembered.
Sample of Storage Requirements: Wood grain finish, walnut. Width; depth; height; shape, with corner radius; assembling-it animation tracks; easily nicked; Sullivan Industries; slide-out drawers, dark back in drawers; memories of using it at 101 Lake Shore Blvd.; recollections of moving it out for cleaning, for access to cords; Christmas present from Tommie.
Strategy: Will remember the first time (Tommie helping me put it together), the unusual. As something is repeated, the strengths of the linkages and of the now-generic memories ought to increase but the animation tracks that led to it will not be stored. Links to memories of work done on the computer such as the house ads, the house floor plan, papers sent to ISD, etc.
Short-term memory and long-term memory.
Certain objects and events will be members of more than one class. For example, a cubical house will have pointers to, and to a lesser extent, from both a generic house and a cube. Ice cubes would also have a pointer to a cube and to ice and to cold and to ice trays and refrigerators and to the experiences of getting ice cubes out of the ice trays and of putting water into the ice trays (plus, probably, experiences featuring the spilling of the water on the way to the freezer).
The range of parameters from multiple instances of objects would probably be used to establish the range of parameters of the generic object. If something fell within that range or, perhaps, within a Gaussian σ or two of that range, it would be recognized as a member of that class. Otherwise, it would either fall into another class or would establish a new class. Actually, you'd probably want to use a recognition score or, perhaps, say that if an object satisfied one or a few definitive criteria, it would be included. Where there was ambiguity, closer inspection would be suggested (i.e., the recognition problem would be raised to the level of conscious awareness), or the object would be left unidentified.
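A minimal sketch of this kind of recognition score follows. It assumes each class has at least two stored instances, that every instance carries the same numeric parameters, and that the score is simply the worst deviation measured in standard deviations. The thresholds (`accept`, `ambiguous`) and all names are hypothetical.

```python
from statistics import mean, stdev


class GenericClass:
    def __init__(self, name, instances):
        # instances: list of dicts mapping parameter name -> measured value
        self.name = name
        self.params = {}
        for key in instances[0]:
            values = [inst[key] for inst in instances]
            # Guard against zero spread so the score never divides by zero.
            self.params[key] = (mean(values), max(stdev(values), 1e-9))

    def score(self, obj):
        # Largest deviation from the class mean, in standard deviations.
        return max(abs(obj[k] - m) / s for k, (m, s) in self.params.items())


def recognize(obj, classes, accept=2.0, ambiguous=3.0):
    best = min(classes, key=lambda c: c.score(obj))
    z = best.score(obj)
    if z <= accept:
        return best.name               # within a sigma or two: a member
    if z <= ambiguous:
        return "inspect more closely"  # raise to conscious attention
    return "new class"                 # nothing stored is close enough


violin = GenericClass("violin", [
    {"length": 35.0, "width": 20.0},
    {"length": 35.5, "width": 21.0},
    {"length": 36.0, "width": 22.0},
])
print(recognize({"length": 35.8, "width": 21.5}, [violin]))
```

A score built on the worst parameter is only one choice. A weighted sum, or a short list of definitive criteria as suggested above, would slot into `score` without changing the rest.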
The problem of categorizations: when do we stop categorizing a violin as a violin and begin classifying it as a viol or a guitar? How about sets and subsets? When do we quit classifying a car as a car and begin calling it a truck or a forklift? All are vehicles. Shapes. Functional definitions. Might start with broad categories and later narrow down to finer discriminations.
Cross-linkages can be to other objects or to sequences, which can be labeled with a number in a look-up table. This would reduce the bit count for cross-references. The numbers might be assigned in chronological order, after checking to ensure that each given item isn't already in the database.
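The look-up table idea amounts to what programmers call interning. A small sketch, with hypothetical names, assuming items can be compared for equality and hashed:

```python
class LookupTable:
    def __init__(self):
        self._numbers = {}  # item -> assigned number
        self._items = []    # assigned number -> item, in chronological order

    def intern(self, item):
        # Check first that the item isn't already in the table.
        if item in self._numbers:
            return self._numbers[item]
        number = len(self._items)  # next number in chronological order
        self._numbers[item] = number
        self._items.append(item)
        return number

    def lookup(self, number):
        return self._items[number]


table = LookupTable()
a = table.intern("filling the ice tray")
b = table.intern("walking to the freezer")
c = table.intern("filling the ice tray")  # already present, same number returned
print(a, b, c)
```

Cross-references can then carry the small integer rather than the whole sequence.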
Could search in background mode. Could think (correlate and differentiate) in background mode.
A key problem is that of abstraction.
• Fuzzy recollections and modeling must be essential to recognition. That could be a reason why we don't remember most things at all exactly.
• Can remember at varying levels of abbreviation.
• Quasi-randomness would be essential to improvement. Motor skills require feedback, and variations in approach would allow evolutionary improvement.
• Non-quantitative. Note that visual recollections are very approximate. Abstraction is somehow visual and might be such a thing as "boltedness".
• Can be quantitatively emulated, although the brain probably doesn't do things quantitatively. This analog way of remembering may extend to all kinds of memory, including aural memory.
• Remembering invokes a dendritic structure of associated memories. Not remembering requires inhibition of these associated memories.
• A number of instances of a given object are stored.
• Abbreviated scripts. Everything is based on actions. Feelings are stored with objects.
• Faces: must abstract at varying levels of resolution. Silhouettes are abstracted (can recognize from silhouettes). Can identify images in pictures.
• Problem-solving could take the form of trial-and-error and selecting a successful outcome.
• ?How do we generalize?
• The subject of abstraction is so crucial. We store such a small fraction of what we see and what we do store is so dependent upon our intent to store.
Memory and Recognition:
We store the exceptional, the unusual detail. But this makes it hard to generate a general-purpose taxonomy. On the other hand, if we store related examples, then the unusual details would establish the envelope.
Will certainly need to use model-based encoding with crude animation and, perhaps, rendering. Whether or not experiences are remembered is determined by what's going on inside and not directly by what's happening in the external world.
I choose not to remember the start-to-finish "video tape" of my visit to Nobie Stone. Instead, I extract excerpts from it at selected times when something special happened. There are "hot links" to Sunday-School, SSL in 1965, and other Nobie events. There are links between various events and Nobie's name, Nobie's face, Nobie's voice, and all the locales where I have encountered him. Nobie's voice is stored not as actual words but as a certain pitch and a style of diction, together with images of his face while speaking (seen from various viewpoints). The most vivid image is that of him speaking in Sunday-School class.
I can remember thoughts that I have had without necessarily remembering when I have had them. Last night, when John Stephens brought up a cooking anecdote, it triggered my cake-baking anecdote. I had to understand (abstract) the meaning of his conversation before I could make the connection.
Will certainly want to weight our recollections and relationships to recollections, perhaps on the basis of frequency, intensity (trauma), and perceived importance.
Will need to store action (animation) sequences. These may help establish cause and effect relationships (push this, and that happens). Understanding of relationships and sequences will be necessary. Action sequences will be particularly keyed to our own actions.
Certain activities such as locomotion and navigation should be handled subconsciously.
Storage: We will probably need at least a 40-bit address space (might get by with 32 bits for a while). Might use local directories for related material. Will probably want to continually prune and optimize. Could use 16-bit precision for absolute size.
Can recognize better with high precision.
Could use Gaussian error functions to recognize, but we're really interested in trigger points where flags are raised.
Might have a size factor, a point of origin, and 8-bit dimensions.
Might have a size factor associated with each dimension.
Might use a variable resolution size factor.
"Storing at Full Detail":
"Storing at full detail" needs some elaboration. The robot will be examining the object at a 5 frame per second update rate. If the robot could examine the object at maximum effectiveness, it could record about 56,500 pixels/second of 2° central vision detail or about 500,000 pixels of full-60°-field-of-view visual data. However, it would seem reasonable to permit the robot to store only a very limited degree of detail in a 1/30th second snapshot. To remember greater detail, more extensive study of the object would be required.(For recognition purposes, details must be stored at a level of detail which is hugely simpler than that of the photographic level.) Normally, we would store the representation of an object using a generic texture, coupled with exceptions from uniformity. With 20:1 wavelet compression, using only 256 colors, storage rates would be no greater than 100,000 bytes/minute or about 6 MB/hour. At that rate, a 2 GB disk would store 320 hours or 20 days ( 3 weeks) of observations. A 140 GB tape drive could handle about 1,500 days or four years. However, details could rapidly be degraded. They could fade rapidly at first and then more slowly later.
Training Robots in Virtual Environments:
Given a sufficiently realistic virtual environment within a computer, the robot might learn its way around by experiencing a simulated environment within a computer before it were presented with the real world. This would require a very realistic simulation of reality.
We might imagine a computer simulation in which the AI program learns to control its simulated manipulator. All the software that is needed to carry out such a process could be defined and perhaps even created.
Memory Requirements for a Virtual Environment
Suppose a 400 sq. ft. room is texture-mapped at 200 dots per inch. In addition to the floor area, there would be 80' of walls covered up to, perhaps, 5', for a further 400 sq. ft., plus the sides and surfaces of objects in the room, for a total of, perhaps, 1,000 sq. ft. or 144,000 sq. in. at 40,000 dots/sq. in. This would require about 6 GB if we stored 1 byte per pixel. However, if we assume a wavelet-based 10:1 image compression ratio, we might be able to store such scenery in 600 MB. The weight, center of gravity and moments of inertia, surface "feel", and other characteristics would have to be associated with each object. At that rate, we could store, perhaps, 1 sq. ft./MB. Then on a 9 GB drive, we could hold about 9,000 sq. ft. At a resolution of 32 dots per inch (1,000 dots/sq. in., 150,000 dots/sq. ft.), we could store 600,000 sq. ft.
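The texture-memory estimate can be reproduced as follows. The sketch assumes decimal units and ignores the per-object physical properties (weight, moments of inertia, and so on), which is presumably why the text rounds down to 1 sq. ft./MB. The function names are hypothetical.

```python
SQ_IN_PER_SQ_FT = 144


def scene_bytes(area_sq_ft, dpi, bytes_per_pixel=1, compression=10):
    """Bytes needed to texture-map a surface area at a given resolution."""
    pixels = area_sq_ft * SQ_IN_PER_SQ_FT * dpi ** 2
    return pixels * bytes_per_pixel / compression


def area_for(capacity_bytes, dpi, bytes_per_pixel=1, compression=10):
    """Square feet of textured surface that fit in a given capacity."""
    return capacity_bytes / scene_bytes(1, dpi, bytes_per_pixel, compression)


GB = 1_000_000_000
print(scene_bytes(1000, 200, compression=1) / GB)  # uncompressed room, in GB
print(scene_bytes(1000, 200) / 1_000_000)          # 10:1 compressed, in MB
print(round(area_for(9 * GB, 32)))                 # sq. ft. on a 9 GB drive at 32 dpi
```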
?Why does a baby love repetition? Learning of motor skills? Concept formation?
• Current State of the Art
166 MHz Pentium, 200 MHz P6 available. High-density (4.7 gigabyte) CDs coming in late '96. 8 MB RAM, 500 MB disk, 2X CD ROM, 75 MHz Pentium at bottom ($1,200) end. 16 MB RAM, 1 GB disk, 4X CD ROM, 120 MHz Pentium for journeyman system.
We could currently afford approximately 9 GB of disk storage ($2,100), 225 SPECint92s of processor speed (a 150 MHz 604e or a 150 MHz P6, $3,000), and 64 MB of RAM ($2,100). Digital signal processors could up the ante to, perhaps, 2 Gigops of processing speed. A 140 GB Exabyte tape drive is available for $5,000.
Given a $60,000 grant, we might spring for 100 GB of disk storage (11 drives, $20,000), 10-20 Gigops of processing speed, and, perhaps, 0.5 GB of RAM.
Could use a 4-processor Daystar Gemini system. Or a 4-processor P6-based system. Or even two of them.
• December, '96, State of the Art
180 MHz Pentium. 264 MHz P6? $15/MB RAM? 4.7 GB CDs, 2.4 MB/second? 15 GB hard drives? 6X CDs?
For $7,500, could afford a 180 MHz Pentium or, perhaps, a 264 MHz P6, 128 MB of RAM, 15 GB of disk, and 4.7 GB of CDs.
• December, '97 State of the Art
300 MHz P7? $8/MB RAM? 9.4 GB, 4 MB/second CDs? 9 GB hard drives?
• December, '98:
400 MHz P7, 133 MHz bus, 300 MHz P6, $4/MB RAM? 18 GB CDs? 30 GB hard drives?
• December, '99:
500 MHz P7, 166 MHz bus, $2/MB RAM? 18 GB CDs?, 30 GB hard drives?
• December, 2000:
600 MHz P8, 200 MHz bus; $1/MB RAM?, 36 GB CDs?, 90 GB hard drives?
• Year 2000 State of the Art:
CPU: 2,000 SPECint92s, 8,000 to 32,000 SPECs for native signal processing (NSP),
Disk: 90 GB
RAM: 1 GB
CPU: 16,000 SPECint92s (up to 256,000 SPECint92s in NSP mode)
Disk: 1 TB (11 drives),
RAM: 10 GB
CPU: 200,000 SPECint92s (100-200 processors), up to 3.2 terops in NSP mode.
Disk: 5 TB
RAM: 100 GB
• December, 2002 (actual):
2.4 GHz Athlon, 1 GB RAM, 120 GB hard drive, 4.7 GB DVD, 10 gigaflops
CPUs: 80 gigaflops
Disk: 2 TB
RAM: 8 GB
CPUs: 800 gigaflops
Disk: 20 TB
RAM: 80 gigabytes
CPUs: 6 teraflops
Disk: 150 TB
RAM: 500 gigabytes
• Year 2005 State of the Art:
CPU: 10,000 SPECint92s (25 Gigops)
Disk: 200 GB
RAM: 5 GB
CPU: 100,000 SPECs (250 Gigops)
Disk: 2 TB
RAM: 40 GB
CPU: 1 terops
Disk: 10 TB
RAM: 0.5 TB
This would approach human processing parameters.
• Ultimate (Conservative) State of the Art, as seen from 1995:
Assume 4 GB RAM chips. 10 GHz clock speeds. 10 GB/sq. in. disk densities.
Assume $100/GB, 2 Gigops processors ($20), 100 GB disks ($200). Then:
$6,000 would buy 1 TB of disk, 200 Gigops of CPU, and 20 GB of RAM.
$12,000: 2 TB of disk, 0.5 terops, and 40 GB of RAM.
This would correspond to about the year 2010 and leaves us down by a factor of 20 in speed. However, digital signal processors could conceivably boost speeds to 5 terops, or even 10 terops in volume production (25-50 chips @200 Gigops/chip).
$600,000 in 2010 should buy 100 TB of disk, 20 terops of processing power, and 2 terabytes of RAM. This should provide enough raw processing capability to permit proof-of-principle demonstrations of human-class thinking irrespective of what approaches are taken. This should afford computational resources in the general neighborhood of what the human brain can do, albeit at high expense and with large, hot machinery. Still, if a machine can be made as smart as a human, it can probably be made much smarter than a human in performing arithmetic and reasoning operations, and could be well worth the investment. Also, measures such as using high-volume custom chip sets and improving software algorithms could probably help to reduce costs.
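The budget figures above appear to follow from splitting each budget evenly among disk, processors, and RAM at the assumed unit prices. That even split is my reading of the numbers rather than something stated outright, so the sketch below labels it as an assumption. The function name is hypothetical.

```python
RAM_DOLLARS_PER_GB = 100
CPU_DOLLARS, CPU_GIGOPS = 20, 2    # one 2-Gigops processor for $20
DISK_DOLLARS, DISK_GB = 200, 100   # one 100 GB disk for $200


def what_it_buys(budget):
    share = budget / 3  # assumption: equal thirds for disk, CPU, and RAM
    return {
        "disk_tb": share / DISK_DOLLARS * DISK_GB / 1000,
        "gigops": share / CPU_DOLLARS * CPU_GIGOPS,
        "ram_gb": share / RAM_DOLLARS_PER_GB,
    }


for budget in (6_000, 12_000, 600_000):
    print(budget, what_it_buys(budget))
```

With an even split, $12,000 comes out nearer 0.4 terops than the rounded 0.5 quoted above; the $6,000 and $600,000 cases match the text.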
Total Storage Requirements:
• During the course of a lifetime, we are awake about 6,000 hours/yr., or 540,000 hours in 90 years. At 2.25 GB/hour of compressed HDTV imagery, it would require about 1,215,000 GBytes of storage to accommodate a lifetime of visual memories, or about 1,200 terabytes. In actual practice, we probably store snapshots that can be animated in imaginative ways. (We wouldn't need to store the messages from both eyes once their 3-dimensional information has been digested.)
• If we stored information at a 92,000-pixel resolution instead of at a 2,000,000-pixel resolution, we would need 56 terabytes for 90 years. At this storage rate, 9 gigabytes of storage capacity would last about six 16-hour days.
• At the magnetic storage densities that IBM is currently targeting (10 Gb/in²), 3.5-inch disks might store 25 GB/sheave and 5.25-inch disks could store, perhaps, 50 GB/sheave. With 5-sheave drives, this could translate into 125 and 250 gigabytes/drive. If we accept IBM's theoretical limit of 62.5 Gb/in² as an upper bound on magnetic storage densities, then storage may never exceed 1 and 2 terabytes/drive for 3.5-inch and 5.25-inch drives, respectively. A 2-terabyte drive could store about 900 hours of HDTV.
• If we stored a frame in 10 KB, we could store 100,000 frames/GB. It would require 5.5 GB to support 1 frame/hour for ninety years. More aptly, we will be storing 3-D shell models of our mostly-familiar surroundings at very low resolutions with a minimum of remembered detail, together with animation sequences and many, many cross-references. Using model-based encoding, and degrading old memories to lower and lower resolutions, a few gigabytes might be sufficient (not that it would have to be). Most of what we experience is the same old same old, and needn't be stored with much of any information beyond pointers to a few generic scenes. The details of what we do each day are soon forgotten.
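A quick way to check the lifetime arithmetic in these bullets, using only the rates quoted in them. Decimal units are assumed (1 TB = 1,000 GB, 1 GB = 10^9 bytes), and the variable and function names are hypothetical.

```python
WAKING_HOURS_PER_YEAR = 6_000
YEARS = 90
HDTV_GB_PER_HOUR = 2.25
FULL_PIXELS = 2_000_000    # HDTV-class frame
REDUCED_PIXELS = 92_000    # reduced-resolution frame
FRAME_BYTES = 10_000       # one stored frame at 10 KB

waking_hours = WAKING_HOURS_PER_YEAR * YEARS


def lifetime_tb(gb_per_hour):
    return waking_hours * gb_per_hour / 1000


full_tb = lifetime_tb(HDTV_GB_PER_HOUR)
reduced_tb = full_tb * REDUCED_PIXELS / FULL_PIXELS
one_frame_per_hour_gb = waking_hours * FRAME_BYTES / 1e9

print(waking_hours)           # waking hours in 90 years
print(full_tb)                # terabytes at full HDTV resolution
print(round(reduced_tb))      # terabytes at 92,000 pixels
print(one_frame_per_hour_gb)  # gigabytes at one 10 KB frame per waking hour
```

The spread between the first and last figures, roughly five orders of magnitude, is the point of the bullets: what has to be stored depends almost entirely on how aggressively experience is abstracted.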
Dividing up the Task:
Might use 4 visual µprocessors, 2 for each eye. Might use additional µprocessors for 3-D imaging, clipping, texture mapping, Gouraud shading, and Z-buffering.
What we will need:
A speech recognition package that can be embedded in the computer system. Ideally, would like the Speech Systems, Inc., Phonetic Engine 500.
A state of the art OCR package—either OCR Professional 6.0 or Accutext.
A state of the art voice synthesis package.
A facial recognition program.
A 3-D graphics program that can generate 2-D views.
Could use a VCR storing the reduced resolution image that the computer is generating. Could use a computer monitor presenting the 3-D model that is being constructed within the robot. Would be interested in the interrelationships and the abstractions that are developing within the computer.
 - By the same token, like our arithmetical and logical capabilities, our speech recognition and linguistic capabilities may be Johnny-Come-Latelies on the evolutionary scene and may be a lot easier to master than such "lower-level" capabilities as vision and walking.
 - One approach to this might be to provide an Internet publication forum in which different contributors can gain recognition for their contributions, including patent rights, where applicable. Such an activity would require that someone be an honest broker. It might also require special security features and online access to a patent library. If you would be interested in such a role or have ideas about how cooperation might be implemented, I would welcome hearing from you.
 - Hans Moravec, "The Universal Robot", Analog Science Fiction and Fact, Jan., 1992, pp. 93-101.
 - At Case Institute of Technology, my mathematician office-mate and I were fascinated with the Perceptron concept when we first heard about it in 1958. We speculated about how it might work, but then heard no more about it.
 - When I was working as a graduate student at Case Institute of Technology, our project mathematician had proven that it was impossible to apply two-terminal-pair analysis to networks of probabilistic switches. As we began to prepare our final report, two of our staff members began to try to apply two-terminal-pair analysis to probabilistic switch nets. I was resentful of their wasting their time on this wild goose chase after Dr. Lehman had already proven that it couldn't be done. In a few days, they came up with correction formulae that allowed two-terminal-pair transformations even though, technically-speaking, it couldn't be done. I learned a valuable lesson about keeping an open mind in the research game from that experience.
It's also noteworthy that, by 1895, America's leading astronomer, Dr. Simon Newcomb, had mathematically proven that heavier-than-air flight would forever be impossible. He publicly announced this in 1903, the year of the Wright Brothers' first flight at Kitty Hawk. In 1911, he announced that it would forever be impossible for an aircraft to carry a passenger. That happened to be the year of the first passenger flight. His timing wasn't very good.
I guess we all need a healthy disrespect for authority.
 - These numbers are far from a consensus. A 1995 book estimates the number of neurons at 50,000,000.
 - Dr. Moravec is responsible for the 10¹³ operations per second lower bound, Danny Hillis of Thinking Machines has authored the 10¹⁶ operations per second estimate, and Dr. Terry Sejnowski of the Salk Institute has published an estimate of 10¹⁵ operations per second. The 1,000:1 range among these estimates may partially arise from differences in
 - Note that we're comparing apples and oranges. We may still be seriously underestimating what an individual neuron can do.
 - A 150 MHz Pentium chip is rated at 180 MIPS, while a 167 MHz P6 chip is projected to yield 240 MIPS. The P6 transaction bus is designed for efficient 4-chip operation, delivering 1,000 MIPS with four machines. A 133 MHz 604 chip is quoted at 200 MIPS. Extrapolating these parameters to the state-of-the-art speed of 300 MHz, these chips would provide processing speeds of about 450 MIPS. DEC's follow-on chip is projected to operate at 400 MHz, providing processing speeds in excess of 500 iSPECS. (DEC has pledged to 500-fold its microprocessor speeds over the next 20 years, and current forecasts call for single-microprocessor PC speeds of 3,000 to 15,000 SPECint92s by 2005, with multiple digital signal processors running at possibly 100 billion operations per second by that date.)
 - IBM is reputedly planning a 20-fold "bump-up" in disk drive densities, projecting a 90 gigabyte disk drive for PCs by the year 2000. This would up single-disk-drive capacities to about 200 GB within the next few years (2003?), and would lower the cost of a terabyte to about $10,000. However, IBM has warned that magnetic domains smaller than about 0.1 µ begin to exhibit quantum leakage into other domains. This would correspond to a bit density of about 62.5 billion bits per sq. inch or about 0.2 terabytes per 3.5" disk sheave. One might hope for ultimate magnetic disk capacities of 1 terabyte for 3.5" disks or 2 terabytes per 5.25" drive, still well below the presumed storage demands of human intelligence. (We note that the Japanese have begun work on a one terabyte optical storage system, and a number of other schemes are being touted for multi-terabyte optical disks. For example, the resolutions advertised for IBM's scanning would support 10 to 20 terabyte 3.5" disk drives.)
- There are several hidden assumptions here, such as the idea that the robot knows that it can cause things to happen, that it will not try to lift the object up through the table, and
 - An alternative way to handle the reverse reference would be to search the object database for red objects.
 - Cross-cultural studies of color naming around the world suggest that color is not arbitrary but is perhaps a function of the primary colors that are registered by the human eye.
 - It is assumed that the stereo information from both eyes has been digested and that visual information is stored in a 3-D shell model format.
TI C80X - 4 adv. DSP's on chip, 40 MHz, 5.4 w., 2 Gops, $579
Model-based video encoding; head pose estimation; Feature point tracking