Problem: How do we program a robot to explore the world and generalize from it?
Feelings and Characteristics:
Curiosity, desire to explore
Short attention span
Striving for relationships, understanding
Sequences as relationships
Awareness of self, not-self
Response to "no"
Ability to extrapolate, simulate a sequence, given repeated reinforcement
The Protean adaptability of the human mind
How do we remember when we've done something before?
How do we convert what would otherwise be a featureless, continuous time track into a sequence of memorable events?
The principal animation track consists of the robot's location, direction of gaze, and whatever else is happening to it.
Given a location and a direction of gaze, the expected image is reconstructable. However, if anything changes, we remember the scene as it was before the change, together with the scene as it appeared after the change. Also, if there were any unique sounds or other happenings, we will remember the scene and the surrounding circumstances. For example, most people remember where they were and what they were doing when they heard that JFK had been shot.
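The idea above can be sketched in code: a continuous track becomes a sequence of memorable events by recording only the moments when the observed scene differs from the expected (reconstructed) scene. This is a minimal illustration; the scene vectors, the `expected` predictor, and the threshold are all hypothetical stand-ins, not part of any real robot system.

```python
# Sketch: segmenting a continuous sensor track into memorable events.
# An event is recorded only when the observed scene differs from the
# expected scene by more than a threshold. All names are illustrative.

def segment_events(track, expected, threshold=0.5):
    """track: list of scene vectors; expected: fn(index) -> predicted scene."""
    events = []
    for i, scene in enumerate(track):
        predicted = expected(i)
        surprise = sum(abs(a - b) for a, b in zip(scene, predicted))
        if surprise > threshold:
            # Remember the scene as it was before the change together
            # with the scene after the change, as described above.
            before = track[i - 1] if i > 0 else None
            events.append({"time": i, "before": before, "after": scene,
                           "surprise": surprise})
    return events
```

For example, a track that is featureless except for one sudden change yields exactly one remembered event, at the moment of the change.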
Objects, sounds, smells, tastes, and above all, events, can trigger recollections of particular action sequences when we "went so-and-so and did such-and-such".
How about this hypothesis? We remember the unexpected. We also attach weights to what we remember. If we glance at something or note its existence out of the corner of an eye, we don't tie it to the day's animation track (or we do so with such a low weight that it is soon forgotten). If in the process, we absorb a new level of detail, we may remember the detail without remembering when we saw it. However, if something unexpected happens, then we tend to associate the object with the action sequence and to attach a higher weight to the association, remembering it better.
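The hypothesis can be made concrete with a small sketch: memories carry weights, expected observations get weights so low they are soon forgotten, unexpected ones persist, and all weights decay over time. The class, the decay constant, and the forgetting threshold are assumptions chosen for illustration.

```python
# Sketch of the weighted-memory hypothesis: we remember the unexpected,
# and low-weight associations fade until they are forgotten entirely.

class WeightedMemory:
    def __init__(self, decay=0.9, forget_below=0.05):
        self.decay = decay              # per-tick weight multiplier
        self.forget_below = forget_below
        self.memories = []              # [description, weight] pairs

    def observe(self, description, surprise):
        # A passing glance (surprise near 0) gets a weight so low it
        # will soon be forgotten; an unexpected event gets a high one.
        self.memories.append([description, 0.1 + surprise])

    def tick(self):
        # Time passes: every association weakens; weak ones vanish.
        for m in self.memories:
            m[1] *= self.decay
        self.memories = [m for m in self.memories if m[1] >= self.forget_below]
```

After enough ticks, the glance out of the corner of the eye is gone but the surprising event remains, matching the behavior described above.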
We remember the animation track surrounding an emotionally-charged event. I remember the night I spent in the hospital after my tonsillectomy. I remember the events surrounding the time I had laughing gas.
The learning of relationships is independent of remembering the animation tracks at the times when they were learned.
We have the ability to project trends. For example, if something is slowing down or speeding up, we will project a continuation of this trend. Of course, we have to be able to abstract the general concept of "slowing down" or "speeding up".
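As a minimal sketch of this ability, the robot could estimate velocity and acceleration from finite differences over a short history of observations and extrapolate the trend one step forward. The scalar history and the naive second-difference model are illustrative assumptions, not a claim about how the mind actually does it.

```python
# Sketch: projecting a trend ("slowing down" / "speeding up") from a
# short history of observed values via finite differences.

def project_trend(history):
    """history: list of at least 3 recent scalar observations."""
    v1 = history[-2] - history[-3]   # earlier velocity
    v2 = history[-1] - history[-2]   # latest velocity
    accel = v2 - v1                  # change in velocity
    return history[-1] + v2 + accel  # project the trend continuing

def describe_trend(history):
    """Abstract the general concept of slowing down or speeding up."""
    v1 = abs(history[-2] - history[-3])
    v2 = abs(history[-1] - history[-2])
    if v2 > v1:
        return "speeding up"
    if v2 < v1:
        return "slowing down"
    return "steady"
```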
Once a sound has become familiar, we become comfortable with it even if we don't know what it is (unless it's something we deem harmful or ominous).
With sounds, as with everything else, we abstract larger and larger patterns.
Recognizing timbre and unique voiceprint might be at the lowest level above speech recognition itself.
Recognizing accents and speech styles (e.g., whiny, bubbly, staccato) might be the next level up.
Recognizing someone's pet phrases and expressions entails a high level of verbal analysis.
The highest levels of speech presuppose a general knowledge of the world.
How do we go about solving problems and inventing solutions?
For example, how does the robot grasp the idea of using a concave shape to hold water? It already knows the concept of gravity and that water will fall from prior experience. It also knows that as long as an object is supported by something, it won't fall. The robot can see that the water in a glass of water isn't falling. But how does the robot's little mind generalize to the idea that water must be cupped to keep it from falling down?
Idea: The robot might pick up the glass of water and move it around. Then since other glasses are interchangeable with the given glass, and since other things that are shaped like a glass may be included in the generic classification called "glasses", it might be that the robot would expect that water could be held up by anything that is classified as a glass. However, this doesn't really account for the mentation which says "I've got a problem. How do I solve it?", and then proceeds to invent a solution.
We would like something more than a trial-and-error discovery that cup-shaped things hold water. We would like the realization that liquids must be held in containers, and then the insight that says, "Hey! If I use a cup-shaped container, it ought to hold water!"
The robot is building a world model.
Purpose enters in here: the idea of trying to create a tool.
Concavity is not a very obvious common property. But what's really in order is observing the properties and behavior of water and then experimenting with it.
The robot might play with the water. It might tip the water in the course of examining it and might observe that the water fell down. Then through repeated trials, it might observe that the water spilled out and fell down when it was tilted just beyond the edge of the container. It might shake the container and cause the water to be spilled out of it. It might (and here's where we get into invention) pour the water into another container and observe that it was no longer in the original container. (One of the lessons it would have to learn would be that after the water spilled out of the first container, it was no longer there.)
Before we deal with invention, we must learn verbs, adjectives, and adverbs.
How would the robot learn its colors?
We would show it many different red objects while saying the word "red". The robot would have to determine that what all of the objects had in common was "redness".
The robot could be trained by guiding it in pointing to red objects and then letting it find and point to red objects on its own.
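One minimal way to sketch this learning step: the concept "red" could be represented as the range of color channels shared by all labeled examples, and new objects match if they fall inside that range. The (R, G, B) tuples and the bounding-box representation are simplifying assumptions, not a real perception pipeline.

```python
# Sketch: learning "redness" as what all the labeled examples have in
# common, here reduced to a bounding range over (R, G, B) channels.

def learn_color_concept(examples):
    """examples: list of (r, g, b) tuples all labeled with the same word."""
    lo = tuple(min(e[i] for e in examples) for i in range(3))
    hi = tuple(max(e[i] for e in examples) for i in range(3))
    return lo, hi

def matches(concept, rgb):
    """Point to a red object on its own: does rgb fall in the learned range?"""
    lo, hi = concept
    return all(lo[i] <= rgb[i] <= hi[i] for i in range(3))
```

Trained on a handful of red objects, the robot can then accept a new reddish object and reject a green one.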
We wouldn't want to cross-correlate all red objects with each other. This means that there must be an attribute of redness which exists independently of any given object. Otherwise, we would have to cross-correlate "redness" among all the red objects. (In a way, we'll be doing that, in the sense that we'll have pointers from every red-colored object or feature to a "red" attribute stored only once for each remembered shade of red. To a certain extent, there may be pointers from the "red" attribute back to the red objects.) It follows that there will be entities other than unique objects and unique events in the database. Generic objects and generic events may also be stored like these attributes, with two-way pointers back to unique objects and unique events. Here, we may want to allow pointers back to all the objects and events themselves. After all, this would only double the number of required pointers. The pointers will have weights attached to them that will designate the strength of the association and that will gradually be reduced over time. We might want to use four bytes for the pointers to allow up to 4,294,967,296 table entries. (Three bytes would give us 16,777,216 entries in each table or file and would probably be sufficient.) "Red" might include a very approximate range of RGB values and the word "red" in text and spoken English. Or one might use pointers to the word "red" in the OCR file and the sound bite of the word "red" in the speech recognition file. With each shade of red, we will need to store the RGB values (or alternatively, the chrominance values) that define it.
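The two-way pointer scheme can be sketched as follows: each attribute keeps weighted pointers to the objects that bear it, each object points back to its attributes, and the weights designating association strength are gradually reduced over time. The class and method names are hypothetical; an in-memory dict stands in for the tables or files discussed above.

```python
# Sketch of the two-way weighted-pointer scheme: attribute -> objects
# and object -> attributes, with association weights that decay.

class AssociationStore:
    def __init__(self):
        self.attr_to_objs = {}   # e.g. "red" -> {obj_id: weight}
        self.obj_to_attrs = {}   # obj_id -> {"red": weight}

    def associate(self, obj_id, attr, weight=1.0):
        # Store the pointer in both directions, as discussed above.
        self.attr_to_objs.setdefault(attr, {})[obj_id] = weight
        self.obj_to_attrs.setdefault(obj_id, {})[attr] = weight

    def decay(self, factor=0.99):
        # Gradually reduce the strength of every association.
        for table in (self.attr_to_objs, self.obj_to_attrs):
            for links in table.values():
                for key in links:
                    links[key] *= factor

    def objects_with(self, attr):
        # Functional inversion: from an attribute back to its bearers,
        # strongest association first.
        links = self.attr_to_objs.get(attr, {})
        return sorted(links, key=links.get, reverse=True)
```

Note that `objects_with` is exactly the "given a function, find its inverse" operation: the forward map is object-to-attribute, and the stored back-pointers make the inverse cheap.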
Colors, like most other attributes, are human inventions. The color spectrum is continuous. There is no such thing as the color "red". "Red" is an arbitrary abstraction enforced by language. Furthermore, there are various subdivisions of "red" such as "carmine", "scarlet", "crimson", "brick-red" (whatever color that is), and so forth. And this is true in general, from colors through numbers to events. (Identifying colors will be somewhat facilitated by the human propensity to print the primary colors rather than borderline colors, which can be handled with appellations such as "yellow-green".) There will be a hierarchy of colors.
Identifying objects by an attribute such as color is tantamount to functional inversion. Given a function, find its inverse.
Brief generic animation tracks, like opening a drawer, pouring water, and all the other 1,001 common micro-moves we make each day, would be stored like attributes in a generic file.