Dividing up the Task:
    Might use 4 visual µprocessors, 2 for each eye. Might use additional µprocessors for 3-D imaging, clipping, texture-mapping, Gouraud shading, and Z-buffering.
    What we will need:

  • A speech recognition package that can be embedded in the computer system.
  • A state-of-the-art OCR package, either OCR Professional 10.0 or Accutext.
  • A state-of-the-art voice synthesis package.
  • A facial recognition program.
  • A 3-D graphics program that can generate 2-D views.

    Training Robots in Virtual Environments:
        Given a sufficiently realistic virtual environment within a computer, the robot might learn its way around in simulation before it is presented with the real world. This would require a very realistic simulation of reality.

    Memory Requirements for a Virtual Environment
        Suppose a 400 sq. ft. room is texture-mapped at 200 dots per inch. In addition to the floor area, there would be 80' of walls covered up to, perhaps, 5', for another 400 sq. ft., plus the sides and surfaces of objects in the room, for a total of, perhaps, 1,000 sq. ft., or 144,000 sq. in. at 40,000 dots/sq. in. This would require about 6 GB if we stored 1 byte per pixel. However, if we assume a wavelet-based 10:1 image compression ratio, we might be able to store such scenery in 600 MB. The weight, center of gravity and moments of inertia, surface "feel", and other characteristics would also have to be associated with each object. At that rate, with some allowance for this extra data, we could store, perhaps, 1 sq. ft./MB. Then on a 9 GB drive, we could hold about 9,000 sq. ft. At a resolution of 32 dots per inch (about 1,000 dots/sq. in., or 150,000 dots/sq. ft.) and the same 10:1 compression, we could store about 600,000 sq. ft.
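        The arithmetic above can be checked with a short script. This is only a back-of-envelope sketch: the figures (1,000 sq. ft., 200 and 32 dots per inch, 1 byte per pixel, 10:1 compression, a 9 GB drive) come from the paragraph above, while the function names are made up for illustration and decimal gigabytes are assumed.

```python
# Back-of-envelope storage estimate for a texture-mapped room.
# Figures come from the notes; function names are illustrative only.

SQ_IN_PER_SQ_FT = 144
GB = 1e9  # decimal gigabytes (an assumption; the notes use round figures)

def texture_bytes(area_sq_ft, dpi, bytes_per_pixel=1, compression=1.0):
    """Bytes needed to store area_sq_ft of surface at dpi dots per inch."""
    pixels = area_sq_ft * SQ_IN_PER_SQ_FT * dpi * dpi
    return pixels * bytes_per_pixel / compression

def storable_area_sq_ft(drive_bytes, dpi, bytes_per_pixel=1, compression=1.0):
    """Surface area whose texture fits on a drive of drive_bytes."""
    return drive_bytes / texture_bytes(1, dpi, bytes_per_pixel, compression)

raw = texture_bytes(1000, 200)                     # 5.76e9 bytes, about 6 GB
packed = texture_bytes(1000, 200, compression=10)  # 5.76e8 bytes, about 600 MB
area_32dpi = storable_area_sq_ft(9 * GB, 32, compression=10)  # about 610,000 sq. ft.

print(f"raw: {raw / GB:.2f} GB, compressed: {packed / 1e6:.0f} MB")
print(f"9 GB at 32 dpi: {area_32dpi:,.0f} sq. ft.")
```

        Texture alone at 200 dots per inch with 10:1 compression comes to a little under 0.6 MB per sq. ft.; the 1 sq. ft./MB figure is more conservative, presumably to leave room for the per-object characteristics.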

    Why does a baby love repetition? Learning of motor skills? Concept formation?

    Motion Control:
        In reaching for something, the robot could first move one joint and then another, performing the motions simultaneously only after experimentation.
        Inverting a motion matrix is one way of solving the differential equations governing motion. Another way to do it is with a library of interpolated cut-and-try solutions, using optimization/estimation techniques to converge to a solution. (Could run simulations before actually executing the motions.)
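        As a rough illustration of the matrix-inversion route, here is a minimal sketch for a planar arm with two joints. It is an assumption-laden toy: it handles positions only (kinematics, not the full dynamics), the link lengths and starting pose are invented, and the step-halving loop is a simple stand-in for the cut-and-try idea. Everything runs in simulation; nothing is "executed".

```python
import math

L1, L2 = 1.0, 0.8  # hypothetical link lengths

def forward(t1, t2):
    """Hand position of a planar two-joint arm for joint angles t1, t2 (radians)."""
    x = L1 * math.cos(t1) + L2 * math.cos(t1 + t2)
    y = L1 * math.sin(t1) + L2 * math.sin(t1 + t2)
    return x, y

def jacobian(t1, t2):
    """2x2 matrix relating small joint changes to small hand displacements."""
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return ((-L1 * s1 - L2 * s12, -L2 * s12),
            (L1 * c1 + L2 * c12, L2 * c12))

def reach(target, t1=0.3, t2=0.5, tol=1e-6, max_iter=100):
    """Simulate reaching for target; return joint angles that put the hand there."""
    def miss(a1, a2):
        x, y = forward(a1, a2)
        return target[0] - x, target[1] - y

    for _ in range(max_iter):
        ex, ey = miss(t1, t2)
        err = math.hypot(ex, ey)
        if err < tol:
            break
        (a, b), (c, d) = jacobian(t1, t2)
        det = a * d - b * c
        if abs(det) < 1e-9:   # arm fully stretched or folded: not invertible
            t2 += 0.1         # nudge out of the singular pose and retry
            continue
        # Joint change = inverse(J) * miss, using the closed-form 2x2 inverse.
        d1 = (d * ex - b * ey) / det
        d2 = (-c * ex + a * ey) / det
        # Cut-and-try: halve the step until the simulated miss gets smaller.
        scale = 1.0
        while scale > 1e-4:
            if math.hypot(*miss(t1 + scale * d1, t2 + scale * d2)) < err:
                break
            scale *= 0.5
        t1, t2 = t1 + scale * d1, t2 + scale * d2
    return t1, t2

t1, t2 = reach((1.2, 0.6))
print(forward(t1, t2))  # lands close to the target (1.2, 0.6)
```

        Solved poses like this one could be saved and interpolated later, which is one way to read the "library of cut-and-try solutions" idea.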
    • Fuzzy recollections and modeling must be essential to recognition. That could be a reason why we don't remember most things at all exactly.
    • Can remember at varying levels of abbreviation.
    • Quasi-randomness would be essential to improvement. Motor skills require feedback, and variations in approach would allow evolutionary improvement.
    • Non-quantitative. Note that visual recollections are very approximate. Abstraction is somehow visual and might be such a thing as "boltedness".
    • Can be quantitatively emulated, although the brain probably doesn't do things quantitatively. This analog way of remembering may extend to all kinds of memory, including aural memory.
    • Remembering invokes a dendritic structure of associated memories. Not remembering requires inhibition of these associated memories.
    • A number of instances of a given object are stored.
    • Abbreviated scripts. Everything is based on actions. Feelings are stored with objects.
    • Faces: must abstract at varying levels of resolution. Silhouettes are abstracted (can recognize from silhouettes). Can identify images in pictures.
    • Problem-solving could take the form of trial-and-error and selecting a successful outcome.
    • How do we generalize?
    • The subject of abstraction is crucial. We store only a small fraction of what we see, and what we do store depends heavily upon our intent to store it.
    • Drives: pursuit of pleasure, avoidance of pain, but heavily influenced by self-discipline.

        We store the exceptional, the unusual detail. But this makes it hard to generate a general-purpose taxonomy. On the other hand, if we store related examples, then the unusual details would establish the envelope.
        Will certainly need to use MPEG-4 with crude animation and, perhaps, rendering.
        Need to imitate humans.
        Will certainly want to weight our recollections and relationships to recollections, perhaps on the basis of frequency, intensity (trauma), and perceived importance.
        Will need to store action (animation) sequences. These may help establish cause and effect relationships (push this, and that happens). Understanding of relationships and sequences will be necessary. Action sequences will be particularly keyed to our own actions.
        Certain activities such as locomotion and navigation should be handled subconsciously.
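        The weighting of recollections by frequency, intensity, and perceived importance mentioned above might be sketched as follows. The class, field names, and scoring formula are all hypothetical; the notes do not say how the three factors should be combined, so this is just one plausible choice.

```python
import math
from dataclasses import dataclass

@dataclass
class Recollection:
    label: str
    recalls: int = 0         # how often it has come up (frequency)
    intensity: float = 0.0   # 0..1, e.g. high for a traumatic event
    importance: float = 0.0  # 0..1, perceived importance

    def weight(self) -> float:
        # Hypothetical scoring: frequency counts logarithmically, so one
        # intense or important event can outweigh many routine repetitions.
        return math.log1p(self.recalls) + 2.0 * self.intensity + 2.0 * self.importance

def strongest(memories, keep):
    """Return the `keep` highest-weighted recollections, strongest first."""
    return sorted(memories, key=lambda m: m.weight(), reverse=True)[:keep]

memories = [
    Recollection("walked down hallway", recalls=50),
    Recollection("touched hot stove", recalls=2, intensity=1.0, importance=0.8),
    Recollection("saw a red ball", recalls=1),
]
top = strongest(memories, keep=2)
print([m.label for m in top])
```

        The same weights could also decide which recollections to discard when storage is pruned.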

    Storage: We will probably need at least a 40-bit address space (might get by with 32 bits for a while). Might use local directories for related material. Will probably want to continually prune and optimize. Could use 16-bit precision for absolute size.
        Can recognize better with high precision.
        Could use Gaussian error functions to recognize, but we're really interested in trigger points where flags are raised.
        Might have a size factor, a point of origin, and 8-bit dimensions.
        Might have a size factor associated with each dimension.
        Might use a variable resolution size factor.
        How is the concept of influencing outcomes learned?
        How is the robot to translate feelings of avoidance into avoidant behavior?
        Feeling of openness versus enclosure.
        Will seek pleasure, avoid pain. Will balance against higher-order benefits and ideals (deferred gratification).
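        Returning to the storage notes above (a size factor, a point of origin, 8-bit dimensions, and Gaussian matching with trigger points), one possible encoding is sketched below. The record layout, the single shared size factor, the value of sigma, and the trigger threshold are all assumptions chosen for illustration.

```python
import math

def encode(dims, origin):
    """Store dimensions as 8-bit codes plus one shared size factor.

    dims   -- real-valued extents along each axis (e.g. inches), all >= 0
    origin -- point of origin of the object, kept as given
    """
    largest = max(dims)
    size_factor = largest / 255 if largest > 0 else 1.0  # largest extent -> 255
    codes = [round(d / size_factor) for d in dims]       # each fits in 8 bits
    return {"origin": origin, "size_factor": size_factor, "codes": codes}

def decode(record):
    """Recover approximate dimensions from the 8-bit codes."""
    return [c * record["size_factor"] for c in record["codes"]]

def match_score(observed, stored, sigma):
    """Gaussian similarity in (0, 1]; 1.0 means the dimensions agree exactly."""
    sq = sum((o - s) ** 2 for o, s in zip(observed, stored))
    return math.exp(-sq / (2 * sigma ** 2))

def triggers(observed, record, sigma=1.0, threshold=0.6):
    """Raise the flag only when the match score reaches the trigger point."""
    return match_score(observed, decode(record), sigma) >= threshold

chair = encode([18.0, 20.0, 34.0], origin=(0.0, 0.0, 0.0))
print(chair["codes"])                         # 8-bit codes for the three extents
print(triggers([18.2, 19.9, 34.1], chair))    # close to the stored chair
print(triggers([24.0, 20.0, 34.0], chair))    # 6 inches too wide
```

        A separate size factor per dimension, or a variable-resolution factor, would change only `encode` and `decode`; the trigger test stays the same.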