The Robots Are Getting Closer - 6
1/8/2003

What Do We Expect An Intelligent Robot to Be Able to Do?  
    It's an important fact that many capabilities that we regard as uniquely human are already available on computers, with varying levels of success. This is being accomplished with computers that run at 1/100,000th of the alleged processing power of the human brain.
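    As a sanity check on that ratio, here is a back-of-the-envelope calculation, assuming Moravec's oft-cited estimate of roughly 100 million MIPS for the brain and roughly 1,000 MIPS for a desktop PC of this era (both figures are assumptions for illustration):

        # Rough check of the 1/100,000 figure. Both estimates are assumed:
        # ~100 million MIPS for the brain (Moravec's estimate) and
        # ~1,000 MIPS for a circa-2003 desktop PC.
        brain_mips = 100_000_000
        pc_mips = 1_000
        print(pc_mips / brain_mips)   # 1e-05, i.e., 1/100,000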
Computer vision
    Computer vision is high on the list, and this is where Dr. Moravec and company have passed a crucial milestone. With this, computers are able to build 3-D "shell models" of the world. The computational requirements for visual navigation appear to lie in the ten-gigaflops-and-up range. Of course, visual recognition is another crucial capability, one that gets into the larger arena of a complete model of the world.
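    For the curious, here is a minimal sketch of the evidence-grid idea behind such shell models: each cell of a grid accumulates evidence for or against containing a surface as range readings arrive. The grid size, sensor model, and log-odds increments below are illustrative assumptions, not Dr. Moravec's actual parameters:

        import numpy as np

        # Minimal 2-D evidence grid: each cell holds the log-odds that
        # it contains a surface. (Sizes and increments are assumed.)
        grid = np.zeros((100, 100))
        LOG_ODDS_HIT = 0.9     # cell where a reading found a surface
        LOG_ODDS_MISS = -0.4   # cells the reading's ray passed through

        def integrate_reading(grid, ray_cells, hit_cell):
            """Fold one stereo/range reading into the grid."""
            for r, c in ray_cells:
                grid[r, c] += LOG_ODDS_MISS   # evidence of free space
            grid[hit_cell] += LOG_ODDS_HIT    # evidence of a surface

        # One reading: the ray crossed three cells, then hit a surface.
        integrate_reading(grid, [(50, 10), (50, 11), (50, 12)], (50, 13))
        shell = grid > 0   # positive cells form the sensed "shell"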
Speech recognition
    Speech recognition would also be high on the list. 
    ScanSoft's $695.95 Dragon NaturallySpeaking Professional Edition consumer dictation system works best with a 500 MHz Pentium III or better, 256 MB of RAM, and 300 MB of disk space. 
    IBM's $169.95 Pro USB Edition of ViaVoice requires a 600 MHz Pentium III or better, 192 MB of RAM, and 510 MB of disk space. (I find myself wondering how much better these systems could do with computing and storage capacities that were 1,000 times as great.) Another bellwether speech recognition system is AT&T's "Watson" system. AT&T claims, and probably rightly, that its system, under development for 30 years, is the most accurate. It's designed to run on large servers with greater resources (whatever those resources may be).
Speech synthesis
    Speech synthesis is speech recognition's consort. As is the case with "Watson", AT&T claims preeminence in this domain, although as of 5 years ago, Apple Computer's "Victoria" gave the best speech synthesis performance I've heard other than our Weather Service narrator. The Weather Service has recently announced an upgrade to its computer-generated narration that, allegedly, will greatly improve it. One of the best examples of the state of the art in speech synthesis is "Ananova", complete with Ananova's talking head.
    By far the greater problem is not generating the speech itself but understanding the meanings conveyed by conversations.
Meaningful dialogue
    Meaningful dialogue may be the next step beyond speech synthesis. This is of great value to industry because of its potential to cut labor costs.
    The first step in this progression was the hated, menu-driven, Touch-Tone telephone-answering systems that answer most telephones these days. 
    Next came voice response systems that let you speak your answers to the computer. These have been hampered by the crudity of telephone microphones and line quality, which can make it difficult for a human to understand what's being said, let alone a computer. 
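    A little arithmetic shows how much of the speech spectrum the telephone throws away. The comparison below assumes the standard telephone passband of roughly 300-3,400 Hz and 16 kHz desktop-microphone audio, a common dictation-system rate (both figures are assumptions for illustration):

        # Telephone channels pass roughly 300-3,400 Hz; desktop dictation
        # systems typically see 16 kHz audio (both figures assumed here).
        phone_band_hz = 3_400 - 300        # ~3.1 kHz of usable spectrum
        desktop_band_hz = 16_000 / 2       # Nyquist limit of 16 kHz audio
        print(phone_band_hz / desktop_band_hz)   # ~0.39: well under half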
    Customer interactive systems that will permit you, in a limited context, to tell the computer what you want might be the "next big thing". AT&T offers this euphemistic example ("How May I Help You?") of how this will work. "How May I Help You?" also shows the Ananova-like talking heads that could interact with you via your computer, your TV, or your PDA. For now, these systems will undoubtedly be limited to frequently asked questions; the computer will have no concept whatever of what the questions really mean. 
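    To make that "no concept of meaning" point concrete, here is a toy sketch of the kind of surface keyword matching such a system might use. The questions and scoring are invented for illustration; this is not AT&T's actual method:

        # Toy FAQ matcher: picks the canned question whose keywords best
        # overlap the caller's words. Nothing here models meaning; it is
        # pure surface matching. (FAQs and scoring invented for this sketch.)
        FAQS = {
            "What is my account balance?": {"account", "balance"},
            "How do I reset my password?": {"reset", "password"},
            "What are your business hours?": {"hours", "open", "close"},
        }

        def answer(utterance):
            words = set(utterance.lower().split())
            best = max(FAQS, key=lambda q: len(FAQS[q] & words))
            return best if FAQS[best] & words else "Transferring you to an agent."

        print(answer("when are you open"))   # matches the business-hours entry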
    Six years ago, in 1997, I published a timeline in which I predicted that computer-based "personal assistants" would begin answering questions in the 2005-2007 time frame. I was about to throw in the sponge on that prediction until I saw the AT&T demonstration above. Now I think it might happen.
True understanding of our human world
    All computer-human interactions would be greatly enhanced if a computer could gain an understanding of our human world. This is a major, major undertaking. To really understand our human world, the computer/robot will have to feel pleasure and pain, to possess emotions, and to possess goals of its own (discussed in greater detail below).
Optical character and handwriting recognition
    Optical character recognition has been in use by the U.S. Postal Service for at least four decades, and has been teamed with desktop scanners for at least 15 years. Handwriting recognition was popularized by Apple's Newton. I don't have any feel for just how well this can be done when cost is no object: how well can the Post Office read printed and handwritten addresses?
    A machine vision system must have a resolution of about 30 arc-seconds to match 300-dot-per-inch print viewed at 21 inches from the eye, or 400-dot-per-inch print viewed at 16 inches. A robot will also have to perform these functions under varying lighting conditions.
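    The 30 arc-second figure is straightforward small-angle arithmetic; the check below computes the angle a single printed dot subtends at the stated viewing distances:

        import math

        ARCSEC_PER_RADIAN = 180 * 3600 / math.pi   # ~206,265

        def dot_angle_arcsec(dpi, distance_inches):
            # Small-angle approximation: angle = dot size / viewing distance.
            return (1 / dpi) / distance_inches * ARCSEC_PER_RADIAN

        print(dot_angle_arcsec(300, 21))   # ~32.7 arc-seconds
        print(dot_angle_arcsec(400, 16))   # ~32.2 arc-seconds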
Facial recognition
    Facial recognition is a relatively recent addition to the robotic panoply of tricks. It works well enough that it has been considered for surveillance of crowds.
    IBM has explored allowing the computer to respond to gestures as well as keyboard and mouse input, and Vanderbilt is experimenting with having the computer read facial expressions.

Most of these capabilities appear to be relatively insensitive to hardware improvements.
    Although computer performance has increased by a factor of ten every 5 years, the performance of, e.g., speech recognition systems has improved only modestly. Apparently, these programs aren't hampered much by current hardware limitations. I would expect to see slow improvement in them over the years.
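    For reference, a factor of ten every 5 years is the same growth rate as a doubling roughly every 18 months, one common reading of Moore's law; the conversion is a one-liner:

        import math

        # Express "a factor of ten every 5 years" as a doubling time.
        doubling_months = 5 * 12 * math.log(2) / math.log(10)
        print(doubling_months)   # ~18.1 months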

    These human-level capabilities probably function significantly differently from the human brain, in that the brain is basically self-teaching.
