The Robots Are Getting Closer - 6
1/8/2003
What Do We Expect An Intelligent Robot to Be Able to Do?
It's an important fact that many capabilities that we regard
as uniquely human are already available on computers, with varying levels of
success. This is being accomplished with computers that run at 1/100,000th of
the alleged processing power of the human brain.
Computer vision
Computer vision is high on
the list, and this is where Dr. Moravec and company have passed a crucial
milestone. With this, computers are able to build 3-D "shell models"
of the world. The computational requirements for visual navigation would appear
to lie in the ten-gigaflops-and-up range. Of course, visual recognition is
another crucial capability that gets into the larger arena of a complete model
of the world.
Speech recognition
Speech recognition would also
be high on the list.
ScanSoft's $695.95 Dragon NaturallySpeaking Professional Edition
consumer dictation system works best with a 500 MHz Pentium III or better, 256
MB of RAM, and 300 MB of disk space.
IBM's $169.95 Pro USB Edition
of ViaVoice requires a 600 MHz Pentium III or better, 192 MB of RAM, and 510 MB
of disk space. (I find myself wondering how much better these systems could do
with computing and storage capacities that were 1,000 times as great.) Another
bellwether speech recognition system is AT&T's "Watson"
system. AT&T claims, and probably rightly, that its system, under
development for 30 years, is the most accurate. It's designed to run on large
servers with greater resources (whatever those resources may be).
Speech synthesis
Speech synthesis is speech
recognition's consort. As is the case with "Watson", AT&T
claims preeminence in this domain, although as of 5 years ago, Apple Computer's
"Victoria" gave the best speech synthesis performance I've heard other
than our Weather Service narrator. The Weather Service has recently announced an
upgrade to their computer-generated narration that, allegedly, will greatly
improve it. One of the best examples of the state of the art in speech
synthesis is "Ananova",
which includes Ananova's talking head.
By far the greater problem in speech synthesis is the
understanding of the meanings conveyed by conversations.
Meaningful dialogue
Meaningful dialogue may be the
next step beyond speech synthesis. This is of great value to industry because
of its potential to cut labor costs.
The first step in this progression was the hated,
menu-driven Touch-Tone telephone-answering system that answers most telephones
these days.
Next were voice response systems that allow you to say your
answers to the computer. These have been hampered by the crudity of telephone
microphones and line quality, which makes it difficult for a human to understand
what's being said, let alone a computer.
Customer interactive systems
that will permit you, in a limited context, to tell the computer what you want
might be the "next big thing". AT&T offers this euphemistic
example ("How May I Help
You?") of how this will work. "How May I Help
You?" also shows the Ananova-like talking heads that could interact
with you via your computer, your TV, or your PDA. Right now, these systems will
undoubtedly be limited to frequently asked questions. The computer will have no
concept whatever regarding what these questions really mean.
Six years ago, in 1997, I published a timeline
in which I predicted that computer-based "personal assistants" would
begin answering questions in the 2005-2007 time frame. I was about to throw in
the sponge on this prediction until I saw the above AT&T demonstration. Now
I think that it might happen.
True understanding of our human world
All computer-human interactions would be greatly enhanced if
a computer could gain an understanding of our human world. This is a major,
major undertaking. To really understand our human world, the computer/robot will
have to feel pleasure and pain, to possess emotions, and to possess goals of its
own (discussed in greater detail below).
Optical character and handwriting recognition
Optical character recognition has been around for at least
four decades with the U.S. Postal Service, and has been teamed with desktop
scanners for at least 15 years. Handwriting recognition was popularized with
Apple's Newton. I don't have any feeling for just how well this can be done when
cost is no object. How well can the Post Office read printed and hand-written
addresses?
A machine vision system must have a resolution of 30
arc-seconds to achieve 300-dot-per-inch resolution at 21 inches from the eye, or
400-dot-per-inch resolution at 16 inches from the eye. A robot will have to
perform its functions under varying lighting conditions.
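The resolution figures above can be checked with a little trigonometry. The following is only a rough sketch: it assumes that one "dot" is a square of side 1/dpi inches viewed head-on, and the function name dot_angle_arcsec is made up for this illustration.

```python
import math

# Arc-seconds in one radian (about 206,265).
ARCSEC_PER_RADIAN = 180 * 3600 / math.pi


def dot_angle_arcsec(dots_per_inch: float, distance_inches: float) -> float:
    """Angle, in arc-seconds, subtended by one dot seen from a given distance.

    Assumption for this sketch: a dot is 1/dpi inches wide and is viewed
    straight on, so the angle is 2 * atan(half the width / distance).
    """
    dot_width = 1.0 / dots_per_inch
    angle_rad = 2 * math.atan(dot_width / (2 * distance_inches))
    return angle_rad * ARCSEC_PER_RADIAN


if __name__ == "__main__":
    for dpi, distance in [(300, 21), (400, 16)]:
        angle = dot_angle_arcsec(dpi, distance)
        print(f"{dpi} dpi at {distance} inches: {angle:.1f} arc-seconds")
```

Both cases come out between 32 and 33 arc-seconds, so the 30-arc-second figure is a round number that is slightly finer than either case strictly requires.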
Facial recognition
Facial recognition is a relatively recent addition to the
robotic panoply of tricks. It works well enough to
have been considered for use in surveillance of crowds.
IBM has explored allowing the computer to respond to gestures
as well as keyboard and mouse input, and Vanderbilt is experimenting with having
the computer read facial expressions.
Most of these capabilities appear to be relatively
insensitive to hardware improvements.
Although computer performance has
increased by a factor of ten every 5 years, the performance of, e.g.,
speech recognition systems has increased only modestly. Apparently, these
programs aren't hampered much by current hardware limitations. I would expect to
see slow improvement in them over the years.
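To put the "factor of ten every 5 years" figure in more familiar terms, here is a small back-of-the-envelope sketch. It assumes the growth is smooth and exponential, which is my simplification, and the function names are invented for this illustration. The last line applies the same rate to the 1/100,000th gap mentioned at the top of this page, purely as arithmetic and not as a forecast.

```python
import math


def annual_factor(factor: float, years: float) -> float:
    """Per-year multiplier equivalent to `factor` growth over `years` years."""
    return factor ** (1.0 / years)


def doubling_time(factor: float, years: float) -> float:
    """Years needed to double, given `factor` growth every `years` years."""
    return years * math.log(2) / math.log(factor)


def years_to_gain(target: float, factor: float, years: float) -> float:
    """Years needed to grow by `target`, given `factor` growth every `years` years."""
    return years * math.log(target) / math.log(factor)


if __name__ == "__main__":
    # Figures taken from the text: tenfold every 5 years, and a 100,000-fold gap.
    print(f"Growth per year: x{annual_factor(10, 5):.2f}")
    print(f"Doubling time: {doubling_time(10, 5):.2f} years")
    print(f"Years to gain a factor of 100,000: {years_to_gain(100_000, 10, 5):.0f}")
```

A tenfold gain every 5 years works out to a little under 1.6 times per year, or a doubling roughly every year and a half.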
These capabilities, as implemented on computers, probably work
quite differently from the way the human brain does, in that the brain is
basically self-teaching.