Cynthia L. Atwood, Science Correspondent
Yale University Office of Public Affairs
433 Temple St.
New Haven, CT 06520
(203) 432-1326, fax (203) 432-1323
[email protected]

CONTACT: Cynthia L. Atwood
For Immediate Release: Dec. 17, 1996

Visual Tracking System in Yale University Robotics Laboratory
Enables Three-dimensional Mouse to Operate Robotic Arm

New Haven, CT -- With the advent of powerful desktop computers,
progress in robotics research is occurring increasingly in the realm
of software development rather than hardware development. This trend
also has triggered a greater exchange of ideas between researchers in
robotic vision and computer animation, says Yale University computer
scientist Gregory D. Hager.

The result is an array of new ideas being explored by the Yale
Center for Computational Vision and Control, ranging from a
three-dimensional computer mouse that can control the motion of a
robotic arm to a visual tracking system that can superimpose a clown
face over a human face on a television monitor. The clown face
demonstrates how well the tracking system can latch onto and follow
eye and mouth motions, which could have applications in automated
video surveillance, teleconferencing or computer animation.

The 3-D mouse -- called a Surfball by its inventor, Yale graduate
student Kentaro Toyama -- is a hand-held racquetball that can guide a
robotic arm as it performs a precise 3-D task like inserting a disk
into a computer disk drive. A color video camera tracks the movement
of dots painted on the ball, which is suspended by elastic bands in a
stationary frame. As a 3-D remote control device, the Surfball is
well-suited for controlling robots in outer space or in contaminated
environments where humans cannot safely go.

Unlike a conventional computer mouse, which moves on a mouse pad
in only two directions, the Surfball can be maneuvered in three
directions to give users greater control over corresponding motions
by the robotic arm. The promising invention, for which Yale and Mr.
Toyama have filed a provisional patent application, could provide
greater motion control than a joystick for playing 3-D video games.
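
As a rough illustration of the idea -- the function names, dead zone and
gain below are invented for this sketch and are not details of the
Surfball itself -- a tracked ball offset can be turned into an arm
velocity command in a few lines of C++:

    // Illustrative sketch only -- not the Surfball's actual code. Assumes an
    // upstream vision routine already reports the ball's offset from its rest
    // position along three axes, in millimeters.
    #include <cmath>
    #include <iostream>

    struct Vec3 { double x, y, z; };

    // Map a ball displacement to an arm velocity command: ignore small offsets
    // (a dead zone absorbs hand tremor), then scale the rest linearly.
    Vec3 ballToArmVelocity(const Vec3& offset) {
        auto axis = [](double v) {
            const double deadZone = 2.0;  // mm of ball travel ignored
            const double gain     = 5.0;  // mm/s of arm motion per mm of ball travel
            if (std::abs(v) < deadZone) return 0.0;
            return gain * (v > 0 ? v - deadZone : v + deadZone);
        };
        return { axis(offset.x), axis(offset.y), axis(offset.z) };
    }

    int main() {
        Vec3 offset{6.0, -1.0, 10.0};   // ball pushed right, slightly down, forward
        Vec3 v = ballToArmVelocity(offset);
        std::cout << "arm velocity (mm/s): "
                  << v.x << ", " << v.y << ", " << v.z << "\n";
    }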

In another demonstration, researchers don a red baseball cap and
play a game of virtual ping pong with a ball on a television monitor,
bouncing the ball off the hat. Yet another student (there are four
graduate students, three undergraduates and a post-doctoral researcher
in Professor Hager's laboratory) has developed a computer drawing
program that responds to free hand motions. Meanwhile, a mobile robot
shaped like a barrel and equipped with television cameras tests the
ability of Yale's software programs to navigate in an unfamiliar
environment.

"Many of the demonstrations use two cameras instead of one for
tracking an object, thereby giving the tracking system binocular
vision like humans have," says Professor Hager, who in 1993 developed
a software program he calls XVision with the help of a graduate
student. Written in a flexible computer language called C++, XVision
forms the basis for all the tasks performed in the robotics laboratory
that require visual tracking. Another C++ module called Servomatic
interfaces with the XVision framework to provide robotic hand-eye
coordination.
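
As an illustration of that modular idea -- the class and function names
below are invented for this sketch, not taken from XVision or
Servomatic -- a complex target can be tracked as a collection of simple
feature-tracking primitives that are updated together on every video
frame:

    // Schematic sketch of a composable tracking primitive; the names are
    // invented for illustration and are not XVision's actual interface.
    #include <iostream>
    #include <memory>
    #include <vector>

    struct Image {};                        // stand-in for a camera frame
    struct Point { double x, y; };          // image coordinates in pixels

    // A tracking primitive reports where its feature (an edge, a corner,
    // a painted dot) has moved in the newest frame.
    class FeatureTracker {
    public:
        virtual ~FeatureTracker() = default;
        virtual Point update(const Image& frame) = 0;
    };

    // Trivial placeholder primitive: pretends its feature drifts rightward.
    class DotTracker : public FeatureTracker {
    public:
        explicit DotTracker(Point start) : pos_(start) {}
        Point update(const Image&) override { pos_.x += 1.0; return pos_; }
    private:
        Point pos_;
    };

    // A complex object (a face, a disk-drive slot) is tracked as a set of
    // simple components, echoing the "small set of simple components" idea.
    class CompositeTracker {
    public:
        void add(std::unique_ptr<FeatureTracker> t) { parts_.push_back(std::move(t)); }
        std::vector<Point> update(const Image& frame) {
            std::vector<Point> out;
            for (auto& t : parts_) out.push_back(t->update(frame));
            return out;
        }
    private:
        std::vector<std::unique_ptr<FeatureTracker>> parts_;
    };

    int main() {
        CompositeTracker face;
        face.add(std::make_unique<DotTracker>(Point{100, 80}));   // left eye
        face.add(std::make_unique<DotTracker>(Point{140, 80}));   // right eye
        Image frame;
        for (const Point& p : face.update(frame))
            std::cout << "feature at (" << p.x << ", " << p.y << ")\n";
    }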

Human Vision Provides a Metaphor
While Professor Hager is interested in how humans perform visual
tasks, he views human behaviors as "weak metaphors" for what machines
can accomplish. "The reason the field of robotics reached a plateau
after a growth spurt in the early 1980's is because expectations were
very high that researchers could develop a complex, generalized vision
system applicable to a wide array of tasks, much like the human vision
system. That turned out to be a much harder nut to crack than anybody
thought it would be," Professor Hager says.

What developed instead was an automated assembly line approach --
robotic hardware that simply repeats motions, such as assembling an
automobile, but cannot sense or respond to changes in the surrounding
environment, he says. Because of their inability to adapt to change,
such systems are very expensive to create and narrowly limited to a
defined task.

In Yale's robotics laboratory, breakthroughs began occurring about
five years ago when Professor Hager moved away from efforts to create
a generalized vision system and began developing task-specific
software solutions -- a paradigm shift made possible by the increasing
speed of desktop computers. "I can create a demonstration software
program in 45 minutes that solves a particular problem, such as
inserting a disk in a disk drive, and then move on to a completely new
software solution for the next problem. That may not be the way
humans perform vision tasks, but it seems to be a practical strategy
for computers and machines," he says.

Creating Software Modules to Perform Basic Tasks
By breaking down problems into parts and creating software modules
to handle each part, the Yale robotics researchers are making strides
that could lead to a fresh growth spurt in applications.  Professor
Hager and his
colleagues are linking together primitive commands -- software
instructions for a handful of basic tasks -- to guide the motion of a
robotic arm turning a screwdriver, for example, or a mobile robot
navigating around an obstacle, all with inexpensive, off-the-shelf
hardware.
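
As a rough sketch of what linking primitive commands can look like --
the step names and the task below are invented for illustration -- a
handful of named steps can be chained into a single task that stops at
the first failure:

    // Illustrative only: primitive commands chained into a task.
    #include <functional>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    // Each primitive either succeeds or fails; a task is an ordered list of them.
    using Primitive = std::function<bool()>;

    bool runTask(const std::vector<std::pair<std::string, Primitive>>& task) {
        for (const auto& [name, step] : task) {
            std::cout << "running primitive: " << name << "\n";
            if (!step()) {                       // stop at the first failed step
                std::cout << "  failed\n";
                return false;
            }
        }
        return true;
    }

    int main() {
        // Dummy primitives that always succeed, standing in for tracked motions.
        auto ok = [] { return true; };
        std::vector<std::pair<std::string, Primitive>> insertDisk = {
            {"locate disk-drive slot in both camera images", ok},
            {"align disk with slot",                         ok},
            {"move forward until the images converge",       ok},
            {"release gripper",                              ok},
        };
        std::cout << (runTask(insertDisk) ? "task complete\n" : "task aborted\n");
    }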

"XVision is very much like computer animation," he explains.
"Both represent complex objects in terms of a small set of simple
components. In animation, the software program draws those components
to create an image, while XVision analyzes the camera's image to
detect and track underlying simple components. Both approaches rely
on linking primitive commands together."

The key in robotics, he believes, is to build control software
using an image-based approach, as opposed to a position-based approach.
Instead of calibrating positions and figuring out how far to move two
objects to bring them together, for example, Professor Hager instructs
the robot to move the objects until the images in each of the two
cameras converge -- a strategy that yields a high degree of accuracy.
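
A toy, one-dimensional C++ sketch of that image-based strategy follows;
the numbers and the simulated cameras are invented for illustration.
The controller never computes a calibrated 3-D position -- it simply
keeps nudging the robot in proportion to the error still visible in the
two camera images until that error vanishes:

    // Toy 1-D illustration of the image-based idea: drive the robot by the
    // error measured directly in the images, not by calibrated 3-D positions.
    #include <cmath>
    #include <iostream>

    int main() {
        double robot = 0.0;          // robot position along one axis (arbitrary units)
        const double target = 10.0;  // where the two features would coincide

        // Simulated image measurements: each camera sees the gap between the
        // two features shrink as the robot approaches the target.
        auto gapInCameraA = [&] { return (target - robot) * 1.2; };
        auto gapInCameraB = [&] { return (target - robot) * 0.8; };

        const double gain = 0.2;     // proportional feedback gain
        for (int step = 0; step < 50; ++step) {
            double error = 0.5 * (gapInCameraA() + gapInCameraB());
            if (std::abs(error) < 0.01) break;       // images have converged
            robot += gain * error;                   // nudge toward convergence
            std::cout << "step " << step << "  image error " << error << "\n";
        }
        std::cout << "final position " << robot << "\n";
    }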

Next, Professor Hager's group will tackle the development of
fail-safe programs that will allow researchers to try riskier
strategies that are more error-prone. If XVision loses track of head
motions, for example, a fail-safe program will enable it to quickly
latch on again by searching for features such as eyes and mouth. The
goal is to get a robotic vision system that is extremely reliable.
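
The control flow of such a fail-safe can be sketched in a few lines of
C++ -- the functions below are invented stubs that only illustrate the
pattern: run the fast frame-to-frame tracker while it keeps its lock,
and drop back to a whole-image search for eyes and mouth whenever the
lock is lost:

    // Sketch of a track-or-reacquire loop; the functions are placeholder
    // stubs, not the Yale group's actual code.
    #include <iostream>
    #include <optional>

    struct Image {};
    struct FacePose { double x, y; };

    // Fast frame-to-frame tracking; an empty result means the target was lost.
    std::optional<FacePose> trackFace(const Image&, const FacePose& previous) {
        return FacePose{previous.x + 0.5, previous.y};   // pretend the head drifts
    }

    // Slower fallback: scan the whole frame for eye and mouth features.
    std::optional<FacePose> searchForFace(const Image&) {
        return FacePose{160, 120};                       // pretend a face was found
    }

    int main() {
        Image frame;
        std::optional<FacePose> pose;                    // start with no lock
        for (int i = 0; i < 5; ++i) {                    // one iteration per video frame
            pose = pose ? trackFace(frame, *pose)        // normal case: stay latched on
                        : searchForFace(frame);          // fail-safe: re-acquire the face
            if (pose)
                std::cout << "face at (" << pose->x << ", " << pose->y << ")\n";
            else
                std::cout << "searching...\n";
        }
    }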

###

Note to Editors: Gregory D. Hager is an associate professor of
computer science who joined the Yale faculty in 1991. He received his
master's and Ph.D. degrees from the University of Pennsylvania. His
research focuses on understanding the connections between perception
and reasoning or action, and he is interested in building systems that
combine sensing with planning. For interviews, contact him at (203)
432-6432 or view his web page at
http://www.cs.yale.edu/HTML/YALE/HyPlans/hager.html
