Media contact:
Beth Gaston
(703) 306-1070/[email protected]

NSF program contact:
Gary Strong
(703) 306-1928/[email protected]

CAN COMPUTERS COMMUNICATE LIKE PEOPLE DO?

Imagine two people at a table in a restaurant. Are
they intimately leaning toward each other or sitting
stiffly? Are they gazing dreamily at each other or avoiding
eye contact? How are they gesturing? What are they saying
and with what tone of voice? A mere glance and a snippet of
conversation make it easy for a person to guess the
situation quite accurately: are they lovers, friends having
an argument, or colleagues at a business meeting?

Humans far exceed computers in their ability to process
many different types of information -- images, words and
intonation, posture and gestures, and written language --
and to draw conclusions from all of them. More "natural"
interactions with "smarter" computers will make computers
accessible to a broader range of people (including people
with disabilities) in a wider range of settings, and more
useful in helping people sort through and synthesize the
glut of available information.

A set of 15 awards in a new $10 million program led by
the National Science Foundation -- Speech, Text, Image and
Multimedia Advanced Technology Effort (STIMULATE) -- will
fund university researchers investigating human
communication and seeking to improve our interaction with
computers. Four agencies are participating: NSF; the
National Security Agency's Office of Research and SIGINT
Technology; the Central Intelligence Agency's Office of
Research and Development; and the Defense Advanced Research
Projects Agency's Information Technology Office.

"This program goes well beyond the era of graphical
interfaces with our computers," said Gary Strong, NSF
program manager. "Perhaps some day we can interact with our
computers like we interact with each other, even having
"intelligent" computer assistants. STIMULATE has the
potential for enormous impact on anyone who must process
large amounts of data as well as for people with
disabilities, the illiterate and others who might not be
able to use a computer keyboard."

Funded projects include: a filter for TV, radio and
newspaper accounts that will quickly provide a user with a
synopsis; a computerized translation program; and a
"humanoid" computer that will understand human communication
including facial expressions, gestures and speech
intonation. Other projects include speech recognition,
understanding handwriting, and indexing and retrieving
video.

-NSF-

List of STIMULATE awardees

NSF is an independent federal agency responsible for
fundamental research in all fields of science and
engineering, with an annual budget of about $3 billion. NSF
funds reach all 50 states, through grants to more than 2,000
universities and institutions nationwide. NSF receives
more than 50,000 requests for funding annually, including at
least 30,000 new proposals. ** Receive NSF news releases
and tipsheets electronically via NSFnews. To subscribe,
send an e-mail message to [email protected]; in the body
of the message, type "subscribe nsfnews" and then type your
name. Also see NSF news products at:
http://www.nsf.gov:80/od/lpa/start.htm and http://www.ari.net/newswise

STIMULATE Awards

Contact:
Beth Gaston, NSF
(703) 305-1070

Midge Holmes, CIA
(703) 482-6686

Judith Emmel, NSA Public Affairs
(301) 688-6524

• Alfred Aho, Shih Fu Chang and Kathleen McKeown
Columbia University
(212) 939-7004, [email protected]
An Environment for Illustrated Briefing and Follow-up Search
Over Live Multimedia Information
Researchers seek to provide up-to-the-minute briefings on
topics of interest, linking the user into a collection of
related multimedia documents. On the basis of a user profile
or query, the system will sort multimedia information to
match the user's interests, retrieving video, images and
text. The system will automatically generate a briefing on
information extracted from the documents and determined to
be of interest to the user.

• James Allan and Allen Hanson
University of Massachusetts, Amherst
(413) 545-3240, [email protected]
Multi-Modal Indexing, Retrieval, and Browsing: Combining
Content-Based Image Retrieval with Text Retrieval.
In the rapidly emerging area of multimedia information
systems, effective indexing and retrieval techniques are
critically important. In this project, the Center for
Intelligent Information Retrieval will develop a system
to index and retrieve collections including combinations
of images, video and text.

• Jaime Carbonell
Carnegie Mellon University
(412) 268-3064, [email protected]
Generalized Example-based Machine Translation
With example-based machine translation, computers search
pre-translated texts for the closest match to each new
sentence being translated. The goal of this project is to
develop generalizations that will increase the accuracy of
translations and reduce the size of the necessary
database.
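
The basic idea can be illustrated with a short Python
sketch: find the stored example closest to a new sentence
and reuse its translation. The tiny translation memory and
the word-overlap similarity below are invented for
illustration; the project's actual methods are far more
general.

    # Minimal sketch of example-based machine translation:
    # retrieve the closest pre-translated sentence and reuse
    # its translation. Data and metric are illustrative only.
    translation_memory = {
        "where is the station": "ou est la gare",
        "where is the hotel": "ou est l'hotel",
        "the train is late": "le train est en retard",
    }

    def similarity(a, b):
        """Word-overlap (Jaccard) similarity of two sentences."""
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb)

    def translate(sentence):
        # Search the memory for the closest stored example...
        best = max(translation_memory,
                   key=lambda ex: similarity(sentence, ex))
        # ...and reuse (in a real system, adapt) its translation.
        return translation_memory[best]

    print(translate("where is the station"))  # -> ou est la gare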

• Justine Cassell
MIT
(617) 253-4899, [email protected]
A Unified Framework for Multimodal Conversational Behaviors
in Interactive Humanoid Agents.
Humans communicate using speech with intonation and
modulation, gestures, gaze and facial expression.
Researchers will study how humans interact and develop a
humanoid computer that can produce human-like communicative
behaviors and comprehend complex communication on the part
of humans.

• Charles Fillmore
International Computer Science Institute, UC Berkeley
[email protected]
Tools for Lexicon Building
This project contains two parts: computational tools for
language research and a thesaurus-like database of English
words with definitions, how each word relates to other
similar words and the range of each word's use. The tools
and the database will be useful for researchers studying
language processing and speech recognition.

• James Flanagan, Casimir Kulikowski, Joseph Wilder,
Grigore Burdea and Ivan Marsic
Rutgers University
(908) 445-3443, [email protected]
Synergistic Multimodal Communication in Collaborative
Multiuser Environments
Digital networking and distributed computing open
opportunities for collaborative work by geographically
separated participants. But participants must communicate
with one another, and with the machines they are using. The
sensory dimensions of sight, sound and touch, used in combination,
are natural modes for the human. This research establishes
computer interfaces that simultaneously use the modalities of
sight, sound and touch for human-machine communication. Emerging
technologies for image processing, automatic speech recognition,
and force-feedback tactile gloves support these multimodal
interfaces.

• James Glass, Stephanie Seneff and Victor Zue
MIT
(617) 253-1640, [email protected]
A Hierarchical Framework for Speech Recognition and
Understanding
Most current speech recognizers use very simple
representations of words and sentences. In this project,
researchers aim to incorporate additional sources of
linguistic information, such as the syllable, phrase and
intonation, into a system that can be used for
understanding conversational speech. They plan to develop a
model that can be applied to many languages.

• Barbara Grosz and Stuart Shieber
Harvard University
(617) 495-3673, [email protected]
Human-Computer Communication and Collaboration
This project will develop methods for designing and building
software that operates in collaboration with a human user,
rather than as a passive servant. The aim is to apply
theories of how people collaborate to the problem of the
design of software, keeping in mind the differing
capabilities of the human and computer collaborators.

• Jerry Hobbs and Andrew Kehler
SRI International
(415) 859-2229, [email protected]
Multimodal Access to Spatial Data
This project will focus on enabling computers to
understand what people are referring to as they use language and
gesture while interacting with computer systems that provide
access to geographical information. The results will enhance
the capabilities and ease of use of future interactive
systems, such as systems for travel planning and crisis
management.

• Fred Jelinek, Eric Brill, Sanjeev Khudanpur and David
Yarowsky
Johns Hopkins University
(410) 516-7730, [email protected]
Exploiting Nonlocal and Syntactic Word Relationships in
Language Models for Conversational Speech Recognition
Interacting with computers by speech or handwriting will
make computers more accessible to people with disabilities
and will allow users to carry on other tasks, like querying an
online maintenance manual while performing mechanical
repairs. To recognize speech or handwriting, most automatic
systems look only at nearby words to identify unknowns,
while people doing the same tasks use the entire context.
This project will focus on improving the recognition
accuracy for spoken and handwritten language and will
provide techniques applicable to all types of language
modeling.
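
As a concrete illustration of "looking only at nearby
words," here is a toy bigram language model in Python; the
training text and the absence of smoothing are
simplifications, not the project's approach.

    from collections import Counter, defaultdict

    # Toy bigram model: predicts each word from only the single
    # preceding word, ignoring the wider context.
    corpus = "the cat sat on the mat the cat ate the fish".split()

    bigram_counts = defaultdict(Counter)
    for prev, word in zip(corpus, corpus[1:]):
        bigram_counts[prev][word] += 1

    def most_likely_next(prev):
        """Word most often seen after `prev` in the training text."""
        return bigram_counts[prev].most_common(1)[0][0]

    print(most_likely_next("the"))  # -> "cat" (seen twice after "the")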

• Kathleen McKeown and Judith Klavans
Columbia University
(212) 939-7118, [email protected]
Generating Coherent Summaries of On-Line Documents:
Combining Statistical and Symbolic Techniques
This project will allow computers to analyze the text from a
set of related documents across many subject areas and
summarize the documents. Within the summary, similarities
and differences between documents will be highlighted,
indicating what each document is about. The research will be
part of a digital library project emphasizing aids for
reducing information overload.
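
One simple statistical ingredient of such summarization can
be sketched in Python: score each sentence by the average
frequency of its words across the document set and extract
the highest-scoring one. The two toy documents are invented;
the project combines statistical scores like this with
symbolic language generation.

    import re
    from collections import Counter

    # Minimal extractive-summary sketch over a tiny document set.
    documents = [
        "Floods hit the coast. Thousands were evacuated.",
        "Heavy rain caused floods along the coast on Monday.",
    ]

    # Word frequencies across all documents.
    words = re.findall(r"[a-z]+", " ".join(documents).lower())
    freq = Counter(words)

    def score(sentence):
        """Average corpus frequency of the sentence's words."""
        toks = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    sentences = [s for d in documents
                 for s in re.split(r"(?<=[.!?])\s+", d) if s]
    print(max(sentences, key=score))  # -> "Floods hit the coast."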

• Mari Ostendorf
Boston University
(617) 353-5430, [email protected]
Modeling Structure in Speech above the Segment for
Spontaneous Speech Recognition
Current speech recognition technology leads to unacceptably
high error rates of 30-50 percent on natural conversational
or broadcast speech, in large part because current models
were developed on read speech and do not account for
variability in speaking style. This project aims to improve
recognition performance by representing structure in speech
at the level of the syllable and the phrase, and by modeling
differences across speakers.

• Francis Quek and Rashid Ansari
University of Illinois at Chicago
(312) 996-5494, [email protected]
Gesture, Speech and Gaze in Discourse Management
This project involves experiments to discover and quantify
the cues of human communication, including the role of
gestures, speech intonation and gaze, and will then develop
computer programs capable of recognizing such cues in
videos.

• Elizabeth Shriberg and Andreas Stolcke
SRI International
(415) 859-3798, [email protected]
Modeling and Automatic Labeling of Hidden Word-Level Events
in Speech
Most computer systems that process natural language require
input that resembles written text, such as one would read
in a newspaper. Spoken discourse, however, differs from
text in ways that present challenges to computers. One
challenge is that speech does not contain explicit
punctuation such as periods to separate sentences. Another
challenge is that when people speak naturally, they say
things like "um," "uh," "you know" and other word-level
events that interrupt the formal structure of
sentences. This project will use word patterns as well as
the timing and melody of speech to identify sentence
boundaries and nongrammatical events to help computers
better understand natural speech.
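
A rule-of-thumb version of these cues can be sketched in
Python: treat a long pause after a word as a likely sentence
boundary and drop common fillers. The word-and-pause stream
and the threshold below are invented; the project models
such cues statistically rather than by fixed rules.

    FILLERS = {"um", "uh"}
    PAUSE_THRESHOLD = 0.5  # seconds of silence suggesting a boundary

    # (word, pause-after-word) pairs, as a recognizer might emit.
    stream = [("so", 0.1), ("um", 0.1), ("we", 0.2), ("left", 0.8),
              ("then", 0.1), ("uh", 0.1), ("it", 0.1), ("rained", 1.0)]

    sentence, sentences = [], []
    for word, pause in stream:
        if word in FILLERS:
            continue  # drop disfluencies from the cleaned transcript
        sentence.append(word)
        if pause > PAUSE_THRESHOLD:  # long silence: close the sentence
            sentences.append(" ".join(sentence))
            sentence = []
    if sentence:
        sentences.append(" ".join(sentence))

    print(sentences)  # -> ['so we left', 'then it rained']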

• Yao Wang and Edward Wong
Brooklyn Polytechnic University
(718) 260-3469, [email protected]
Video Scene Segmentation and Classification Using Motion
and Audio Information
A video sequence contains many different types of
information: speech, text, audio, color patterns and shapes
in individual frames, and the movement of objects revealed
by changes between frames. Humans can quickly interpret
this information, but computer understanding of video is
still primitive. The aim of this project is to develop new
theory and techniques for scene segmentation and
classification in a video sequence, which will have direct
applications in information indexing and retrieval in
multimedia databases, spotting and tracking of special
events in surveillance video, and video editing.
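
The motion cue alone can be sketched in Python: a large jump
in average brightness between consecutive frames marks a
likely cut. The tiny frames and the threshold are invented,
and the project's techniques, which also exploit audio, are
far more robust.

    # Toy shot-boundary detector on 2x2 grayscale "frames";
    # a cut occurs where mean brightness changes sharply.
    def mean_brightness(frame):
        pixels = [p for row in frame for p in row]
        return sum(pixels) / len(pixels)

    frames = [
        [[10, 12], [11, 10]],
        [[11, 12], [12, 11]],      # same dark scene
        [[200, 210], [205, 198]],  # abrupt change: new scene
        [[201, 208], [204, 199]],
    ]

    THRESHOLD = 50.0
    cuts = [i for i in range(1, len(frames))
            if abs(mean_brightness(frames[i])
                   - mean_brightness(frames[i - 1])) > THRESHOLD]
    print(cuts)  # -> [2]: scene change at frame index 2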
