
October 21st, 2007 at 8:19 pm

Joystick Takes Cues by Voice

"Ah," "ee," "aw" and "oo" are not just vowel sounds, but vocalizations that can now control a computer cursor. The Vocal Joystick detects sounds at 100 times a second and then turns them into movements on the screen.

[Image: Voice Computing. Credit: Remi Corrello]

The device could help people with disabilities surf the Internet, draw, play video games, control an electronic wheelchair and even operate a robotic arm.

"If you have full vocal capabilities, it’s very natural and doesn’t require anything special," said Jeff Bilmes, an associate professor of electrical engineering at the University of Washington in Seattle.

The system is essentially a software program that allows users to specify any sounds they want and then associate them with movements otherwise controlled by joystick knobs or buttons.

For example, "ah" could drive the cursor up; "oo" send it down; "ee" make it go left and "aw" make it go right.

A hard "k" sound could do the same thing as clicking a mouse button and a "ch" could simulate releasing the button. Increase the volume and the cursor moves faster.
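The mapping the article describes can be sketched in a few lines of code. This is a minimal illustration, not the Vocal Joystick's actual implementation: the vowel labels, the normalized volume input, and the linear volume-to-speed scaling are all assumptions made for the example.

```python
# Illustrative sketch of a vowel-to-cursor mapping as described in the
# article. All names and the scaling rule are hypothetical assumptions.

# Direction unit vectors (dx, dy) for each vowel; screen y grows downward.
VOWEL_DIRECTIONS = {
    "ah": (0, -1),   # up
    "oo": (0, 1),    # down
    "ee": (-1, 0),   # left
    "aw": (1, 0),    # right
}

def cursor_delta(vowel, volume, base_speed=5.0):
    """Turn one detected vowel frame into a cursor movement.

    volume is assumed to be a normalized loudness in [0, 1]; a louder
    sound moves the cursor faster, as the article describes.
    """
    dx, dy = VOWEL_DIRECTIONS.get(vowel, (0, 0))
    speed = base_speed * volume
    return (dx * speed, dy * speed)

# A loud "ah" moves the cursor up quickly; a quiet "ee" drifts left.
print(cursor_delta("ah", 1.0))   # (0.0, -5.0)
print(cursor_delta("ee", 0.4))   # (-2.0, 0.0)
```

A real system would call something like `cursor_delta` on every analysis frame (100 times a second, per the article) and apply the resulting offset to the pointer, with the "k" and "ch" consonants handled separately as press and release events.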

The technology calls to mind Victor Borge, the musician and comedian famous for his "Phonetic Punctuation" routine, in which he recited a story, making onomatopoetic sounds for the commas, periods, question marks, etc.

And it’s no surprise.

"I’ve been a fan of his since I was a kid. He was definitely a big inspiration for coming up with this idea," said Bilmes.

But the Vocal Joystick is not just about short vowel or consonant sounds to move a cursor. When paired with speech recognition, the technology could give an impaired person full command of a computer.

A person could, for example, "ah" and "oo" their way to an Internet site and "k-ch" into the search box. Once there, they could speak the search criteria. They could also draw circles or squares in a drawing program.

Using the vowel and consonant sounds is more useful than saying "move up" or "move down," which is not very practical, said Alex Acero, research area manager at Microsoft Research in Redmond, Wash., an expert in speech recognition and natural language processing.

"I don’t remember anyone else doing something like that before. And it’s fairly accurate and fairly good," he said.

And when compared to other hands-free systems, the Vocal Joystick could have some advantages. For example, eye-tracking devices are not only costly, but require that the user control the cursor and read information at the same time.

And some devices work inside the user’s mouth and are controlled by the tongue. But that prevents the person from talking or interacting with other people in the room.

"The biggest challenge," said Acero, "is that this is a new interface modality for the user. Will they be willing to use it? We need to see a product and run it through users."

But having options is a good thing. "I think it’s great to have more than one [interface] and play around with it and see what the users like," he said.

Bilmes and his team have been testing the device with spinal-cord injury patients at the University of Washington Medical Center since March, with good results.

This week the team is presenting the latest version of the Vocal Joystick at the Ninth International ACM SIGACCESS Conference on Computers and Accessibility in Tempe, Ariz. There, Bilmes will demonstrate the device controlling a robotic arm.

Via: Discovery Channel


