What is it about?
Have you ever struggled to make a clear phone call or use a voice assistant in a noisy environment? USpeech, our new technology, helps devices hear voices more clearly using inaudible ultrasound signals. Training such a system is normally challenging and labor-intensive. Our innovation is an AI that learns the link between lip movements and speech, generating high-quality synthetic ultrasound data from audio alone. This data trains a powerful speech enhancement network, making the process efficient and scalable.
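For readers who think in code, here is a minimal conceptual sketch of the idea described above, assuming a PyTorch-style pipeline: one network synthesizes ultrasound-like features from audio, and a second network uses those features alongside noisy audio to recover clean speech. All class names, layer choices, and dimensions below are hypothetical illustrations, not the USpeech implementation.

```python
# Minimal conceptual sketch (not the authors' code): a synthesizer maps audio
# features to synthetic ultrasound features, which then supplement noisy audio
# as input to a speech enhancement network. Module names, layer choices, and
# feature dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class UltrasoundSynthesizer(nn.Module):
    """Maps an audio feature sequence to a synthetic ultrasound feature sequence."""
    def __init__(self, audio_dim=80, ultra_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(audio_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, ultra_dim, kernel_size=3, padding=1),
        )

    def forward(self, audio_feats):           # (batch, audio_dim, time)
        return self.net(audio_feats)          # (batch, ultra_dim, time)

class SpeechEnhancer(nn.Module):
    """Fuses noisy audio features with (synthetic) ultrasound features."""
    def __init__(self, audio_dim=80, ultra_dim=64, hidden=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv1d(audio_dim + ultra_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, audio_dim, kernel_size=3, padding=1),
        )

    def forward(self, noisy_audio, ultra_feats):
        return self.fuse(torch.cat([noisy_audio, ultra_feats], dim=1))

# Toy training step on random tensors standing in for real speech features.
# In the real system the synthesizer would already be trained (e.g. from
# audio-visual data); here it is frozen and used only to generate the
# ultrasound channel that the enhancer learns to exploit.
synthesizer = UltrasoundSynthesizer().eval()
enhancer = SpeechEnhancer()
optimizer = torch.optim.Adam(enhancer.parameters(), lr=1e-3)

noisy = torch.randn(4, 80, 100)   # noisy speech features (batch, dim, time)
clean = torch.randn(4, 80, 100)   # clean speech targets
with torch.no_grad():
    ultra = synthesizer(clean)    # synthetic ultrasound replaces sensor data

loss = nn.functional.mse_loss(enhancer(noisy, ultra), clean)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"toy enhancement loss: {loss.item():.4f}")
```

The point of the sketch is the data flow rather than the architecture: because the ultrasound channel is synthesized from audio, the enhancer can be trained at scale without manually collected ultrasound recordings.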
Why is it important?
Current methods for improving speech clarity with ultrasound are held back by a major problem: the lack of large, clean training datasets. Collecting this data is a slow, expensive, and difficult manual process. Our work, USpeech, provides the first solution to this data scarcity problem. By creating a way to synthesize high-quality ultrasound data automatically, we remove the biggest barrier to progress in this field. This minimizes the need for human effort and makes it possible to build much larger and more diverse datasets than ever before. The impact is significant: it paves the way for more robust and reliable speech enhancement on everyday devices such as smartphones. That means clearer voice calls in crowded places, more accurate voice commands for your smart devices, and ultimately a more seamless interaction between humans and technology. Our synthesis framework could also accelerate research in other areas, such as silent speech interfaces and gesture recognition.
Perspectives
As a researcher, I was motivated by the immense challenge and inefficiency of collecting clean data for audio applications. We realized that if we could successfully synthesize realistic ultrasound data, we could break through a major bottleneck that was holding back not just speech enhancement, but a whole range of potential ultrasound-based interactions. It was exciting to see our 'audio as a bridge' concept come to life and produce results comparable to real-world data. My hope is that USpeech will not only improve speech clarity on our devices but also inspire others to tackle data scarcity problems in new and creative ways.
Mr. Luca Jiang-Tao Yu
The University of Hong Kong
Read the Original
This page is a summary of: USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, June 2025, ACM (Association for Computing Machinery).
DOI: 10.1145/3729462.
You can read the full text:
Resources
Contributors
The following have contributed to this page