What is it about?
Have you ever struggled to make a clear phone call or use a voice assistant in a noisy environment? USpeech, our new technology, helps devices hear voices more clearly using inaudible ultrasound signals. Training such a system is normally challenging and labor-intensive. Our innovation is an AI that learns the link between lip movements and speech, generating high-quality synthetic ultrasound data from audio alone. This data trains a powerful speech enhancement network, making the process efficient and scalable.
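For readers who think in code, here is a minimal conceptual sketch of the idea described above, assuming a PyTorch-style pipeline: one network synthesizes ultrasound-like features from audio, and a second network uses those features alongside noisy audio to recover clean speech. All class names, layer choices, and dimensions below are hypothetical illustrations, not the USpeech implementation.

```python
# Minimal conceptual sketch (not the authors' code): a synthesizer maps audio
# features to synthetic ultrasound features, which then supplement noisy audio
# as input to a speech enhancement network. Module names, layer choices, and
# feature dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class UltrasoundSynthesizer(nn.Module):
    """Maps an audio feature sequence to a synthetic ultrasound feature sequence."""
    def __init__(self, audio_dim=80, ultra_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(audio_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, ultra_dim, kernel_size=3, padding=1),
        )

    def forward(self, audio_feats):           # (batch, audio_dim, time)
        return self.net(audio_feats)          # (batch, ultra_dim, time)

class SpeechEnhancer(nn.Module):
    """Fuses noisy audio features with (synthetic) ultrasound features."""
    def __init__(self, audio_dim=80, ultra_dim=64, hidden=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv1d(audio_dim + ultra_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, audio_dim, kernel_size=3, padding=1),
        )

    def forward(self, noisy_audio, ultra_feats):
        return self.fuse(torch.cat([noisy_audio, ultra_feats], dim=1))

# Toy training step on random tensors standing in for real speech features.
# In the real system the synthesizer would already be trained (e.g. from
# audio-visual data); here it is frozen and used only to generate the
# ultrasound channel that the enhancer learns to exploit.
synthesizer = UltrasoundSynthesizer().eval()
enhancer = SpeechEnhancer()
optimizer = torch.optim.Adam(enhancer.parameters(), lr=1e-3)

noisy = torch.randn(4, 80, 100)   # noisy speech features (batch, dim, time)
clean = torch.randn(4, 80, 100)   # clean speech targets
with torch.no_grad():
    ultra = synthesizer(clean)    # synthetic ultrasound replaces sensor data

loss = nn.functional.mse_loss(enhancer(noisy, ultra), clean)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"toy enhancement loss: {loss.item():.4f}")
```

The point of the sketch is the data flow rather than the architecture: because the ultrasound channel is synthesized from audio, the enhancer can be trained at scale without manually collected ultrasound recordings.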
Why is it important?
Current methods for improving speech clarity with ultrasound are held back by a major problem: the lack of large, clean training datasets. Collecting this data is a slow, expensive, and difficult manual process. Our work, USpeech, provides the first solution to this data scarcity problem. By creating a way to synthesize high-quality ultrasound data automatically, we remove the biggest barrier to progress in this field. This minimizes the need for human effort and makes it possible to build much larger and more diverse datasets than ever before. The impact is significant: it paves the way for more robust and reliable speech enhancement on everyday devices such as smartphones. That means clearer voice calls in crowded places, more accurate voice commands for your smart devices, and ultimately a more seamless interaction between humans and technology. Our synthesis framework could also accelerate research in other areas, such as silent speech interfaces and gesture recognition.
Perspectives
As a researcher, I was motivated by the immense challenge and inefficiency of collecting clean data for audio applications. We realized that if we could successfully synthesize realistic ultrasound data, we could break through a major bottleneck that was holding back not just speech enhancement, but a whole range of potential ultrasound-based interactions. It was exciting to see our 'audio as a bridge' concept come to life and produce results comparable to real-world data. My hope is that USpeech will not only improve speech clarity on our devices but also inspire others to tackle data scarcity problems in new and creative ways.
Mr. Luca Jiang-Tao Yu
The University of Hong Kong
Read the Original
This page is a summary of: USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, June 2025, ACM (Association for Computing Machinery).
DOI: 10.1145/3729462.
You can read the full text:
Resources
Contributors
The following have contributed to this page