Low-Resource ASR

Speech Recognition for Endangered Languages

The Problem

Modern speech recognition works remarkably well—if you speak English.

Deep learning models like wav2vec 2.0 and Whisper have approached human-level accuracy at converting speech to text in data-rich languages. But these models struggle with languages they weren't trained on, even when fine-tuned with conventional methods. Because the vast majority of the world's 7,000+ languages are considered "data-scarce," speech technology currently reaches only a small fraction of the world's linguistic diversity.

Our Approach

This project is preparing large-scale experiments to identify methods for rapidly adapting state-of-the-art ASR models to extremely low-resource languages. We draw on recordings from the Endangered Languages Archive, testing across a diverse selection of under-documented languages.
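
To make "conventional adaptation" concrete: the usual baseline is to take a multilingual pretrained checkpoint, attach a fresh output vocabulary for the target language, and fine-tune. The sketch below shows that standard setup for a wav2vec 2.0 XLSR checkpoint using the HuggingFace transformers library; the transcripts are placeholders, and this illustrates the common recipe rather than our experimental code.

    import json
    from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                              Wav2Vec2ForCTC, Wav2Vec2Processor)

    # Placeholder transcripts for the target language; real experiments
    # would read these from the corpus.
    transcripts = ["mi kine pata", "tuku nai"]

    # Build a character-level vocabulary from the transcripts, plus the
    # special tokens CTC training expects.
    vocab = {ch: i for i, ch in enumerate(sorted(set("".join(transcripts))))}
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    with open("vocab.json", "w") as f:
        json.dump(vocab, f)

    tokenizer = Wav2Vec2CTCTokenizer(
        "vocab.json", unk_token="[UNK]", pad_token="[PAD]",
        word_delimiter_token=" ",
    )
    feature_extractor = Wav2Vec2FeatureExtractor(
        feature_size=1, sampling_rate=16000, padding_value=0.0,
        do_normalize=True, return_attention_mask=True,
    )
    processor = Wav2Vec2Processor(
        feature_extractor=feature_extractor, tokenizer=tokenizer
    )

    # Load the multilingual encoder and attach a CTC head sized to the new
    # vocabulary; freezing the convolutional front end is standard practice
    # when training data is scarce.
    model = Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-large-xlsr-53",
        ctc_loss_reduction="mean",
        pad_token_id=tokenizer.pad_token_id,
        vocab_size=len(vocab),
    )
    model.freeze_feature_encoder()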

One avenue we're exploring is linguistically informed data pooling: combining training data from languages that share phonetic features, so that related languages help each other overcome the data bottleneck that any single low-resource language faces alone, as sketched below.
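
As a hypothetical illustration of the pooling idea (not our actual selection criteria), one could measure phonetic overlap between languages as the Jaccard similarity of their phoneme inventories, for example drawn from a database such as PHOIBLE, and greedily merge similar languages into shared training pools:

    # Toy phoneme inventories; real ones would come from a phonological
    # database or from each language's documentation.
    inventories = {
        "lang_a": {"p", "t", "k", "m", "n", "a", "i", "u"},
        "lang_b": {"p", "t", "k", "b", "d", "n", "a", "i", "u", "e"},
        "lang_c": {"q", "x", "h", "a", "i"},
    }

    def jaccard(a: set, b: set) -> float:
        """Overlap between two phoneme inventories, in [0, 1]."""
        return len(a & b) / len(a | b)

    def pool_languages(inventories: dict, threshold: float = 0.5) -> list:
        """Greedily merge each language into the first pool it resembles."""
        pools = []  # each pool: {"langs": [...], "phones": set(...)}
        for lang, phones in inventories.items():
            for pool in pools:
                if jaccard(phones, pool["phones"]) >= threshold:
                    pool["langs"].append(lang)
                    pool["phones"] |= phones
                    break
            else:
                pools.append({"langs": [lang], "phones": set(phones)})
        return pools

    for pool in pool_languages(inventories):
        print(pool["langs"])  # e.g. ['lang_a', 'lang_b'] then ['lang_c']

A pooled fine-tuning run would then concatenate the audio and transcripts for all languages in each pool, giving each low-resource language more effective training data than it has on its own.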

Why It Matters

Speech recognition is foundational for many technologies we take for granted: voice assistants, dictation, captioning, and voice search. For speakers of endangered languages, these tools could support documentation, education, and language vitality in the digital age. Our goal is to find training methods that make ASR adaptation fast and practical, even with very limited data.

Support

This project is supported by a compute allocation from Empire AI, a New York State initiative providing AI research infrastructure.