Meta’s Fundamental AI Research (FAIR) team has unveiled Omnilingual ASR, an automatic speech recognition system that can transcribe more than 1,600 spoken languages. The system supports roughly 500 low-resource languages, some of which have never before had AI-driven transcription. With the release, Meta aims to widen access to digital speech tools across communities and regions.
Alexandr Wang, Meta’s chief AI officer, wrote on X: “Meta Omnilingual ASR expands speech recognition capabilities to 1,600+ languages, encompassing 500 previously unsupported languages, marking a significant stride towards achieving truly universal AI. We are making available a comprehensive suite of models and a dataset.”
Omnilingual ASR, Meta’s latest open-source speech recognition release, is designed to cover a far broader range of languages than existing tools. Traditional speech recognition platforms concentrate on widely spoken languages such as English, leaving many smaller linguistic communities without reliable transcription support.
With Omnilingual ASR, Meta aims to close this gap by covering more than 1,600 languages, including those with little digital documentation or training data, commonly known as “low-resource languages.” These languages have historically lacked enough recorded material or research attention to benefit from speech AI.
“Meta’s Fundamental AI Research (FAIR) team introduces Omnilingual ASR — a revolutionary suite of models enabling automatic speech recognition for 1,600 languages, encompassing 500 low-resource languages previously untranscribed by AI,” as stated in Meta’s official blog post.
Central to the system is Omnilingual wav2vec 2.0, a seven-billion-parameter multilingual speech model and one of Meta’s largest speech recognition models to date, trained to handle diverse accents, dialects, and speech nuances.
Meta combined public datasets with recordings sourced from communities around the world to build Omnilingual ASR. Collaborations with initiatives such as the Mozilla Foundation’s Common Voice, Lanfrica, and NaijaVoices helped collect speech samples in languages with a limited digital footprint. Involving local speakers in real-world audio recordings was intended to produce a more representative dataset than earlier efforts that relied on controlled lab data.
While Meta considers Omnilingual ASR a significant achievement, it acknowledges that accuracy varies by language. Internal data shows that more than 95% of high- and medium-resource languages achieved a character error rate below 10%. Only 36% of low-resource languages met that benchmark, underscoring the persistent difficulty of building AI for under-documented languages.
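Character error rate (CER), the metric cited above, is the character-level edit distance between a model’s transcript and the reference transcript, divided by the reference length. A minimal sketch of how such a metric is typically computed (this is an illustrative implementation, not Meta’s evaluation code):

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = Levenshtein edit distance / number of reference characters."""
    m, n = len(reference), len(hypothesis)
    # Dynamic-programming edit distance over the two character sequences.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else 0.0

# One substituted character in a 10-character reference gives a CER of 0.10,
# i.e. exactly at Meta's reported 10% threshold.
print(character_error_rate("hello meta", "hallo meta"))  # 0.1
```

A CER below 10% thus means fewer than one character in ten of the reference transcript needs to be edited to match the model’s output.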
By open-sourcing Omnilingual ASR, Meta aims to support researchers, developers, and organizations building speech technology, accessibility, translation, and communication tools.
Researchers at Meta’s AI labs are working toward smarter and more capable AI models, with the stated ambition of reaching superintelligence: AI systems with human-like cognitive abilities, including an understanding of culture, context, and diverse forms of human expression. With Omnilingual ASR’s coverage of the world’s languages, Meta appears to be laying groundwork for that ambitious superintelligence effort.
