
Meta just released a new multilingual automatic speech recognition (ASR) system supporting 1,600+ languages, dwarfing OpenAI's open-source Whisper model, which supports only 99.
The architecture also allows developers to extend support to thousands more languages: through a feature called zero-shot in-context learning, users can provide a few paired examples of audio and text in a new language, and the model can then transcribe additional utterances in that language without any retraining.
In practice, this expands the potential coverage to over 5,400 languages.
This is a shift from static model capabilities to flexible frameworks that communities can adapt themselves. So while the 1,600 languages reflect official training coverage, the broader figure represents Omnilingual ASR's ability to generalize on demand, making it the most scalable speech recognition system released to date.
Best of all: it's open under a plain Apache 2.0 license with no restrictions. Unlike the company's previous releases under the quasi-open-source Llama license, which limited use by large enterprises unless they paid a licensing fee, researchers and developers are free to take it and deploy it at no cost, without restrictions, even in commercial and enterprise-grade projects.
Released on November 10 on Meta's website and GitHub, with a demo on Hugging Face and an accompanying technical paper, Meta's Omnilingual ASR suite includes a family of speech recognition models, a 7-billion-parameter multilingual audio representation model, and a large-scale speech corpus covering more than 350 languages.
All resources are freely available under open licenses, and the models support speech-to-text transcription out of the box.
"By making these models and datasets open, we want to break down language barriers, expand digital access, and empower communities around the world," Meta posted from its @AIatMeta account on X.
Designed for speech-to-text transcription
At its core, Omnilingual ASR is a speech-to-text system.
The models are trained to convert spoken language into written text, supporting use cases such as voice assistants, transcription tools, subtitles, oral archive digitization, and accessibility features for low-resource languages.
Unlike earlier ASR models that require extensive labeled training data, Omnilingual ASR includes a zero-shot variant.
This version can transcribe languages it has never seen before, using just a few paired examples of audio and corresponding text.
This dramatically lowers the barrier to adding new or endangered languages, removing the need for large corpora or retraining.
Model family and technical design
The Omnilingual ASR suite includes multiple models trained on over 4.3 million hours of audio from 1,600+ languages:
wav2vec 2.0 models for self-supervised speech representation learning (300M to 7B parameters)
CTC-based ASR models for efficient supervised transcription
LLM-ASR models that deliver state-of-the-art transcription by combining a speech encoder with a transformer-based text decoder
An LLM zero-shot ASR model that enables inference-time adaptation to unseen languages
All models follow an encoder-decoder design: raw audio is converted into a language-agnostic representation, then decoded into written text.
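To make that design concrete, here is a minimal, illustrative sketch in PyTorch. It is a toy skeleton for intuition only, not Meta's actual architecture: a speech encoder compresses raw audio into hidden states, and a text decoder cross-attends over them to produce token logits.
# Toy encoder-decoder ASR skeleton for intuition only -- not Meta's implementation.
import torch
import torch.nn as nn

class ToySpeechEncoder(nn.Module):
    """Turns a raw waveform into a sequence of hidden representations."""
    def __init__(self, hidden=256):
        super().__init__()
        self.conv = nn.Conv1d(1, hidden, kernel_size=10, stride=5)  # crude downsampling
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, wav):                    # wav: (batch, samples)
        x = self.conv(wav.unsqueeze(1))        # (batch, hidden, frames)
        return self.rnn(x.transpose(1, 2))[0]  # (batch, frames, hidden)

class ToyTextDecoder(nn.Module):
    """Predicts text-token logits by cross-attending over the audio states."""
    def __init__(self, vocab=1000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens, enc_states):     # tokens: (batch, seq)
        q = self.embed(tokens)
        ctx, _ = self.attn(q, enc_states, enc_states)
        return self.out(ctx)                   # (batch, seq, vocab)

wav = torch.randn(2, 16000)                    # two dummy 1-second clips at 16 kHz
enc = ToySpeechEncoder()(wav)
logits = ToyTextDecoder()(torch.zeros(2, 5, dtype=torch.long), enc)
print(logits.shape)                            # torch.Size([2, 5, 1000])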
Why does scale matter?
Although Whisper and similar models have demonstrated ASR's potential for global languages, they fall short on the long tail of human linguistic diversity. Whisper supports 99 languages. Meta's system:
Supports 1,600+ languages directly
Can generalize to 5,400+ languages using in-context learning
Achieves a character error rate (CER) of less than 10% in 78% of supported languages
According to Meta’s research paper, their support includes more than 500 languages that have never been covered by any ASR model before.
This expansion opens up new possibilities for communities whose languages are often excluded from digital tools.
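For context, character error rate is the character-level edit (Levenshtein) distance between the model's transcript and the reference text, divided by the reference length. A quick, self-contained Python illustration of the standard definition (not Meta's evaluation code) follows.
# Character error rate: Levenshtein edits divided by reference length (standard definition).
def cer(hypothesis: str, reference: str) -> float:
    prev = list(range(len(reference) + 1))           # distances for the empty hypothesis
    for i, h in enumerate(hypothesis, 1):
        curr = [i]
        for j, r in enumerate(reference, 1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (h != r))) # substitution (free if chars match)
        prev = curr
    return prev[-1] / max(len(reference), 1)

print(cer("speech recogition", "speech recognition"))  # one dropped letter -> ~0.056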
Background: Meta's AI overhaul and rebound from Llama 4
After a year of organizational turmoil, leadership changes, and uneven product execution, the release of Omnilingual ASR comes at a pivotal moment in Meta's AI strategy.
Omnilingual ASR is the first major open-source model release since the rollout of Llama 4, Meta's latest large language model, which debuted in April 2025. That open-source model drew mixed and ultimately poor reviews, with little enterprise adoption compared to competitors.
The disappointment led Meta founder and CEO Mark Zuckerberg to appoint Alexandr Wang, co-founder and former CEO of AI data supplier Scale AI, as Chief AI Officer, and to embark on an extensive and expensive hiring spree that startled the AI and business communities with eye-watering salary packages for top AI researchers.
By contrast, Omnilingual ASR represents a strategic and reputational reset. It returns Meta to a domain where the company has historically led (linguistic AI) and offers a genuinely scalable, community-based stack with minimal barriers to entry.
The system's support for 1,600+ languages, and its extension to more than 5,400 via zero-shot in-context learning, re-establishes Meta's engineering reputation in language technology.
Importantly, it does so through a freely and permissively licensed release, under Apache 2.0, with transparent dataset sourcing and reproducible training protocols.
This shift ties into broader themes in Meta's 2025 strategy. While the company has invested heavily in infrastructure (including custom AI accelerators and Arm-based inference stacks released in September), it has recast its narrative around a "personal superintelligence" vision, de-emphasizing the metaverse in favor of foundational AI capabilities. Its return to public training data in Europe after a regulatory pause also signals an intention to compete globally despite privacy scrutiny.
Omnilingual ASR, then, is more than a model release; it is a calculated effort to reset the narrative, moving from the fragmented rollout of Llama 4 to a high-impact, research-driven contribution that aligns with Meta's long-term AI platform strategy.
Community-based dataset collection
To achieve this scale, Meta partnered with researchers and community organizations in Africa, Asia, and elsewhere to create the Omnilingual ASR Corpus, a dataset of 3,350 hours of audio in 348 low-resource languages. Contributors were paid local speakers, and recordings were collected in collaboration with groups such as:
African Next Voices: a Gates Foundation-funded consortium including Maseno University (Kenya), the University of Pretoria, and Data Science Nigeria
Mozilla Foundation's Common Voice, supported by the Open Multilingual Speech Fund
Lanfrica / NaijaVoices (Nigeria), which produced data for 11 African languages including Igala, Serer, and Urhobo
Data collection focused on natural, unscripted speech. Prompts were designed to be culturally relevant and open-ended, such as "Is it better to have a few close friends or many casual acquaintances? Why?" Transcriptions followed each language's established writing system, with quality assurance at every step.
Performance and hardware considerations
The suite's largest model, omniASR_LLM_7B, demands substantial GPU memory for inference, making it best suited to high-end hardware. Smaller models (300M to 1B parameters) can run on lower-power devices and deliver real-time transcription speeds.
Performance benchmarks show robust results even in low-resource scenarios.
CER < 10% in 95% of high-resource and medium-resource languages
CER < 10% in 36% of low-resource languages
Robustness in noisy conditions and unseen domains, especially with fine-tuning
The zero-shot variant, omniASR_LLM_7B_ZS, can adapt to new languages with minimal setup. Users provide a few sample audio-text pairs, and the model transcribes new utterances in the same language.
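Meta's exact interface for this is not shown here, but conceptually the workflow looks like the sketch below; the function and argument names are illustrative assumptions, so consult the GitHub README for the real API.
# Hypothetical sketch of zero-shot in-context adaptation. The names below are
# illustrative assumptions, not the confirmed omnilingual-asr API.
context_examples = [
    ("clip_001.wav", "reference transcript of clip 1 in the target language"),
    ("clip_002.wav", "reference transcript of clip 2 in the target language"),
]

def transcribe_zero_shot(pipeline, audio_path, examples):
    """Conceptually: condition the zero-shot model on a few (audio, text) pairs, then decode."""
    return pipeline.transcribe(audio_path, context=examples)  # hypothetical call signature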
Open access and developer tooling
All models and datasets are licensed under the following terms:
Apache 2.0 for models and code
CC-BY 4.0 for the Omnilingual ASR Corpus on Hugging Face
Installation via PyPI and uv is supported:
pip install omnilingual-asr
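With uv, an equivalent one-liner (assuming you are installing into the active environment) is:
uv pip install omnilingual-asr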
Meta also provides:
A Hugging Face datasets integration
Pre-built inference pipelines
Language code conditioning for better accuracy
Developers can view the full list of supported languages using the API:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
print(len(supported_langs))
print(supported_langs)
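Putting these pieces together, a transcription call might look roughly like the sketch below. The pipeline class and its arguments are assumptions inferred from the features Meta describes (pre-built pipelines and language-code conditioning), so verify the exact names against the GitHub README.
# Hypothetical usage sketch -- treat the pipeline class and arguments as assumptions,
# not the confirmed omnilingual-asr API.
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

lang = "eng_Latn"                      # language-code conditioning (assumed code format)
print(lang in supported_langs)         # sanity-check the code before transcribing

# pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")   # hypothetical class
# print(pipeline.transcribe(["clip.wav"], lang=[lang]))          # hypothetical call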
Broader implications
Omnilingual ASR reframes language coverage in ASR from a fixed list into an extensible framework. It enables:
Community-driven inclusion of underrepresented languages
Digital access for oral and endangered languages
Research on speech technology in linguistically diverse contexts
Importantly, Meta has emphasized ethical considerations—designing for open source participation and collaboration with native-speaking communities.
"No model can ever anticipate and include all of the world's languages," the Omnilingual ASR paper states, "but Omnilingual ASR makes it possible for communities to extend recognition using their own data."
Access the tools
All resources are now available:
Code + models: github.com/facebookresearch/omnilingual-asr
Dataset: available on Hugging Face
Blog post: ai.meta.com/blog/omnilingual-asr
What does this mean for businesses?
For enterprise developers, especially those working in multilingual or international markets, Omnilingual ASR significantly lowers the barrier to deploying speech-to-text systems across a wide range of users and geographies.
Instead of relying on commercial ASR APIs that only support a narrow set of high-resource languages, teams can now integrate an open-source pipeline that covers more than 1,600 languages out of the box.
This flexibility is especially valuable for businesses operating in areas such as voice-based customer support, transcription services, outreach, education, or civic technology, where local-language coverage may be a competitive or regulatory requirement. Because the models are released under the permissive Apache 2.0 license, businesses can fine-tune, deploy, or integrate them into proprietary systems without restrictive terms.
It also represents a shift in the ASR landscape, from centralized, cloud-gated offerings to community-based infrastructure. By making multilingual speech recognition more accessible, customizable, and cost-effective, Omnilingual ASR opens the door to a new generation of enterprise speech applications built around linguistic inclusion rather than linguistic boundaries.