Common Voice
Equitable speech data for AI.
About Common Voice
Common Voice is a crowdsourcing initiative started by the Mozilla Foundation to build a large, free database of human speech for training speech recognition software. Most voice datasets used by major tech companies are proprietary and expensive, which creates a barrier for smaller developers and researchers. The project aims to democratize this space by releasing its dataset under the Creative Commons CC0 public domain dedication. Volunteers contribute by recording themselves reading sample sentences or by validating others' recordings to ensure accuracy and quality. Launched in 2017, the platform was designed to be inclusive, intentionally collecting voice samples from people of different accents, genders, and ages to reduce bias in AI models. As of 2025, Common Voice has collected thousands of hours of data across more than 100 languages, including many underrepresented, "long-tail" languages that commercial products often ignore. By providing an open alternative to private datasets, Common Voice fosters innovation and healthy competition in the development of speech-to-text and text-to-speech applications worldwide.
Added June 19, 2017
commonvoice.mozilla.org