Mozilla today released the latest version of Common Voice, its open source collection of transcribed voice data that startups, researchers, and hobbyists can use to build voice-enabled apps, services, and devices. Common Voice now contains 7,226 total hours of contributed voice data in 54 languages, up from 1,400 hours across 18 languages in February 2019. From a report:
Common Voice consists not only of voice snippets, but also of voluntarily contributed metadata useful for training speech engines, like speakers’ ages, sex, and accents. It’s designed to be integrated with DeepSpeech, a suite of open source speech-to-text and text-to-speech engines and trained models maintained by Mozilla’s Machine Learning Group. Collecting the over 5.5 million clips in Common Voice required a lot of legwork, mainly because the prompts on the Common Voice website had to be translated into each language. Still, 5,591 of the 7,226 hours have been confirmed valid by the project’s contributors so far. And according to Mozilla, five languages in Common Voice – English, German, French, Italian, and Spanish – now have over 5,000 unique speakers, while seven languages – English, German, French, Kabyle, Catalan, Spanish, and Kinyarwanda – have over 500 recorded hours.