Mozilla Blog: Sharing our Common Voices – Mozilla releases the largest to-date public domain transcribed voice dataset

Mozilla Blog: Sharing our Common Voices – Mozilla releases the largest to-date public domain transcribed voice dataset. “From the onset, our vision for Common Voice has been to build the world’s most diverse voice dataset, optimized for building voice technologies. We also made a promise of openness: we would make the high quality, transcribed voice data that was collected publicly available to startups, researchers, and anyone interested in voice-enabled technologies. Today, we’re excited to share our first multi-language dataset with 18 languages represented, including English, French, German and Mandarin Chinese (Traditional), but also for example Welsh and Kabyle. Altogether, the new dataset includes approximately 1,400 hours of voice clips from more than 42,000 people.”