Mapping India’s Linguistic Diversity with Project Vaani

Project Vaani, a collaboration between Google and the Indian Institute of Science, aims to map India's diverse linguistic landscape by collecting speech audio from approximately 1 million people across 773 districts.

6000 hours of Voice Data

30 Districts

24K+ Unique Speakers

15 minutes/speaker

With Karya's expertise, 6000 hours of voice data will be gathered from 30 districts. This contributes to one of the largest datasets of Indian dialects, which will total over 150,000 hours of audio upon completion. Karya plays a pivotal role in the initiative by mobilising local communities, training field coordinators, and ensuring fair compensation for data collectors, thereby empowering local voices and fostering inclusivity.