Mapping India’s Linguistic Diversity with Project Vaani

Project Vaani, a collaboration between Google and the Indian Institute of Science, aims to map India's diverse linguistic landscape by collecting speech audio from approximately 1 million people across 773 districts.

6000 hours of Voice Data

Districts

24K+ Unique Speakers

minutes/speaker

With Karya's expertise, 6000 hours of voice data will be gathered from 30 districts. This contributes to one of the largest datasets of Indian dialects, which will total over 150,000 hours of audio upon completion. Karya plays a pivotal role in this initiative by mobilising local communities, training field coordinators, and ensuring fair compensation for data collectors, thus empowering local voices and fostering inclusivity.

Related

Building the largest annotated text dataset in Odia for the healthcare, banking and agriculture domains

Building the Largest Gender-Intentional AI Corpora