24K+ Hours of Voice Data
6 Languages
30K+ Unique Female Speakers
100 Districts across India
Under this project, Karya will onboard and work with 30,000 participants to generate gender-bias-specific speech data in 6 languages, which are spoken by over 900M people and cover over 75% of India.
The end goal is to build a Gender-Intentional Data Tool that can flag biased sentences in AI corpora across the 6 languages, while also creating the largest gender-intentional AI corpus in Indic-language history.
Karya’s workforce can rebuild your training datasets to improve inclusion, with speed and at scale.