Building the Largest Gender-Intentional AI Corpora

Karya, in collaboration with a major philanthropic donor, is working to build the largest gender-intentional corpora in Indic languages to date, while providing livelihood opportunities for rural women from low-income communities.

24K+ hours of voice data

6 languages

30K+ unique female speakers

100 districts across India


Under this project, Karya will onboard and work with 30,000 participants to generate speech data targeting gender bias in 6 languages that are spoken by over 900M people and cover over 75% of India.

The end goal is to build a Gender-Intentional Data Tool that can flag biased sentences in AI corpora across these 6 languages, while also building the largest gender-intentional AI corpora in the history of Indic languages.

Karya’s workforce can rebuild your training datasets to improve inclusion, quickly and at scale.