Building the Largest Gender-Intentional AI Corpora

Karya, in collaboration with a major philanthropic donor, is building the largest gender-intentional corpora in Indic languages to date, while providing livelihood opportunities for rural women from low-income communities.

24K+ hours of voice data

6 languages

30K+ unique female speakers

100 districts across India

Under this project, Karya will onboard and work with 30,000 participants to generate gender-bias-specific speech data in 6 languages spoken by over 900 million people and covering over 75% of India.

The end goal is to build a Gender-Intentional Data Tool that can flag biased sentences in AI corpora across the 6 languages, while building the largest gender-intentional AI corpora in Indic language history.

Karya’s workforce can rebuild your training datasets to improve inclusion, with speed and at scale.

Related

Building the largest annotated text dataset in Odia for the healthcare, banking and agriculture domains

AI-Powered Grading for Early Childhood Education