Conversational Data and Call Center Data Collection (2,400 hours)

Speech data

September 2022

The Challenge

Language plays a vital role in how people communicate, access information, and build an inclusive society. India is home to 22 constitutionally recognized languages. As the country progresses towards Digital India, many Indians face an interface barrier: automated systems require reading, writing, and typing. Speech technologies provide a natural interface that overcomes this barrier and advances the goal of ‘One India’ in the digital landscape. Microsoft therefore reached out to us to use our large network of workers in rural India to collect conversational data across multiple domains in all major Indian languages.

The Solution

To collect high-quality conversational data in the right environment and conditions, the Karya team remotely employed over 2,000 villagers in multiple states across India. Dual-channel conversations were collected in line with the client's criteria. After the speech data was collected and validated, a separate set of workers transcribed it to the client's specifications.