Conversational Data and Call Center Data Collection

Speech data

June 2023



The Challenge

Language plays a vital role in communication among people, in accessing information, and in building an inclusive society. India is home to 22 constitutionally recognized languages. As the country moves towards Digital India, many Indians face an interface barrier: automated systems expect them to read, write and type. Speech technologies offer a natural interface that overcomes this barrier and helps achieve the goal of ‘One India’ in the digital landscape. To support this work, IIT Madras reached out to us to use our large network of workers in rural India to collect conversational data across multiple domains in all major Indian languages.

The Solution

To collect over 500 hours of high-quality conversational data in the right environment and conditions, the Karya team remotely employed over 500 villagers across multiple states in India. Dual-channel conversations were collected in line with the client's criteria. After the speech data had been collected and validated, a second group of workers transcribed it to the client's specifications.