Building the largest annotated text dataset in Odia for the healthcare, banking and agriculture domains

Text data

October 2020

The Challenge

For our clients, Ujjivan Bank and Navana Tech, we collected over 10,000 questions in the healthcare, banking and agriculture domains to build a text-based Q&A system.

The Solution

Karya employed over 5,000 workers across 4 districts in Odisha to crowdsource a list of 10,000+ questions that people had about healthcare, banking and agriculture. We then asked our workers to annotate each question and mark it for intent. This allowed us to create a question bank and divide the 10,000+ questions into 50 different “intents”. For each intent, we worked with experts to provide the right answers. This allowed our clients to build a chat-based Q&A system where villagers across Odisha could ask questions about their bank, their agricultural practices and their health.
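
As a rough illustration of the question bank described above, the Python sketch below groups intent-annotated questions by intent and looks up one expert answer per intent. It is not Karya's actual pipeline: the names `AnnotatedQuestion`, `build_question_bank` and `answer_for`, the intent labels, and the sample questions and answer are all hypothetical placeholders, shown in English for readability.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedQuestion:
    """Hypothetical record for one crowdsourced question."""
    text: str    # the question as collected from a worker
    intent: str  # the intent label assigned during annotation


def build_question_bank(questions):
    """Group annotated questions by their intent label."""
    bank = defaultdict(list)
    for question in questions:
        bank[question.intent].append(question.text)
    return dict(bank)


def answer_for(intent, expert_answers):
    """Return the expert answer for an intent, or None if none is recorded."""
    return expert_answers.get(intent)


# Illustrative placeholder data, not taken from the actual dataset.
sample = [
    AnnotatedQuestion("How do I open a savings account?", "banking.open_account"),
    AnnotatedQuestion("What documents do I need for a new account?", "banking.open_account"),
    AnnotatedQuestion("When should I sow paddy?", "agriculture.sowing_time"),
]
expert_answers = {
    "banking.open_account": "Placeholder answer reviewed by a banking expert.",
}

question_bank = build_question_bank(sample)
print(sorted(question_bank))
print(answer_for("banking.open_account", expert_answers))
```

Keeping one answer per intent, rather than one per question, mirrors the approach in the text: many differently worded questions map to a small set of intents, so experts only need to write an answer once for each intent.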