90K human evaluations
30 models
10 Indian languages
3 weeks
Evaluation of multilingual LLMs is challenging due to insufficient linguistic diversity, benchmark contamination, and the lack of local cultural nuance in translated benchmarks. Karya’s data experts can evaluate models against an array of benchmarks, including tests for linguistic acceptability, hallucinations, reasoning, and creativity.