What is it about?

We introduce MERA, a new instruction benchmark oriented towards evaluating foundation models' performance in Russian. The benchmark encompasses 21 evaluation tasks for generative models covering 10 skills and is supplied with private answer scoring to prevent data leakage. We propose a methodology for evaluating foundation models and language models in fixed zero- and few-shot instruction settings that can be extended to other modalities, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open language models as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential ethical concerns and drawbacks.
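
To make the fixed zero- and few-shot instruction setting more concrete, below is a minimal Python sketch of such an evaluation loop. It is an illustration under assumed names and data only: the prompt template, the exact-match metric, the function names (build_prompt, evaluate), and the toy example are hypothetical and are not taken from the actual MERA code base.

```python
# Hypothetical sketch of fixed-prompt, few-shot instruction evaluation.
# Names, prompt template, metric, and data are assumptions for illustration.

from typing import Callable, Dict, List


def build_prompt(instruction: str, shots: List[Dict[str, str]], query: str) -> str:
    """Assemble a fixed instruction prompt with optional few-shot examples."""
    parts = [instruction]
    for shot in shots:  # an empty list of shots gives the zero-shot setting
        parts.append(f"Вопрос: {shot['question']}\nОтвет: {shot['answer']}")
    parts.append(f"Вопрос: {query}\nОтвет:")
    return "\n\n".join(parts)


def exact_match(prediction: str, gold: str) -> float:
    """Score 1.0 if the normalized prediction equals the gold answer, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def evaluate(model: Callable[[str], str],
             instruction: str,
             shots: List[Dict[str, str]],
             test_set: List[Dict[str, str]]) -> float:
    """Average exact-match accuracy over a test set with held-out answers."""
    scores = []
    for item in test_set:
        prompt = build_prompt(instruction, shots, item["question"])
        scores.append(exact_match(model(prompt), item["answer"]))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy model and data purely for demonstration.
    def toy_model(prompt: str) -> str:
        return "Москва"

    instruction = "Ответьте на вопрос одним словом."
    test_set = [{"question": "Какой город является столицей России?",
                 "answer": "Москва"}]
    print(evaluate(toy_model, instruction, shots=[], test_set=test_set))
```

In this sketch, the zero-shot setting corresponds to passing an empty shots list, while a few-shot setting would fix the same handful of examples for every evaluated model so that scores remain comparable across submissions.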


Why is it important?

Over the past few years, one of the most notable advancements in AI research has been in foundation models, headlined by the rise of language models. However, despite researchers' attention and the rapid growth in language model applications, their capabilities, limitations, and associated risks still need to be better understood.

Read the Original

This page is a summary of: MERA: A Comprehensive LLM Evaluation in Russian, January 2024, Association for Computational Linguistics (ACL),
DOI: 10.18653/v1/2024.acl-long.534.
