Professional MQM-annotated Machine Translation Dataset – 16 Language Pairs, 48 Annotators

Disclosure: this is our own dataset.

Our dataset consists of 362 translation segments annotated by 48 professional linguists (not crowdsourced) across 16 language pairs.

MT systems evaluated: EuroLLM-22B, Qwen3-235B, TranslateGemma-12B.

Language pairs (all from English): Arabic (MSA, Egyptian, Moroccan, Saudi), Belarusian, French, German, Hmong, Italian, Japanese, Korean, Polish, Portuguese (Brazilian and European), Russian, Ukrainian.

Each segment includes full MQM error annotations:

  • error category (accuracy, fluency, terminology, etc.)
  • severity level (minor, major, critical)
  • exact error span in the text
  • multiple annotators per segment for inter-annotator agreement analysis
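With annotations in that shape, segment-level quality is typically computed as a sum of severity-weighted penalties. A minimal sketch, assuming hypothetical field names and the common WMT-style weights (minor = 1, major = 5, critical = 10; the dataset's actual schema and weighting may differ):

```python
# Hypothetical severity weights; check the dataset card for the real scheme.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_score(errors):
    """Sum severity-weighted penalties for one segment (lower is better)."""
    return sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)

# Illustrative annotation records (field names are assumptions, not the
# dataset's documented schema).
segment_errors = [
    {"category": "accuracy/mistranslation", "severity": "major", "span": (12, 19)},
    {"category": "fluency/grammar", "severity": "minor", "span": (34, 40)},
]
print(mqm_score(segment_errors))  # 5 + 1 = 6
```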

Methodology follows WMT guidelines. Inter-annotator agreement is Kendall’s τ = 0.317, roughly 2.6× the agreement typically reported in WMT evaluation campaigns.
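For readers unfamiliar with the metric: Kendall’s τ measures agreement between two rankings as (concordant pairs − discordant pairs) / total pairs. A self-contained sketch of the τ-a variant on made-up annotator scores (the dataset’s IAA figure may use a different variant or pairing protocol):

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Tied pairs count as neither concordant nor discordant."""
    assert len(x) == len(y)
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical segment-level scores from two annotators:
# one swapped pair out of 10 gives tau = (9 - 1) / 10 = 0.8.
print(kendall_tau_a([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))  # 0.8
```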

The dataset may be useful for MT evaluation research and for benchmarking translation quality.

Dataset: https://huggingface.co/datasets/alconost/mqm-translation-gold

Happy to answer questions about the annotation process!

submitted by /u/ritis88
