Dataset of MMLU results broken down by task

I am primarily looking for per-task results from running the MMLU evaluation on modern large language models. I have found some data at https://github.com/EleutherAI/lm-evaluation-harness/tree/master/results and will be asking the maintainers if and when they can provide any additional data.

MMLU may be the most common evaluation run on LLMs recently, but papers very rarely report more than a single final number, and I have not been able to find datasets of the evaluations run for any major recent LLM paper.

submitted by /u/corey1505