Dataset Of MMLU Results Broken Down By Task

I am primarily looking for results of running the MMLU evaluation on modern large language models. I have been able to find some data here: https://github.com/EleutherAI/lm-evaluation-harness/tree/master/results and I will be asking the maintainers if and when they can provide any additional data.

MMLU may be the most commonly run evaluation on LLMs recently, but papers very rarely report more than a single final number, and I have not been able to find per-task results for the evaluations run in any major recent LLM paper.
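To make concrete what I mean by a per-task breakdown versus a single final number, here is a rough sketch. It assumes a results file is JSON with a top-level "results" object keyed by task name, that MMLU subtasks carry a "hendrycksTest-" prefix, and that each entry has an "acc" field. The sample data and the helper name mmlu_by_task are made up for illustration, so check them against the actual files before relying on this.

```python
import json
from statistics import mean

# Hypothetical sample shaped like a harness results file (assumed layout).
# The accuracy values are invented for illustration, not real model scores.
SAMPLE = """
{
  "results": {
    "hendrycksTest-abstract_algebra": {"acc": 0.25},
    "hendrycksTest-anatomy": {"acc": 0.5},
    "hendrycksTest-astronomy": {"acc": 0.75},
    "lambada_openai": {"acc": 0.6}
  }
}
"""

# Assumed task-name prefix for MMLU subtasks.
MMLU_PREFIX = "hendrycksTest-"


def mmlu_by_task(results_json):
    """Return {subtask_name: accuracy} for MMLU entries only."""
    data = json.loads(results_json)
    return {
        name[len(MMLU_PREFIX):]: metrics["acc"]
        for name, metrics in data["results"].items()
        if name.startswith(MMLU_PREFIX)
    }


per_task = mmlu_by_task(SAMPLE)

# Unweighted (macro) average over subtasks: the kind of single number
# that hides the per-task spread.
macro_avg = mean(per_task.values())

for task, acc in sorted(per_task.items()):
    print(f"{task}: {acc:.3f}")
print(f"macro average: {macro_avg:.3f}")
```

Note that a macro average weights every subtask equally. A paper's headline number may instead weight by the number of questions per subtask, which is one more reason the per-task data is useful.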

submitted by /u/corey1505