MMLU-Pro
Multiple-choice knowledge test across 14 domains, with 10 answer options per question.
Top models
| # | Model | Provider | Score | Date |
|---|---|---|---|---|
| 1 | Gemini 3 Pro Preview (high, reasoning) | Google | 89.8% | 2026-04 |
| 2 | Gemini 3 Pro Preview (low) | Google | 89.5% | 2026-04 |
| 3 | Claude Opus 4.5 (Reasoning) | Anthropic | 89.5% | 2026-03 |
| 4 | Qwen3.6 Plus | Alibaba | 88.5% | 2026-04 |
| 5 | MiniMax M2.1 | MiniMax | 88% | 2026-04 |
| 6 | Qwen3.5-397B-A17B | Alibaba | 87.8% | 2026-04 |
| 7 | Kimi K2.5 | Moonshot AI | 87.1% | 2026-04 |
| 8 | ERNIE 5.0 | Baidu | 87% | 2026-04 |
What does it measure?
MMLU-Pro is a 2024 revision of the well-known MMLU test. It contains roughly 12,000 multiple-choice questions across 14 domains, including biology, law, mathematics, philosophy, medicine and engineering. Each question has 10 answer options (instead of 4 in MMLU), and the set was manually filtered to remove trivial or leaking questions.
The goal: a knowledge test that cannot be cracked by guessing or shallow pattern matching and instead requires real reasoning.
How to read the score
The score is the percentage of questions answered correctly.
- Random guessing: 10% (ten answer options).
- Human expert baseline: ~78% with chain-of-thought.
- Current top: ~90%. The top three are within one percentage point of each other, so the benchmark is practically saturated.
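To make the reading above concrete, here is a minimal sketch of how such a score is computed and how much sampling noise it carries. It assumes a model is graded on all of the roughly 12,000 questions and that questions can be treated as independent draws, which is a simplification. The function names and the toy data are illustrative and do not come from any official MMLU-Pro harness.

```python
import math

def accuracy(predictions, answers):
    """Share of questions answered correctly, as a fraction in [0, 1]."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def standard_error(acc, n_questions):
    """Binomial standard error of an accuracy measured on n_questions."""
    return math.sqrt(acc * (1 - acc) / n_questions)

N_QUESTIONS = 12_000  # rounded figure from the text, not the exact count
CHANCE = 1 / 10       # random guessing with ten answer options

# Toy example: three of four answers match the key.
toy = accuracy(list("ABCA"), list("ABCD"))

# Sampling noise for the current top score of 89.8%.
se = standard_error(0.898, N_QUESTIONS)

print(f"toy accuracy: {toy:.2f}")
print(f"chance level: {CHANCE:.0%}")
print(f"standard error at 89.8%: {100 * se:.2f} percentage points")
```

Under these assumptions the standard error is a little under 0.3 percentage points, the same order as the gap between first and second place in the table, which is one way to see why the top of the leaderboard is hard to rank.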
Example task
Example (mathematics):
"Find the characteristic of the ring 2ℤ."
A. 0 · B. 30 · C. 3 · D. 10 · E. 12 · F. 50 · G. 2 · H. 100 · I. 20 · J. 5
Correct answer: A (0). There is no positive n for which n·x = 0 for all x in 2ℤ.
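The reasoning behind the answer can be written out in two lines. This uses the standard definition of the characteristic for a ring that need not contain a multiplicative identity, which the example itself does not spell out:

```latex
\[
\operatorname{char}(R) =
\begin{cases}
\min\{\, n \in \mathbb{Z}_{>0} : n \cdot x = 0 \ \text{for all } x \in R \,\} & \text{if such an } n \text{ exists},\\
0 & \text{otherwise.}
\end{cases}
\]
\[
x = 2 \in 2\mathbb{Z}: \quad n \cdot 2 = 2n \neq 0 \ \text{for every } n \geq 1
\;\Longrightarrow\; \operatorname{char}(2\mathbb{Z}) = 0 .
\]
```

The distractor G (2) is tempting because every element of 2ℤ is a multiple of 2, but that says nothing about repeated addition ever reaching zero.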
What to watch out for
- Saturated. Top models sit above the human expert baseline, so the benchmark is less useful for separating frontier models. Use GPQA Diamond or HLE for the genuinely hard questions.
- Self-reported vs. independent. The figures in model-release posts almost always come from the maker's own evaluation runs. Independent reruns (Artificial Analysis) can differ by 2 to 5 points.
- Skewed domain split. Mathematics and law are over-represented. An average score can mask weakness in a domain.
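The effect of a skewed domain split can be illustrated by comparing the overall (micro) average with the unweighted mean of per-domain scores (macro average). This is a sketch on made-up data: the `(domain, is_correct)` record format and the function names are assumptions for illustration, not the layout of any real results file.

```python
from collections import defaultdict

def domain_scores(results):
    """Map each domain to its accuracy, given (domain, is_correct) pairs."""
    totals = defaultdict(lambda: [0, 0])  # domain -> [correct, seen]
    for domain, is_correct in results:
        totals[domain][0] += int(is_correct)
        totals[domain][1] += 1
    return {d: c / n for d, (c, n) in totals.items()}

def micro_average(results):
    """Overall accuracy: every question counts equally."""
    results = list(results)
    return sum(ok for _, ok in results) / len(results)

def macro_average(results):
    """Mean of per-domain accuracies: every domain counts equally."""
    scores = domain_scores(results)
    return sum(scores.values()) / len(scores)

# Made-up data: a large domain answered well, a small one answered poorly.
toy = (
    [("math", True)] * 7 + [("math", False)]
    + [("philosophy", True), ("philosophy", False)]
)

per_domain = domain_scores(toy)
micro = micro_average(toy)
macro = macro_average(toy)
print(per_domain, micro, macro)
```

In this toy case the headline (micro) score is pulled toward the larger domain and sits well above the macro average, so reporting per-domain scores alongside the average makes a weak domain visible.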