Top models

#	Model	Provider	Score	Date
1	Gemini 3 Pro Preview (high) [reasoning]	Google	91.7%	2026-04
2	Gemini 3 Flash Preview (Reasoning) [reasoning]	Google	90.8%	2026-04
3	DeepSeek V3.2 Speciale	DeepSeek	89.6%	2026-04
4	DeepSeek-V3.2 (Thinking) [reasoning]	DeepSeek	83.3%	2026-04
5	MiniMax M2	MiniMax	83%	2026-04
6	LongCat-Flash-Thinking-2601 [reasoning]	Meituan	82.8%	2026-04
7	Nemotron 3 Super (120B A12B)	NVIDIA	81.2%	2026-04
8	Grok 4 Fast	xAI	80%	2026-04

What does it measure?

LiveCodeBench collects new programming-contest tasks from LeetCode, AtCoder and Codeforces as soon as they go live, and uses them to test whether a model can write code for problems it cannot have seen during training. This is what makes the benchmark contamination-resistant. Beyond plain code generation it also tests self-repair (fixing a failed solution), code execution and test-output prediction.

The dataset is live: every few months a new version (v4, v5, v6) ships, and a model is scored only on tasks published after a certain date. As long as that date is later than the model's training cutoff, the model cannot have seen the tasks during training.
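The date-based selection described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual loader: the `Task` record, its field names and the sample dates are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; the field names are assumptions."""
    task_id: str
    source: str
    published: date


def tasks_after_cutoff(tasks: List[Task], cutoff: date) -> List[Task]:
    """Keep only tasks published strictly after the cutoff date."""
    return [t for t in tasks if t.published > cutoff]


# Made-up sample data, only to show the filtering step.
tasks = [
    Task("a", "LeetCode", date(2024, 3, 10)),
    Task("b", "AtCoder", date(2024, 9, 1)),
    Task("c", "Codeforces", date(2025, 1, 20)),
]

# Assume the model's training data ends on 2024-06-30.
fresh = tasks_after_cutoff(tasks, cutoff=date(2024, 6, 30))
print([t.task_id for t in fresh])  # ['b', 'c']
```

Choosing the cutoff per model, rather than once per benchmark version, is what keeps the comparison fair when models have different training cutoffs.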

How to read the score

The score is pass@1 on the latest tranche, that is, the share of tasks the model solves with a single generated solution. Scores depend heavily on the chosen time window: the same model can receive two different scores depending on whether v5 or v6 is used.

Random guessing: not meaningful, because answers are free-form code rather than a choice among fixed options.
Human baseline: top competitive programmers (red Codeforces rating) typically reach 80-95%.
Current top: around 90% on the latest tranche.
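
As a reference for how a pass@1 figure can be computed, here is a minimal sketch of the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn for a task and c the number that pass all tests. Whether the leaderboard above uses exactly this procedure is an assumption; the function names and the sample numbers are illustrative only.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: samples generated, c: samples that passed, k: attempt budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: List[Tuple[int, int]], k: int = 1) -> float:
    """Average the per-task estimate over all (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up results for three tasks, 10 samples each.
results = [(10, 3), (10, 10), (10, 0)]
print(round(benchmark_score(results, k=1), 3))  # 0.433
```

For k = 1 the estimator reduces to c / n per task, so pass@1 is simply the average fraction of passing samples.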

Example task

A LeetCode-style task from a live tranche:

"You are given two positive integers xCorner and yCorner and a 2D array circles, where each circle is given as [x, y, r]. There is a rectangle with its bottom-left corner at (0,0) and top-right corner at (xCorner, yCorner). Determine whether a path exists from bottom-left to top-right that stays entirely inside the rectangle and does not touch or cross any circle."

Example: xCorner=3, yCorner=4, circles=[[2,1,1]] → true.
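
To give a sense of what a model has to produce, here is a sketch of one common approach to this task. It is not the benchmark's reference solution and has only been checked against small cases. The idea: the path is blocked if a circle covers either corner, or if a chain of overlapping circles connects the top or left edge of the rectangle to the bottom or right edge. The function and helper names are assumptions chosen for the example.

```python
from typing import List


def can_reach_corner(x_corner: int, y_corner: int,
                     circles: List[List[int]]) -> bool:
    X, Y = x_corner, y_corner

    def covers(cx, cy, r, px, py):
        # True if point (px, py) lies inside or on the circle.
        return (cx - px) ** 2 + (cy - py) ** 2 <= r * r

    def touches_top_or_left(cx, cy, r):
        return ((cx <= X and abs(cy - Y) <= r)
                or (cy <= Y and cx <= r)
                or (cy > Y and covers(cx, cy, r, 0, Y)))

    def touches_bottom_or_right(cx, cy, r):
        return ((cy <= Y and abs(cx - X) <= r)
                or (cx <= X and cy <= r)
                or (cx > X and covers(cx, cy, r, X, 0)))

    def linked_inside(a, b):
        # Circles overlap or touch, and a point shared by both
        # lies strictly inside the rectangle.
        x1, y1, r1 = a
        x2, y2, r2 = b
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 > (r1 + r2) ** 2:
            return False
        return (x1 * r2 + x2 * r1 < (r1 + r2) * X
                and y1 * r2 + y2 * r1 < (r1 + r2) * Y)

    # A circle covering the start or end corner blocks immediately.
    for cx, cy, r in circles:
        if covers(cx, cy, r, 0, 0) or covers(cx, cy, r, X, Y):
            return False

    # Iterative DFS from every circle touching the top or left edge.
    seen = [False] * len(circles)
    for start, (cx, cy, r) in enumerate(circles):
        if seen[start] or not touches_top_or_left(cx, cy, r):
            continue
        seen[start] = True
        stack = [start]
        while stack:
            i = stack.pop()
            if touches_bottom_or_right(*circles[i]):
                return False  # barrier from top/left to bottom/right
            for j, other in enumerate(circles):
                if not seen[j] and linked_inside(circles[i], other):
                    seen[j] = True
                    stack.append(j)
    return True


print(can_reach_corner(3, 4, [[2, 1, 1]]))  # True (the example above)
print(can_reach_corner(3, 3, [[1, 1, 2]]))  # False (illustrative: circle covers the origin)
```

Integer arithmetic is used throughout so that the touch-versus-cross distinction is not affected by floating-point rounding. Passing the sample cases does not guarantee passing the hidden tests, which is exactly what the benchmark measures.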

What to watch out for

The time window matters. Comparisons that do not state which tranche (v4/v5/v6) was used are misleading. A model can score 92% on v5 and 82% on v6.
Contest style. Tasks are short, puzzle-like algorithmic questions, not representative of production software where you touch files, dependencies and legacy code.
Contamination window. Over time, tasks still creep into training data. LiveCodeBench therefore has to be refreshed continuously.
