LLM Leaderboard

The state of the frontier, measured.

An independent ranking of the world's strongest large language models across general intelligence, coding, and specialist categories. We track it so we always deploy the right model for your problem, not the most hyped one.

531 models analysed
52 providers ranked
59.9 top intelligence score
17.1 average intelligence score across all models

Category champions

The model to beat, task by task

Category · Score · Model · Provider
Smartest · 59.9 · Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) · Anthropic
Coder · 62 · Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) · Anthropic
Math · 99 · GPT-5.2 (xhigh) · OpenAI
Reasoning · 94.1 · Gemini 3.1 Pro Preview · Google
Instruction Following · 83.3 · Grok 4.3 (medium) · xAI

Top 10 by intelligence

Ranked across all models

Rank · Model · Provider · Intelligence · Coding · Grade
01 · Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) · Anthropic · 59.9 · 62.0 · A+
02 · Claude Opus 4.8 (Adaptive Reasoning, Max Effort) · Anthropic · 55.7 · 56.7 · A-
03 · GPT-5.5 (xhigh) · OpenAI · 54.8 · 59.1 · A-
04 · Claude Opus 4.7 (Adaptive Reasoning, Max Effort) · Anthropic · 53.5 · 52.5 · B+
05 · GPT-5.5 (high) · OpenAI · 53.1 · 58.5 · B+
06 · GPT-5.4 (xhigh) · OpenAI · 51.4 · 57.2 · B
07 · Gemini 3.5 Flash (high) · Google · 50.2 · 45.0 · B
08 · Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) · Anthropic · 47.2 · 50.9 · B-
09 · GPT-5.5 (medium) · OpenAI · 47.1 · 56.2 · B-
10 · Gemini 3.1 Pro Preview · Google · 46.5 · 55.5 · C+

The shape of the race

How tight the frontier really is

[Charts: Top 10 by intelligence (composite score, higher is better); Intelligence vs coding, top 5; Provider share of the top 10]

Frequently asked questions

Choosing the right LLM

Which is the best LLM right now?

There is no single best LLM for everything. This leaderboard ranks the top large language models by a composite intelligence score and by coding ability. The strongest models overall come from Anthropic (Claude), OpenAI (GPT) and Google (Gemini) — but the right choice depends on your task, budget, latency and data-security needs.

What is the best LLM for coding?

We score coding ability separately from general intelligence because the two rankings differ. In the current top 10, for example, GPT-5.5 (xhigh) is third on intelligence but second on coding. The Coding column on the leaderboard and the category champions section above show the current best model for code generation and debugging.

Is GPT or Claude better?

It depends on the task. Compare both directly on the leaderboard using their intelligence and coding scores. Claude models often lead on long-context reasoning and coding, while GPT models are strong all-rounders. The most reliable way to decide is a short proof-of-concept on your own data.

What is the best open-source LLM?

Open-weight models such as Meta's Llama and Alibaba's Qwen are ranked alongside closed models, so you can compare the best open-source option against commercial APIs on the same intelligence and coding scales. None currently reaches the top 10, which is made up entirely of closed models from Anthropic, OpenAI and Google.

How often is the leaderboard updated?

The leaderboard is refreshed regularly from an independent third-party evaluation as new models are released and re-tested, so it reflects the current state of the AI frontier.

Begin

The best model is the one that fits your problem.

A leaderboard tells you what's strong in general. The harder question is which model — open or closed, large or small — is right for your data, your latency budget, and your security needs. That's the conversation our consultant is built for.