| Rank | Model | Usage | Benchmark | Score | Model slug | Context | Summary |
|---|---|---|---|---|---|---|---|
| #1 | GPT-5.4 (OpenAI) | 57.3 | coding | 57.3 | openai/gpt-5.4 | 1.1M | GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window… |
| #2 | Gemini 3.1 Pro Preview (Google) | 55.5 | coding | 55.5 | google/gemini-3.1-pro-preview | 1.0M | Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic relia… |
| #3 | GPT-5.3-Codex (OpenAI) | 53.1 | coding | 53.1 | openai/gpt-5.3-codex | 400K | GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex wi… |
| #4 | GPT-5.4 Mini (OpenAI) | 51.5 | coding | 51.5 | openai/gpt-5.4-mini | 400K | GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It suppor… |
| #5 | Claude Sonnet 4.6 (Anthropic) | 50.9 | coding | 50.9 | anthropic/claude-sonnet-4.6 | 1.0M | Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It… |
| #6 | GPT-5.2 (OpenAI) | 48.7 | coding | 48.7 | openai/gpt-5.2 | 400K | GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1… |
| #7 | Claude Opus 4.6 (Anthropic) | 48.1 | coding | 48.1 | anthropic/claude-opus-4.6 | 1.0M | Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire… |
| #8 | Claude Opus 4.5 (Anthropic) | 47.8 | coding | 47.8 | anthropic/claude-opus-4.5 | 200K | Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon c… |
| #9 | Gemini 2.5 Pro (Google) | 46.7 | coding | 46.7 | google/gemini-2.5-pro-exp-03-25 | 1.0M | Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It emplo… |
| #10 | Gemini 3 Pro Preview (high) (Google) | 46.5 | coding | 46.5 | google/gemini-3-pro-preview | - | Coding benchmark score. |