Claude Opus 4.8 Beats GPT 5.5: What Businesses Need to Know

Claude Opus 4.8 is out, and it is beating GPT 5.5 on almost every benchmark.

Anthropic just released Opus 4.8 at the same price as Opus 4.7, but stronger across coding, agentic tasks, reasoning, and professional knowledge work. Dario Amodei is back in the public eye, and the message is clear: the model race is not slowing down.

Key benchmark results

Super-Agent benchmark: Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT 5.5 at comparable cost.
Legal Agent Benchmark: Highest score recorded, first model to break 10% on the all-pass standard.
Browser-agent tasks (Online-Mind2Web): 84%, a meaningful jump over both Opus 4.7 and GPT 5.5.
CursorBench: Exceeds prior Opus models across every effort level with more efficient tool calling.
Fast mode: 3× cheaper than before and 2.5× faster. Token cost is 61% lower than Opus 4.7 for the same work.

What changed

One of the most important improvements in Opus 4.8 is honesty. The model is more likely to flag uncertainties about its work and less likely to make unsupported claims. For businesses trusting AI to run unattended, that is the difference between a reliable tool and an expensive hallucination.

The model also carries context and style direction better across long sessions, fixes the comment-verbosity and tool-calling issues from Opus 4.7, and produces more information-dense analysis outputs.

Why this matters for businesses

Model switching just got cheaper. Opus 4.8 fast mode is 61% cheaper on token cost than Opus 4.7 for the same work. If you were waiting for a price signal to test Anthropic, this is it.
Browser-agent reliability jumped. If you are building automation that navigates websites or fills forms, this model is meaningfully better than GPT 5.5.
Long-running tasks are safer. Opus 4.8 flags its own uncertainty instead of confidently hallucinating progress. That is exactly what you want in a model you trust to run unattended.
The price did not go up. Same Opus price, better output. That is rare in this market.
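
To put the 61% figure in budget terms, here is a minimal arithmetic sketch in Python. The monthly spend of 10,000 is a made-up number for illustration, and `projected_spend` is a hypothetical helper, not part of any Anthropic tool. It assumes your workload stays the same and that the full 61% reduction applies to it, which your own usage may not match.

```python
def projected_spend(current_spend: float, percent_cheaper: float) -> float:
    """Return the cost of the same work after a price cut of `percent_cheaper` percent."""
    if not 0 <= percent_cheaper <= 100:
        raise ValueError("percent_cheaper must be between 0 and 100")
    return current_spend * (1 - percent_cheaper / 100)


# Hypothetical monthly token spend on Opus 4.7 (illustrative figure only).
monthly_opus_47_spend = 10_000.00

# 61% is the token-cost reduction quoted in this article.
new_spend = projected_spend(monthly_opus_47_spend, 61)
savings = monthly_opus_47_spend - new_spend

print(f"Projected spend: {new_spend:,.2f}")
print(f"Projected savings: {savings:,.2f}")
```

Swap in your real monthly bill to get a first estimate, then confirm it against actual invoices after a trial period.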

The real takeaway

Businesses that locked into one AI model six months ago are probably overpaying or underperforming now. The gap between models is growing, not shrinking.

Review your stack. Test the new models. Keep what works, switch what doesn't.
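
One way to make "test the new models" concrete is to score each candidate on tasks taken from your own workload. The sketch below is a minimal, assumption-heavy example: `stub_model_a` and `stub_model_b` are placeholder functions standing in for real model calls, and the two tasks are toy checks. It makes no real API calls; you would replace the stubs with calls to whichever providers you are comparing.

```python
from typing import Callable, Dict, List, Tuple

# A task is a prompt plus a checker that decides whether the output is acceptable.
Task = Tuple[str, Callable[[str], bool]]
Model = Callable[[str], str]


def pass_rate(model: Model, tasks: List[Task]) -> float:
    """Fraction of tasks whose output passes its checker."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    passed = sum(1 for prompt, check in tasks if check(model(prompt)))
    return passed / len(tasks)


def compare(models: Dict[str, Model], tasks: List[Task]) -> Dict[str, float]:
    """Score every model on the same task list."""
    return {name: pass_rate(fn, tasks) for name, fn in models.items()}


# Placeholder models (hypothetical): replace with real client calls.
def stub_model_a(prompt: str) -> str:
    return prompt.upper()


def stub_model_b(prompt: str) -> str:
    return prompt


# Toy tasks (hypothetical): use prompts and checks from your own workflows.
tasks: List[Task] = [
    ("hello", lambda out: out == "HELLO"),
    ("abc", lambda out: out.isupper()),
]

scores = compare({"stub_a": stub_model_a, "stub_b": stub_model_b}, tasks)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

The point of keeping the task list fixed is that every model is judged on the same work, so a switch decision rests on your own results and not on published benchmarks alone.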

Build With Abdallah helps businesses evaluate AI tools and build automation that actually works.

Sources

Official announcement: https://www.anthropic.com/news/claude-opus-4-8
System Card: https://www.anthropic.com/claude-opus-4-8-system-card
9to5Mac coverage: https://9to5mac.com/2026/05/28/anthropic-upgrades-claude-with-new-opus-4-8-model-heres-whats-new/