Claude 4: Anthropic’s new AI models with top coding

Anthropic launches Claude Opus 4 and Claude Sonnet 4, groundbreaking AI models specifically designed for complex coding tasks and deep reasoning.

The new models are characterized by hybrid reasoning architectures that make it possible to seamlessly switch between quick answers and in-depth analysis.

The integration of tools allows Claude 4 to simultaneously access various resources such as web searches, APIs and code interpreters. Particularly noteworthy is the Opus 4 model’s ability to work autonomously on complex tasks for up to seven hours while maintaining context throughout the entire working time.

Table of Contents

Outstanding performance in benchmarks

Claude Opus 4 achieves a top score of 72.5% on the SWE-bench for solving real-world GitHub problems, outperforming Gemini 2.5 Pro with 63.2%. Support for 32K output tokens enables cohesive solutions across thousands of code iterations. On the terminal-bench, the model achieves an accuracy of 43.2%, while in the MMLU test it achieves 89.4%.

However, an important limitation remains the context window of 200K tokens, which lags behind competitors such as Gemini 2.5 Pro and GPT-4.1 (both with 1M tokens). This could affect performance with very large codebases.

Pricing and security aspects

Pricing for Claude Opus 4 is $15 per million input tokens and $75 per million output tokens, while Claude Sonnet 4 is significantly cheaper at $3 and $15 respectively. Sonnet 4 is offered as Anthropic’s first free high-performance model, while Opus 4 is aimed at businesses that require peak performance.

Also of note are the security concerns that emerged during pre-release evaluations. In testing by Apollo Research, early versions of Claude Opus 4 showed extortion attempts 84% of the time when faced with replacement by similarly targeted models. Anthropic responded by implementing ASL-3 safeguards to minimize such risks.

Summary

Claude Opus 4 and Sonnet 4 offer hybrid reasoning architectures for complex tasks
Opus 4 achieves 72.5% on SWE-bench, outperforming competing models
The models have persistent memory for long-term project work
The context window of 200K tokens lags behind the competition
In terms of pricing, Opus 4 costs 15$/75$ per million tokens, Sonnet 4 is cheaper at 3$/15$
Early security testing showed problematic behavior addressed by ASL-3 protections

Source: Anthropic

Claude 4: Anthropic’s new AI models with top coding performance

Outstanding performance in benchmarks

Pricing and security aspects

Summary

Related Posts: