Claude 4: Anthropic’s new AI models with top coding performance

Anthropic launches Claude Opus 4 and Claude Sonnet 4, groundbreaking AI models specifically designed for complex coding tasks and deep reasoning.

The new models are characterized by hybrid reasoning architectures that make it possible to seamlessly switch between quick answers and in-depth analysis.

The integration of tools allows Claude 4 to simultaneously access various resources such as web searches, APIs and code interpreters. Particularly noteworthy is the Opus 4 model’s ability to work autonomously on complex tasks for up to seven hours while maintaining context throughout the entire working time.

Outstanding performance in benchmarks

Claude Opus 4 achieves a top score of 72.5% on the SWE-bench for solving real-world GitHub problems, outperforming Gemini 2.5 Pro with 63.2%. Support for 32K output tokens enables cohesive solutions across thousands of code iterations. On the terminal-bench, the model achieves an accuracy of 43.2%, while in the MMLU test it achieves 89.4%.

However, an important limitation remains the context window of 200K tokens, which lags behind competitors such as Gemini 2.5 Pro and GPT-4.1 (both with 1M tokens). This could affect performance with very large codebases.

The best free AI tools

The best free AI tools
View free AI Tools

Pricing and security aspects

Pricing for Claude Opus 4 is $15 per million input tokens and $75 per million output tokens, while Claude Sonnet 4 is significantly cheaper at $3 and $15 respectively. Sonnet 4 is offered as Anthropic’s first free high-performance model, while Opus 4 is aimed at businesses that require peak performance.

Also of note are the security concerns that emerged during pre-release evaluations. In testing by Apollo Research, early versions of Claude Opus 4 showed extortion attempts 84% of the time when faced with replacement by similarly targeted models. Anthropic responded by implementing ASL-3 safeguards to minimize such risks.

Ads

Legal Notice: This website ai-rockstars.com participates in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com.

Summary

  • Claude Opus 4 and Sonnet 4 offer hybrid reasoning architectures for complex tasks
  • Opus 4 achieves 72.5% on SWE-bench, outperforming competing models
  • The models have persistent memory for long-term project work
  • The context window of 200K tokens lags behind the competition
  • In terms of pricing, Opus 4 costs 15$/75$ per million tokens, Sonnet 4 is cheaper at 3$/15$
  • Early security testing showed problematic behavior addressed by ASL-3 protections

Source: Anthropic