Alibaba presents Qwen3-Coder-480B-A35B-Instruct, an AI model that raises the bar for autonomous software development: on the benchmarks reported here it outperforms GPT-4.1 and comes close to Claude Sonnet-4 in key areas.
Released on July 22, 2025, the model uses a Mixture-of-Experts architecture with 480 billion parameters in total, of which only 35 billion are activated per token. This sparsity enables high-quality code generation at a fraction of the compute a dense model of the same size would need. Native support for a 256,000-token context, extendable to one million tokens with YaRN context extension, allows the analysis of complete code repositories in a single pass.
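To put those figures in proportion, the following minimal sketch computes the share of parameters that is active and the factor by which the context window would have to be stretched. It only does arithmetic on the numbers quoted above; the function names are illustrative and not part of any Qwen tooling.

```python
# Figures quoted in the article (not measured here).
TOTAL_PARAMS = 480e9         # total parameters in the MoE model
ACTIVE_PARAMS = 35e9         # parameters activated at a time
NATIVE_CONTEXT = 256_000     # native context window, in tokens
EXTENDED_CONTEXT = 1_000_000 # extended context window, in tokens


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that takes part in a forward pass."""
    return active / total


def context_scaling_factor(native: int, target: int) -> float:
    """How many times longer the target context is than the native one."""
    return target / native


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"context scaling: {context_scaling_factor(NATIVE_CONTEXT, EXTENDED_CONTEXT):.2f}x")
```

By these numbers, roughly 7 percent of the parameters are active at a time, and the extended window is about four times the native one.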
The model was trained on 7.5 trillion tokens, 70 percent of which came from code sources. Of particular note is the Agent RL framework, which used over 20,000 parallel environments to simulate realistic development scenarios. This methodology enables the model to process GitHub issues autonomously, including code modification, testing and documentation updates, without human intervention.
Benchmark dominance in critical areas
Qwen3-Coder achieves an accuracy of 61.8 percent on SWE-Bench Verified, significantly outperforming GPT-4.1 (38.8 percent) and coming close to Claude Sonnet-4 (67.0 percent). This benchmark tests the ability to solve real-world GitHub issues by analyzing code, implementing fixes and validating solutions. In Codeforces Elo ratings for algorithmic programming, the model sets new standards among open-source systems.
The AIME evaluation (Agent Integration and Multitask Evaluation) shows Qwen3-Coder’s strength in tool-integrated workflows: it outperforms GPT-4.1 by 8.2 percentage points in tasks that combine web browsing, API usage and debugging. On the Aider Polyglot benchmark, it achieves 61.8 percent accuracy in multilingual projects, only 1.3 percentage points below Claude Sonnet-4.
Practical application through agent-based workflows
The model goes beyond conventional code completion and executes autonomous development workflows. The Qwen Code command line interface, adapted from Gemini CLI, orchestrates development tools such as Git, Docker and test frameworks through natural language commands. Developers can formulate goals such as “refactor authentication module with OAuth 2.0 support”, whereupon the system coordinates tool execution and code implementation.
The model’s iterative refinement protocols analyze error logs, adjust implementations, and rerun tests until the functional specifications are met. This capability is particularly valuable for legacy system modernization, where the model identifies technical debt and recommends refactoring strategies that improve maintainability without compromising functionality.
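The analyze, adjust and rerun cycle described above can be pictured as a simple loop. The sketch below is an illustration of that general pattern, not Qwen3-Coder's actual implementation: `refine_until_green`, `RunResult`, and the two toy callbacks are hypothetical names, and in a real agent `propose_fix` would be a model call and `run_tests` a real test runner.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RunResult:
    passed: bool
    log: str  # error output handed back to the fixer


def refine_until_green(
    code: str,
    run_tests: Callable[[str], RunResult],
    propose_fix: Callable[[str, str], str],
    max_fixes: int = 5,
) -> Tuple[str, int]:
    """Run tests, apply a fix on failure, and repeat until tests pass.

    Returns the final code and the number of fixes applied.
    Raises RuntimeError if tests still fail after max_fixes fixes.
    """
    fixes_applied = 0
    while True:
        result = run_tests(code)
        if result.passed:
            return code, fixes_applied
        if fixes_applied >= max_fixes:
            raise RuntimeError(f"still failing after {max_fixes} fixes: {result.log}")
        code = propose_fix(code, result.log)
        fixes_applied += 1


# Toy stand-ins (assumptions for illustration only).
def toy_run_tests(code: str) -> RunResult:
    namespace: dict = {}
    exec(code, namespace)  # acceptable for a toy; never exec untrusted code
    ok = namespace["add"](2, 3) == 5
    return RunResult(ok, "" if ok else "add(2, 3) did not return 5")


def toy_propose_fix(code: str, log: str) -> str:
    # A real agent would use the log; the toy fixer knows the one bug.
    return code.replace("a - b", "a + b")


buggy = "def add(a, b):\n    return a - b\n"
fixed, fixes = refine_until_green(buggy, toy_run_tests, toy_propose_fix)
print(f"tests pass after {fixes} fix(es)")
```

Capping the number of fix attempts is a design choice of this sketch: it keeps the loop from running indefinitely when a fix never converges.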
Key facts about the update
- Architecture: 480 billion parameter mixture-of-experts model with 35 billion active parameters per inference
- Context processing: Native 256K token support, expandable to 1 million tokens through YaRN optimization
- Benchmark performance: 61.8% accuracy on SWE-Bench Verified, 23 percentage points ahead of GPT-4.1 (38.8%)
- Open source availability: Apache 2.0 license enables commercial use without restrictive fees
- Tool integration: Qwen Code CLI orchestrates Git, Docker, test frameworks through natural language commands
- Quantization: GGUF format enables 4-bit execution on consumer hardware with 98.7% of the original accuracy
- Multilingual support: Comprehensive support for Python, JavaScript, Java, C++, Go, Rust and other languages
- Agentic capabilities: Autonomous GitHub issue resolution with code modification, testing and documentation
- Training innovation: Agent RL framework with 20,000 parallel environments for realistic development scenarios
- Community ecosystem: Active GitHub repositories with 119 merged pull requests and continuous development
Source: GitHub