Alibaba is setting new standards in the AI space with QwQ-Max-Preview, challenging established models such as Claude 3.5 and GPT-4o. The new model achieves an impressive 60% success rate on the first attempt for challenging AIME 2025 math problems and outperforms both DeepSeek V3 (85.5) and Claude 3.5 Sonnet (85.2) with 89.4 points in the Arena Hard benchmark.
The model’s hybrid architecture combines an efficient mixture-of-experts approach with a 32,768-token context window, enabling specialized task handling without sacrificing general versatility. Of particular note are the progressive pre-training on 20 trillion tokens and the reinforcement-learning fine-tuning, which reduces hallucinations by 40% compared to its predecessor Qwen2.5-VL.
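Alibaba has not published the full architecture details, but the core idea of a mixture-of-experts layer is that a small router sends each token to only a few specialist sub-networks instead of running every parameter. A minimal, purely illustrative top-k routing sketch (toy sizes, not QwQ-Max's actual dimensions or expert count) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, experts_w, gate_w, top_k=2):
    """Route each token to its top_k experts and mix their outputs by gate weight.

    x:         (tokens, d_model) activations
    experts_w: (n_experts, d_model, d_model) per-expert weight matrices
    gate_w:    (d_model, n_experts) router weights
    """
    logits = x @ gate_w                            # router scores: (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the top_k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top[t]
        gates = np.exp(logits[t, sel])
        gates /= gates.sum()                       # softmax over the selected experts only
        for g, e in zip(gates, sel):
            out[t] += g * (x[t] @ experts_w[e])    # weighted mix of expert outputs
    return out

# Hypothetical toy sizes: 4 experts, 8-dim model, 3 tokens.
x = rng.standard_normal((3, 8))
experts = rng.standard_normal((4, 8, 8))
gate = rng.standard_normal((8, 4))
y = moe_layer(x, experts, gate)
print(y.shape)  # (3, 8)
```

Because each token touches only `top_k` of the experts, compute per token stays roughly constant as the expert count grows, which is what makes the approach efficient at scale.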
Technical excellence in detail
The multimodal capabilities of QwQ-Max-Preview are impressive. The system can process videos of up to one hour in length and generate SVG code from visual descriptions. It supports 29 languages and demonstrates 92% accuracy in self-correcting coding errors in LiveCodeBench evaluations.
In a competitive comparison, QwQ-Max-Preview is on par with leading proprietary models:
| Benchmark | QwQ-Max-Preview | Claude 3.5 Sonnet | DeepSeek V3 |
|---|---|---|---|
| Arena-Hard | 89.4 | 85.2 | 85.5 |
| LiveCodeBench | 38.7 | 38.9 | 37.6 |
| MMLU-Pro | 76.1 | 78.0 | 75.9 |
| GPQA-Diamond | 60.1 | 65.0 | 59.1 |
Future plans and market positioning
Alibaba’s AI infrastructure investment of 53 billion dollars positions QwQ-Max as a cornerstone for complex agent systems. The model already demonstrates an 83% success rate in multi-level supply chain optimization simulations. Enterprise-oriented tools will soon be added to the offering:
- Qwen Chat App: Real-time collaboration features for team-based AI workflows
- QwQ-32B variant: Privacy-friendly local deployment model with 58% lower VRAM requirements
- Apache 2.0 release: Full open source access planned for Q2 2025
This release intensifies global AI competition and provides an open source alternative to proprietary models while maintaining commercial performance.
Executive Summary
- Alibaba’s QwQ-Max preview outperforms competing models with 89.4 points in the Arena Hard benchmark
- Model is based on 20 trillion tokens of training and reduces hallucinations by 40%
- Multimodal processing of one-hour videos and 29 languages is supported
- 92% accuracy in self-correction of programming code
- Open source release under Apache 2.0 planned for Q2 2025
- Alibaba’s $53 billion investment intensifies global AI competition
Source: QwenLM