QwQ-Max-Preview: Alibaba’s AI push overtakes Claude 3.5 Sonnet and DeepSeek V3

Alibaba is setting new standards in the AI space with QwQ-Max-Preview, challenging established models such as Claude 3.5 and GPT-4o. The new model achieves an impressive 60% first-attempt success rate on challenging AIME 2025 math problems and, with 89.4 points on the Arena-Hard benchmark, outperforms both DeepSeek V3 (85.5) and Claude 3.5 Sonnet (85.2).

The model’s hybrid architecture combines an efficient mixture-of-experts approach with a context window of 32,768 tokens, enabling specialized per-task expert routing without sacrificing general versatility. Of particular note are the progressive pre-training on 20 trillion tokens and the reinforcement learning fine-tuning, which reduces hallucinations by 40% compared to its predecessor Qwen2.5-VL.
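To put the 32,768-token context window in concrete terms, a minimal sketch of a pre-flight length check. The ~4-characters-per-token heuristic and the reserved output budget are assumptions for illustration, not the model's actual tokenizer:

```python
# Rough check of whether a prompt fits in a 32,768-token context window.
# Uses the common ~4 characters-per-token heuristic for English text;
# the model's real tokenizer would give exact counts.

CONTEXT_WINDOW = 32_768
CHARS_PER_TOKEN = 4  # heuristic; varies by language and content

def estimated_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(prompt: str, reserved_for_output: int = 2_048) -> bool:
    """Check that the prompt still leaves room for the model's reply."""
    return estimated_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("hello" * 10_000))  # ~12,500 estimated tokens -> True
```

A real integration would use the model's own tokenizer for the count, but the budgeting logic stays the same.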

Technical excellence in detail

The multimodal capabilities of QwQ-Max-Preview are impressive: the system can process videos up to one hour long and generate SVG code from visual descriptions. It supports 29 languages and achieves 92% accuracy in self-correcting its own coding errors in LiveCodeBench evaluations.
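The self-correction figure describes a generate-test-repair loop: run the model's code against tests and feed failures back for another attempt. A minimal sketch of that pattern, with `ask_model` as a hypothetical stub standing in for a real model call (the actual evaluation harness is not shown in the source):

```python
# Sketch of a generate-test-repair loop of the kind the LiveCodeBench
# self-correction figure refers to. `ask_model` is a hypothetical stub.
from typing import Optional

def ask_model(prompt: str) -> str:
    """Stub: the first draft is buggy; a repair request returns correct code."""
    if "fix" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"  # deliberate bug

def passes_tests(code: str) -> bool:
    """Execute the candidate code and check it against a known test case."""
    namespace: dict = {}
    exec(code, namespace)
    return namespace["add"](2, 3) == 5

def solve_with_self_correction(task: str, max_attempts: int = 3) -> Optional[str]:
    """Generate code; on test failure, feed the failing code back for repair."""
    code = ask_model(task)
    for _ in range(max_attempts):
        if passes_tests(code):
            return code
        code = ask_model(f"fix this code for task {task!r}:\n{code}")
    return None

print(solve_with_self_correction("write add(a, b)"))  # prints the repaired version
```

Here the stub succeeds on the second attempt; a real harness would substitute an actual API call and the benchmark's own test suites.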

In a competitive comparison, QwQ-Max-Preview is on par with leading proprietary models:

Benchmark        QwQ-Max-Preview   Claude 3.5 Sonnet   DeepSeek V3
Arena-Hard       89.4              85.2                85.5
LiveCodeBench    38.7              38.9                37.6
MMLU-Pro         76.1              78.0                75.9
GPQA-Diamond     60.1              65.0                59.1

Future plans and market positioning

Alibaba’s $53 billion AI infrastructure investment positions QwQ-Max as a cornerstone for complex agent systems. The model already demonstrates an 83% success rate in multi-level supply chain optimization simulations. Enterprise-oriented tools will soon be added to the offering:

  • Qwen Chat App: real-time collaboration features for team-based AI workflows
  • QwQ-32B variant: privacy-friendly local deployment with 58% lower VRAM requirements
  • Apache 2.0 release: full open-source access planned for Q2 2025
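As a back-of-envelope illustration of why a 32B-parameter local variant matters, weight memory scales linearly with bytes per parameter; quantization is the usual lever behind VRAM reductions like the 58% figure above. The precisions and numbers below are generic assumptions, not published specs:

```python
# Back-of-envelope VRAM needed just for the weights of a 32B-parameter
# model at different precisions. Ignores KV cache and activation memory.

PARAMS = 32e9  # 32 billion parameters (approximate)

def weight_vram_gb(bytes_per_param: float) -> float:
    """Gibibytes of memory for the weights alone at a given precision."""
    return PARAMS * bytes_per_param / 1024**3

for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{weight_vram_gb(bpp):.0f} GB")
# fp16: ~60 GB, int8: ~30 GB, int4: ~15 GB
```

The jump from fp16 to int4 is roughly 4x, which is why quantized local deployment on consumer GPUs becomes feasible at this scale.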

This release intensifies global AI competition, offering an open-source alternative to proprietary models without sacrificing commercial-grade performance.


Executive Summary

  • Alibaba’s QwQ-Max-Preview outperforms competing models with 89.4 points on the Arena-Hard benchmark
  • Model is based on 20 trillion tokens of training and reduces hallucinations by 40%
  • Multimodal processing of one-hour videos and 29 languages is supported
  • 92% accuracy in self-correction of programming code
  • Open source release under Apache 2.0 planned for Q2 2025
  • Alibaba’s $53 billion investment intensifies global AI competition

Source: QwenLM