QwQ-Max-Preview: Alibaba’s AI push overtakes Claude 3.5 Sonnet and DeepSeek V3

Alibaba is setting new standards in the AI space with QwQ-Max-Preview, challenging established models such as Claude 3.5 and GPT-4o. The new model achieves an impressive 60% first-attempt success rate on challenging AIME 2025 math problems and, with 89.4 points on the Arena-Hard benchmark, outperforms both DeepSeek V3 (85.5) and Claude 3.5 Sonnet (85.2).

The model’s hybrid architecture combines an efficient mixture-of-experts approach with a context window of 32,768 tokens, enabling specialized per-task expert routing without sacrificing general versatility. Of particular note are the progressive pre-training on 20 trillion tokens and the reinforcement learning fine-tuning, which reduces hallucinations by 40% compared to its predecessor Qwen2.5-VL.
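To put the 32,768-token context window in concrete terms, a minimal sketch of a pre-flight length check. The ~4-characters-per-token heuristic and the reserved output budget are assumptions for illustration, not the model's actual tokenizer:

```python
# Rough check of whether a prompt fits in a 32,768-token context window.
# Uses the common ~4 characters-per-token heuristic for English text;
# the model's real tokenizer would give exact counts.

CONTEXT_WINDOW = 32_768
CHARS_PER_TOKEN = 4  # heuristic; varies by language and content

def estimated_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(prompt: str, reserved_for_output: int = 2_048) -> bool:
    """Check that the prompt still leaves room for the model's reply."""
    return estimated_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("hello" * 10_000))  # ~12,500 estimated tokens -> True
```

A real integration would use the model's own tokenizer for the count, but the budgeting logic stays the same.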

Technical excellence in detail

The multimodal capabilities of QwQ-Max-Preview are impressive: the system can process videos up to one hour long and generate SVG code from visual descriptions. It supports 29 languages and achieves 92% accuracy in self-correcting its own coding errors in LiveCodeBench evaluations.
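The self-correction figure describes a generate-test-repair loop: run the model's code against tests and feed failures back for another attempt. A minimal sketch of that pattern, with `ask_model` as a hypothetical stub standing in for a real model call (the actual evaluation harness is not shown in the source):

```python
# Sketch of a generate-test-repair loop of the kind the LiveCodeBench
# self-correction figure refers to. `ask_model` is a hypothetical stub.
from typing import Optional

def ask_model(prompt: str) -> str:
    """Stub: the first draft is buggy; a repair request returns correct code."""
    if "fix" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"  # deliberate bug

def passes_tests(code: str) -> bool:
    """Execute the candidate code and check it against a known test case."""
    namespace: dict = {}
    exec(code, namespace)
    return namespace["add"](2, 3) == 5

def solve_with_self_correction(task: str, max_attempts: int = 3) -> Optional[str]:
    """Generate code; on test failure, feed the failing code back for repair."""
    code = ask_model(task)
    for _ in range(max_attempts):
        if passes_tests(code):
            return code
        code = ask_model(f"fix this code for task {task!r}:\n{code}")
    return None

print(solve_with_self_correction("write add(a, b)"))  # prints the repaired version
```

Here the stub succeeds on the second attempt; a real harness would substitute an actual API call and the benchmark's own test suites.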

In a competitive comparison, QwQ-Max-Preview is on par with leading proprietary models:

Benchmark        QwQ-Max-Preview   Claude 3.5 Sonnet   DeepSeek V3
Arena-Hard       89.4              85.2                85.5
LiveCodeBench    38.7              38.9                37.6
MMLU-Pro         76.1              78.0                75.9
GPQA-Diamond     60.1              65.0                59.1

Future plans and market positioning

Alibaba’s $53 billion AI infrastructure investment positions QwQ-Max as a cornerstone for complex agent systems. The model already demonstrates an 83% success rate in multi-level supply chain optimization simulations. Enterprise-oriented tools will soon be added to the offering:

  • Qwen Chat App: real-time collaboration features for team-based AI workflows
  • QwQ-32B variant: privacy-friendly local deployment with 58% lower VRAM requirements
  • Apache 2.0 release: full open-source access planned for Q2 2025
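As a back-of-envelope illustration of why a 32B-parameter local variant matters, weight memory scales linearly with bytes per parameter; quantization is the usual lever behind VRAM reductions like the 58% figure above. The precisions and numbers below are generic assumptions, not published specs:

```python
# Back-of-envelope VRAM needed just for the weights of a 32B-parameter
# model at different precisions. Ignores KV cache and activation memory.

PARAMS = 32e9  # 32 billion parameters (approximate)

def weight_vram_gb(bytes_per_param: float) -> float:
    """Gibibytes of memory for the weights alone at a given precision."""
    return PARAMS * bytes_per_param / 1024**3

for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{weight_vram_gb(bpp):.0f} GB")
# fp16: ~60 GB, int8: ~30 GB, int4: ~15 GB
```

The jump from fp16 to int4 is roughly 4x, which is why quantized local deployment on consumer GPUs becomes feasible at this scale.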

This release intensifies global AI competition, offering an open-source alternative to proprietary models without sacrificing commercial-grade performance.


Executive Summary

  • Alibaba’s QwQ-Max-Preview outperforms competing models with 89.4 points on the Arena-Hard benchmark
  • Model is based on 20 trillion tokens of training and reduces hallucinations by 40%
  • Multimodal processing of one-hour videos and 29 languages is supported
  • 92% accuracy in self-correction of programming code
  • Open source release under Apache 2.0 planned for Q2 2025
  • Alibaba’s $53 billion investment intensifies global AI competition

Source: QwenLM