GPT-5.3 Codex: The autonomous coding agent is here

TL;DR — GPT-5.3 Codex at a Glance

  • Speed is the core feature: Built on NVIDIA GB200 NVL72 (Blackwell) clusters, GPT-5.3 Codex generates a complete React component in 4.2 seconds and scores 77.3% on Terminal Bench 2.0 — 13.3 points ahead of its predecessor.
  • Lives in the terminal, not the chat window: The model is trained to execute CLI commands, run tests, fix lint errors, and manipulate files directly on the local codebase — not just generate passive code snippets.
  • The --steerable flag is a game changer: Real-time intervention lets developers pause the output stream mid-generation, inject corrections, and redirect the agent without waiting for a completed (potentially wrong) result.
  • Know its limits: GPT-5.3 Codex suffers from context drift on long-horizon tasks with unstructured documents, skips clarifying questions (risking fast hallucinations), and over-refuses legitimate security and refactoring tasks due to conservative filters.

📖 This article is part of our comprehensive ChatGPT guide. Read the full guide →

With GPT-5.3 Codex, OpenAI pivots from pure reasoning depth to extreme inference speed and direct terminal integration. The model reaches 77.3 percent accuracy on CLI tasks and positions itself as an “interactive teammate” that deliberately prioritizes latency and control over the full autonomy of its competitors. We put the specs in context and draw the decisive comparison with Claude Opus 4.6. Read our in-depth review of Claude Opus 4.6, the depth-focused rival.

Mistral Large 2: Europe’s answer to GPT-4o and Llama 3.1

Mistral AI challenges the open-weights competition with Mistral Large 2, a 123-billion-parameter model that prioritizes efficiency over sheer size. It offers nearly the same performance as Llama 3.1 405B with drastically lower hardware requirements, making it the most powerful option currently available for companies that want to host their own AI. Here are the technical details and benchmarks.

Kimi k2.5 Release: The new AI competitor for GPT-4o & Claude?

Moonshot AI releases Kimi k2.5, a 1.04-trillion-parameter MoE model that challenges GPT-5.2 with native multimodality and massive scaling. The system relies on an aggressive “agent swarm” architecture that allows up to 100 sub-agents to work in parallel, and it significantly undercuts the US competition on price. We analyze the technical data and show where the new benchmark king reaches its limits in everyday coding.

OpenAI Prism: GPT-5.2 meets free LaTeX workspace

OpenAI has released Prism, an AI-native environment for scientific writing that is deeply integrated with the new GPT-5.2 model family and native LaTeX support. The tool aims to replace established editors with automated “vision-to-code” workflows, but faces massive criticism for privacy risks to unpublished research and logical weaknesses in the fast “instant” model. We sort through the technical specifications and community reactions.

OpenAI unveils GPT-5.2 Codex: New security standards for coding agents

With an addendum to the System Card, OpenAI radically shifts the security focus of GPT-5.2 Codex from content moderation to the safety of functional capabilities. The updated model now blocks malware, obfuscation, and prompt injections directly during token generation instead of relying on external guardrails.

OpenAI unveils GPT-5.2 Codex: The ultimate programming assistant?

With GPT-5.2-Codex, OpenAI releases a specialized model that, for the first time, understands logical relationships across entire repositories. It removes previous context limits and allows you to safely refactor entire legacy applications in a single run.
