GPT-5.3 Codex: The autonomous coding agent is here

TL;DR — GPT-5.3 Codex at a Glance

  • Speed is the core feature: Built on NVIDIA GB200 NVL72 (Blackwell) clusters, GPT-5.3 Codex generates a complete React component in 4.2 seconds and scores 77.3% on Terminal Bench 2.0 — 13.3 points ahead of its predecessor.
  • Lives in the terminal, not the chat window: The model is trained to execute CLI commands, run tests, fix lint errors, and manipulate files directly on the local codebase — not just generate passive code snippets.
  • The --steerable flag is a game changer: Real-time intervention lets developers pause the output stream mid-generation, inject corrections, and redirect the agent without waiting for a completed (potentially wrong) result.
  • Know its limits: GPT-5.3 Codex suffers from context drift on long-horizon tasks with unstructured documents, skips clarifying questions (risking fast hallucinations), and over-refuses legitimate security and refactoring tasks due to conservative filters.

OpenAI has released GPT-5.3 Codex, pivoting from pure reasoning depth to extreme inference speed and direct terminal integration. The model scores 77.3 percent on CLI tasks and positions itself as an “interactive teammate” that deliberately prioritizes latency and control over the absolute autonomy of its competitors. We put the specs in context and compare the model directly with Claude Opus 4.6, its depth-focused rival.

Xcode 26.3: Agentic Coding with Claude & Codex

With the release candidate of Xcode 26.3, Apple is opening up the IDE architecture to autonomous AI agents via the Model Context Protocol (MCP) for the first time. With direct access to build servers and error consoles, models can not only suggest code but also fix compilation errors independently in a “closed loop” and validate the results visually. We analyze the technical specs surrounding macOS Tahoe and why developers are warning of potential security risks.

OpenAI releases native Codex app for macOS

OpenAI has released a standalone Codex app for macOS that deeply integrates coding agents based on GPT-5.2 into the operating system. The tool relies on isolated Git worktrees to work on complex tasks in parallel in the background without blocking the developer’s active workflow in the main editor. We analyze how this asynchronous “manager” approach compares directly with Anthropic’s CLI-based competitor.

Mistral Large 2: Europe’s answer to GPT-4o and Llama 3.1

Mistral AI challenges the open-weights competition with Mistral Large 2, delivering a 123 billion parameter model that prioritizes efficiency over sheer mass. It offers nearly the same performance as Llama 3.1 405B with drastically lower hardware requirements, making it the most powerful option currently available for companies that want to host their own AI. Here are the technical details and benchmarks.
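
To make the hardware claim concrete, here is a back-of-the-envelope sketch of the memory needed for the model weights alone. It assumes 16-bit weights (2 bytes per parameter) and ignores KV cache, activations, and quantization, so real deployments will differ; the function name `weight_memory_gb` is hypothetical and used only for illustration.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights only, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same precision
    (2 bytes = 16-bit by default). Runtime overhead is not included.
    """
    # (params_billion * 1e9 parameters) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


# Parameter counts as stated in the text and the model name.
mistral_large_2 = weight_memory_gb(123)   # 123 billion parameters
llama_3_1_405b = weight_memory_gb(405)    # 405 billion parameters

print(f"Mistral Large 2: ~{mistral_large_2} GB of weights")
print(f"Llama 3.1 405B:  ~{llama_3_1_405b} GB of weights")
print(f"Ratio: {llama_3_1_405b / mistral_large_2:.1f}x")
```

Under these assumptions the weights come to roughly 246 GB versus 810 GB, which gives a rough sense of scale for the difference in hosting requirements.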

Security for AI agents: How OpenAI prevents data theft via links

OpenAI details the security architecture behind its new “Operator” agent, which executes web interactions in an isolated cloud sandbox rather than locally on user devices. Because requests carry cryptographic signatures according to RFC 9421 (HTTP Message Signatures), server operators and firewalls should be able to verify cryptographically that a request actually originates from an authorized AI agent. We analyze whether this server-side “walled garden” approach effectively eliminates the risk of SSRF attacks compared to open systems such as Claude Computer Use.

Kimi k2.5 Release: The new AI competitor for GPT-4o & Claude?

Moonshot AI releases Kimi k2.5, a 1.04 trillion parameter MoE model that challenges GPT-5.2 with native multimodality and massive scaling. The system relies on an aggressive “agent swarm” architecture that allows up to 100 sub-agents to work in parallel and significantly undercuts the US competition in terms of price. We analyze the technical data and show where the new benchmark king reaches its limits in everyday coding.

OpenAI Prism: GPT-5.2 meets free LaTeX workspace

OpenAI has released Prism, an AI-native environment for scientific writing that is deeply integrated with the new GPT-5.2 model family and native LaTeX support. The tool aims to replace established editors with automated “vision-to-code” workflows, but faces massive criticism for privacy risks to unpublished research and logical weaknesses in the fast “instant” model. We sort through the technical specifications and community reactions.
