OpenAI - Articles, News & Guides

GPT-5.3 Codex: The autonomous coding agent is here

2026-03-18 | 2026-02-06 by Florian Schröder

TL;DR — GPT-5.3 Codex at a Glance

Speed is the core feature: Built on NVIDIA GB200 NVL72 (Blackwell) clusters, GPT-5.3 Codex generates a complete React component in 4.2 seconds and scores 77.3% on Terminal Bench 2.0 — 13.3 points ahead of its predecessor.
Lives in the terminal, not the chat window: The model is trained to execute CLI commands, run tests, fix lint errors, and manipulate files directly on the local codebase — not just generate passive code snippets.
The --steerable flag is a game changer: Real-time intervention lets developers pause the output stream mid-generation, inject corrections, and redirect the agent without waiting for a completed (potentially wrong) result.
Know its limits: GPT-5.3 Codex suffers from context drift on long-horizon tasks with unstructured documents, skips clarifying questions (risking fast hallucinations), and over-refuses legitimate security and refactoring tasks due to conservative filters.


With GPT-5.3 Codex, OpenAI pivots from pure reasoning depth to extreme inference speed and direct terminal integration. The model scores 77.3 percent on Terminal Bench 2.0 and positions itself as an “interactive teammate” that deliberately prioritizes latency and control over the full autonomy its competitors aim for. We break down the specs and the decisive comparison with Claude Opus 4.6.

Xcode 26.3: Agentic Coding with Claude & Codex

2026-03-18 | 2026-02-04 by Florian Schröder

With the release candidate of Xcode 26.3, Apple is opening up the IDE architecture to autonomous AI agents via the Model Context Protocol (MCP) for the first time. With direct access to build servers and error consoles, models can not only suggest code but also fix compilation errors independently in a “closed loop” and visually validate the result. We analyze the technical specs surrounding macOS Tahoe and why developers are warning of potential security risks.

OpenAI releases native Codex app for macOS

2026-03-18 | 2026-02-03 by Florian Schröder

OpenAI has released a standalone Codex app for macOS that deeply integrates coding agents based on GPT-5.2 into the operating system. The tool relies on isolated Git worktrees to work on complex tasks in parallel in the background without blocking the developer’s active workflow in the main editor. We analyze how this asynchronous “manager” approach compares with Anthropic’s CLI-based competitor.

Mistral Large 2: Europe’s answer to GPT-4o and Llama 3.1

2026-03-18 | 2026-01-30 by Florian Schröder

Mistral AI challenges the open-weights competition with Mistral Large 2, delivering a 123 billion parameter model that prioritizes efficiency over sheer mass. It offers nearly the same performance as Llama 3.1 405B with drastically lower hardware requirements, making it the most powerful option currently available for companies that want to host their own AI. Here are the technical details and benchmarks.

Security for AI agents: How OpenAI prevents data theft via links

2026-03-18 | 2026-01-30 by Florian Schröder

OpenAI details the security architecture behind its new “Operator” agent, which executes web interactions in an isolated cloud sandbox rather than locally on user devices. By implementing cryptographic signatures according to RFC 9421 (HTTP Message Signatures), server operators and firewalls should be able to verify cryptographically that a request actually originates from an authorized AI agent. We analyze whether this server-side “walled garden” approach effectively eliminates the risk of SSRF attacks compared to open systems such as Claude Computer Use.
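To make the request-signing idea above more concrete, here is a minimal sketch of an RFC 9421-style signature over three derived components. It is an illustration under stated assumptions, not OpenAI's implementation: the covered components, the key ID `demo-agent-key`, the host `shop.example`, and the use of a shared-secret HMAC are all hypothetical choices, and a production setup would more likely use an asymmetric key pair. The verifier is also simplified: it recomputes the signature from known values instead of parsing the `Signature-Input` header.

```python
import base64
import hashlib
import hmac


def signature_base(method: str, authority: str, path: str,
                   created: int, keyid: str) -> tuple[str, str]:
    """Build an RFC 9421-style signature base and its parameter string."""
    # The parameter string names the covered components and signature metadata.
    params = f'("@method" "@authority" "@path");created={created};keyid="{keyid}"'
    lines = [
        f'"@method": {method.upper()}',
        f'"@authority": {authority.lower()}',
        f'"@path": {path}',
        f'"@signature-params": {params}',
    ]
    return "\n".join(lines), params


def sign_request(secret: bytes, method: str, authority: str, path: str,
                 created: int, keyid: str, label: str = "sig1") -> dict[str, str]:
    """Return the Signature-Input and Signature header values (HMAC-SHA256)."""
    base, params = signature_base(method, authority, path, created, keyid)
    digest = hmac.new(secret, base.encode("ascii"), hashlib.sha256).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return {
        "Signature-Input": f"{label}={params}",
        "Signature": f"{label}=:{encoded}:",
    }


def verify_request(secret: bytes, headers: dict[str, str], method: str,
                   authority: str, path: str, created: int, keyid: str) -> bool:
    """Simplified check: recompute the signature and compare in constant time."""
    expected = sign_request(secret, method, authority, path, created, keyid)
    return hmac.compare_digest(expected["Signature"], headers.get("Signature", ""))


# Hypothetical example values, chosen only for this sketch.
headers = sign_request(b"shared-secret", "GET", "shop.example", "/cart",
                       created=1770000000, keyid="demo-agent-key")
accepted = verify_request(b"shared-secret", headers, "GET", "shop.example",
                          "/cart", created=1770000000, keyid="demo-agent-key")
rejected = verify_request(b"wrong-secret", headers, "GET", "shop.example",
                          "/cart", created=1770000000, keyid="demo-agent-key")
```

The point of the design is that a firewall holding the right key material can reject any request whose signature does not match the method, host, and path it actually received.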

Kimi k2.5 Release: The new AI competitor for GPT-4o & Claude?

2026-03-18 | 2026-01-28 by Florian Schröder

Moonshot AI releases Kimi k2.5, a 1.04 trillion parameter MoE model that challenges GPT-5.2 with native multimodality and massive scaling. The system relies on an aggressive “agent swarm” architecture that allows up to 100 sub-agents to work in parallel and significantly undercuts the US competition in terms of price. We analyze the technical data and show where the new benchmark king reaches its limits in everyday coding.

OpenAI Prism: GPT-5.2 meets free LaTeX workspace

2026-03-18 | 2026-01-28 by Florian Schröder

OpenAI has released Prism, an AI-native environment for scientific writing that is deeply integrated with the new GPT-5.2 model family and native LaTeX support. The tool aims to replace established editors with automated “vision-to-code” workflows, but faces massive criticism for privacy risks to unpublished research and logical weaknesses in the fast “instant” model. We sort through the technical specifications and community reactions.

ChatGPT Go unveiling: Access to GPT-5.2 Instant & global rollout

2026-03-18 | 2026-01-25 by Florian Schröder

OpenAI is releasing ChatGPT Go today, a speed-optimized version based on the new GPT-5.2 Instant model. The service is now available worldwide and offers mobile users minimal latency for text tasks and code generation.

OpenAI Codex Deep Dive: How the CLI controls autonomous agents

2026-03-18 | 2026-01-25 by Florian Schröder

With Codex CLI, OpenAI demonstrates how language models can be transformed from text generators into autonomous system agents. The architecture uses a continuous loop of execution and feedback to generate shell commands dynamically and correct errors on the command line independently.
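The loop of execution and feedback described above can be sketched in a few lines. This is a generic illustration, not the Codex CLI's actual code: `run_with_feedback` and `stub_model` are hypothetical names, and the stub stands in for a real model call that would read the error output and propose the next command.

```python
import subprocess
import sys
from typing import Callable, Optional

Command = list[str]


def run_with_feedback(
    command: Command,
    propose_next: Callable[[Command, int, str], Optional[Command]],
    max_steps: int = 5,
) -> list[tuple[Command, int]]:
    """Run a command, feed failures back to propose_next, repeat until success."""
    history: list[tuple[Command, int]] = []
    current = command
    for _ in range(max_steps):
        result = subprocess.run(current, capture_output=True, text=True)
        history.append((current, result.returncode))
        if result.returncode == 0:
            break  # success, the loop is done
        # The error output is the feedback the "model" sees.
        feedback = result.stderr or result.stdout
        next_command = propose_next(current, result.returncode, feedback)
        if next_command is None:
            break  # the "model" gives up
        current = next_command
    return history


def stub_model(command: Command, returncode: int, feedback: str) -> Optional[Command]:
    # Hypothetical stand-in for a model call: always proposes a command that succeeds.
    return [sys.executable, "-c", "print('fixed')"]


# First command fails with exit code 1, the stub's replacement succeeds.
history = run_with_feedback([sys.executable, "-c", "import sys; sys.exit(1)"], stub_model)
```

The `max_steps` cap matters in practice: without it, an agent that keeps proposing broken commands would loop indefinitely.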

GPT-5.2 in practical use: How Netomi scales enterprise agents

2026-03-18 | 2026-01-09 by Florian Schröder

Netomi outlines a blueprint architecture for enterprise agents that replaces static chatbots with autonomous workflows based on GPT-5.2. The system uses an upstream router to handle simple queries via GPT-4.1 and only escalates complex transactions to the more powerful model.
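The upstream-router pattern described above can be illustrated with a small sketch. Everything here is an assumption made for the example: the keyword list, the word-count threshold, and the `route_query` function are hypothetical, the model strings are plain labels rather than verified API identifiers, and Netomi's actual routing logic is not described in this summary.

```python
from dataclasses import dataclass

# Labels for the two tiers named in the article; not verified API identifiers.
SIMPLE_MODEL = "gpt-4.1"
COMPLEX_MODEL = "gpt-5.2"

# Hypothetical signals that a query involves a transaction.
TRANSACTION_KEYWORDS = ("refund", "cancel", "transfer", "payment", "rebook")


@dataclass
class Route:
    model: str
    reason: str


def route_query(query: str, max_simple_words: int = 40) -> Route:
    """Send simple queries to the cheaper model, escalate everything else."""
    text = query.lower()
    hits = [keyword for keyword in TRANSACTION_KEYWORDS if keyword in text]
    if hits:
        return Route(COMPLEX_MODEL, f"transaction keyword: {hits[0]}")
    if len(text.split()) > max_simple_words:
        return Route(COMPLEX_MODEL, "long query")
    return Route(SIMPLE_MODEL, "simple query")


faq = route_query("What are your opening hours?")
transaction = route_query("Please refund my last payment")
```

A real router would more likely use a classifier or a small model for this decision, but the cost logic is the same: the expensive model only sees the traffic that needs it.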