Google is expanding its AI portfolio with Gemini 2.5 Flash, a model that makes the amount of thinking configurable for the first time. It introduces a “thinking budget” that lets developers control how many tokens the model may spend on reasoning for complex tasks.
Gemini 2.5 Flash is based on a hybrid reasoning architecture that combines efficient processing of simple queries with the ability to perform in-depth analysis when required. Depending on the use case, developers can choose between fast response generation (under 500ms) and an intensive reasoning process. The model supports a context window of one million tokens, enabling the processing of large documents such as scientific articles or large code bases.
Control over the reasoning capability is provided by a parameter that can be set between 0 and 24,576 tokens. At 0 tokens, the model works like its predecessor Gemini 2.0 Flash, while higher values allow multi-level calculations and complex analyses.
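As a minimal sketch of how an application might pick a value within this range: the helper below is hypothetical and not part of any Gemini SDK. The names `clamp_thinking_budget` and `choose_thinking_budget` and the complexity tiers are illustrative assumptions; only the 0 to 24,576 range comes from the text above.

```python
# Hypothetical helper for illustration; not part of the google-genai SDK.
MIN_THINKING_BUDGET = 0        # thinking disabled, per the article
MAX_THINKING_BUDGET = 24_576   # upper limit stated in the article

# Assumed tiers: the labels and values are illustrative, not recommendations.
BUDGET_BY_COMPLEXITY = {
    "simple": 0,
    "moderate": 1_024,
    "complex": 8_192,
}


def clamp_thinking_budget(requested: int) -> int:
    """Force a requested budget into the supported 0..24,576 range."""
    return max(MIN_THINKING_BUDGET, min(MAX_THINKING_BUDGET, requested))


def choose_thinking_budget(complexity: str) -> int:
    """Map a task label to a budget; unknown labels fall back to 0."""
    return clamp_thinking_budget(BUDGET_BY_COMPLEXITY.get(complexity, 0))


if __name__ == "__main__":
    for label in ("simple", "moderate", "complex"):
        print(label, choose_thinking_budget(label))
```

The returned value could then be passed as the `thinking_budget` of a request, so that simple lookups skip thinking entirely while harder tasks get more room to reason.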
The pricing structure of the model reflects the hybrid approach: standard requests cost $0.15 per million input tokens and $0.60 per million output tokens. An additional $3.50 per million tokens is charged for the activated thought process. This allows developers to control costs precisely by activating the thinking process only for complex tasks.
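The arithmetic can be sketched roughly as follows, taking the rates as the article states them: input and output tokens at the standard rates, and tokens spent on thinking at the $3.50 rate. How thinking tokens are actually metered and billed is an assumption here, and the function and parameter names are hypothetical.

```python
# Rates in USD per one million tokens, as quoted in the article.
INPUT_USD_PER_M = 0.15
OUTPUT_USD_PER_M = 0.60
THINKING_USD_PER_M = 3.50  # assumed to apply to thinking tokens only


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      thinking_tokens: int = 0) -> float:
    """Estimate the cost of one request under the assumptions above."""
    total = (
        input_tokens * INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
        + thinking_tokens * THINKING_USD_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Same request with and without a thinking phase.
    print(f"{estimate_cost_usd(10_000, 1_000):.4f}")
    print(f"{estimate_cost_usd(10_000, 1_000, thinking_tokens=2_000):.4f}")
```

Under these assumptions, a request with 10,000 input tokens and 1,000 output tokens costs about $0.0021 without thinking and about $0.0091 when 2,000 thinking tokens are added, which illustrates why enabling thinking only for complex tasks keeps costs down.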
In benchmarks, Gemini 2.5 Flash outperforms competing models such as Claude 3.5 Sonnet on STEM and programming tasks, while being approximately 40% less expensive than GPT-4 Turbo. Its multimodal capabilities allow it to process images, audio and video, opening up a wide range of applications – from automated video analysis to image-based dialog systems.
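Example: setting the thinking budget with the google-genai Python SDK:

    from google import genai
    from google.genai import types

    # Replace the placeholder with a real API key.
    client = genai.Client(api_key="GEMINI_API_KEY")

    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        contents="What is the probability of getting the sum 7 with two dice?",
        config=types.GenerateContentConfig(
            # Allow up to 1,024 tokens of internal reasoning before answering.
            thinking_config=types.ThinkingConfig(thinking_budget=1024)
        ),
    )
    print(response.text)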
Summary:
- Gemini 2.5 Flash introduces a controllable “thinking budget” for the first time, adjustable between 0 and 24,576 tokens
- The model combines fast response times (under 500ms) with the ability to perform deep analysis on demand
- The context window of 1 million tokens enables the processing of large documents and code bases
- Hybrid pricing structure allows precise cost control through selective activation of reasoning capabilities
- In benchmarks, Gemini 2.5 Flash outperforms competing models in STEM and programming tasks at lower cost
- Multimodal processing of text, images, audio and video expands application possibilities
Source: Google Blog