Gemini 2.5 Flash: Google’s AI with configurable thinking is changing the development landscape

Google is expanding its AI portfolio with Gemini 2.5 Flash, a model that makes thinking configurable for the first time. It introduces a “thinking budget” that lets developers cap how many tokens the model may spend on reasoning for complex tasks.

Gemini 2.5 Flash is based on a hybrid reasoning architecture that combines efficient processing of simple queries with the ability to perform in-depth analysis when required. Depending on the use case, developers can choose between fast response generation (under 500 ms) and an intensive reasoning process. The model supports a context window of one million tokens, enough to process large documents such as scientific articles or large code bases.

Reasoning is controlled by a single parameter, the thinking budget, which can be set between 0 and 24,576 tokens. At 0 tokens, the model works like its predecessor Gemini 2.0 Flash, while higher values allow multi-step calculations and complex analyses.
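As a minimal sketch of how an application might pick a budget per request, the helper below maps a task tier to a token budget and clamps it to the 0 to 24,576 range stated above. The tier names and the budget values per tier are hypothetical and chosen only for illustration; they are not recommendations from Google.

```python
MIN_THINKING_BUDGET = 0
MAX_THINKING_BUDGET = 24_576  # upper limit stated in the article

# Hypothetical tiers and budgets, for illustration only.
BUDGET_BY_TIER = {"simple": 0, "moderate": 1_024, "complex": 8_192}


def clamp_thinking_budget(requested: int) -> int:
    """Force a requested budget into the supported range."""
    return max(MIN_THINKING_BUDGET, min(requested, MAX_THINKING_BUDGET))


def budget_for(tier: str) -> int:
    """Look up the budget for a task tier and clamp it to the valid range."""
    if tier not in BUDGET_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    return clamp_thinking_budget(BUDGET_BY_TIER[tier])
```

A caller could pass `budget_for("simple")` for short factual lookups, which disables thinking, and `budget_for("complex")` for multi-step analysis.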

The pricing structure of the model reflects the hybrid approach: input costs $0.15 per million tokens, and output costs $0.60 per million tokens when thinking is switched off. With thinking enabled, output tokens, including the tokens spent on thinking, are billed at $3.50 per million instead. Developers can therefore control costs by activating the thinking process only for complex tasks.
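To make the cost trade-off concrete, here is a rough estimator built on the prices quoted above. It assumes that, with thinking enabled, all output tokens (answer plus thinking tokens) are billed at the $3.50 rate and that input is always billed at $0.15; the `Pricing` class and `estimate_cost` function are hypothetical names, and the rates should be checked against the current price list before relying on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens, as quoted in the article."""
    input_per_million: float = 0.15
    output_per_million: float = 0.60
    thinking_output_per_million: float = 3.50


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    thinking_tokens: int = 0,
    *,
    thinking_enabled: bool = False,
    pricing: Pricing = Pricing(),
) -> float:
    """Estimate the USD cost of one request under the stated assumptions."""
    if thinking_tokens and not thinking_enabled:
        raise ValueError("thinking_tokens given but thinking is disabled")
    if thinking_enabled:
        # Assumption: answer and thinking tokens share the thinking rate.
        billable_output = output_tokens + thinking_tokens
        output_rate = pricing.thinking_output_per_million
    else:
        billable_output = output_tokens
        output_rate = pricing.output_per_million
    total = input_tokens * pricing.input_per_million + billable_output * output_rate
    return total / 1_000_000
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens costs about $0.0027 without thinking. The same request with thinking enabled and 1,000 thinking tokens costs about $0.0120.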

In benchmarks, Gemini 2.5 Flash outperforms competing models such as Claude 3.5 Sonnet on STEM and programming tasks, while being approximately 40% less expensive than GPT-4 Turbo. Its multimodal capabilities allow it to process images, audio and video, opening up a wide range of applications – from automated video analysis to image-based dialog systems.

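The following Python example, using the google-genai SDK, sets a thinking budget of 1,024 tokens:

import os

from google import genai
from google.genai import types

# Read the API key from the environment instead of hard-coding it.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="What is the probability of getting the sum 7 with two dice?",
    config=types.GenerateContentConfig(
        # Allow up to 1,024 tokens of reasoning before the answer.
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(response.text)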

Summary:

  • Gemini 2.5 Flash introduces a controllable “thinking budget” for the first time; it can be set between 0 and 24,576 tokens
  • The model combines fast response times (under 500 ms) with the ability to perform deep analysis on demand
  • The context window of 1 million tokens enables the processing of large documents and code bases
  • Hybrid pricing structure allows precise cost control through selective activation of reasoning capabilities
  • In benchmarks, Gemini 2.5 Flash outperforms competing models in STEM and programming tasks at lower cost
  • Multimodal processing of text, images, audio and video expands application possibilities

Source: Google Blog