Tutorial: Transcribing YouTube videos with Google AI Studio

With Google AI Studio, you can transcribe and summarize YouTube videos quickly and for free – thanks to the latest Gemini models from Google. Whether for SEO, content reuse or accessibility, we’ll show you the best prompts for accurate transcriptions and summaries.

Transcribing YouTube videos with Google AI Studio

Google AI Studio is a free tool from Google that lets you quickly try out Google’s Gemini models with your own prompts and many ready-made examples. You can also create an API key there, which lets you use Google Gemini in other tools or via code.

Advantages: Google AI Studio

  • Free of charge
  • Easy to use
  • Helps with AI coding and prompt tests
  • Latest Gemini models can be used
  • AI parameters such as temperature can be adjusted
  • Reasonably easy to set up (Google Account, Google Cloud activation)
Google AI Studio: Free developer tool to try out Gemini or integrate it via API

 

How many videos can you transcribe for free with Google AI Studio?

In Google AI Studio you have over 1 million tokens per day to try out Gemini prompts (see screenshot on the right). A simple 2-3 minute video consumes approx. 30,000 tokens (see under Video in the screenshot). This means you can transcribe and summarize roughly 20-30 short videos per day for free.
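The estimate can be checked with a quick calculation. The following is a minimal sketch that uses only the two figures quoted in this section; the constant and function names are our own. The result is an upper bound for the video input alone. Assuming that the prompt text and the generated transcript draw on the same budget, and that longer videos cost proportionally more, 20-30 videos per day is the more realistic figure.

```python
# Rough daily budget estimate using the figures quoted in this section.
DAILY_TOKEN_BUDGET = 1_000_000   # "over 1 million tokens per day"
TOKENS_PER_SHORT_VIDEO = 30_000  # approx. cost of a simple 2-3 minute video


def videos_per_day(daily_budget: int, tokens_per_video: int) -> int:
    """Return how many whole videos fit into the daily token budget."""
    if tokens_per_video <= 0:
        raise ValueError("tokens_per_video must be positive")
    return daily_budget // tokens_per_video


upper_bound = videos_per_day(DAILY_TOKEN_BUDGET, TOKENS_PER_SHORT_VIDEO)
print(upper_bound)  # 33
```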

Good prompts for transcribing and summarizing videos

Precise prompting of the generative AI helps to achieve the desired result. Here are some prompts and sample results for transcribing and summarizing videos.


  • Prompt: “Transcribe this video”: simple, but usually too rough
  • Prompt: “Transcribe this video word by word”: best simple prompt
  • Prompt: “Summarize this video by scene”: good for scene summaries
  • Prompt: “What’s happening in the video?”: good for full video summaries

Tip: Specify prompts exactly

If the AI does not produce the desired result right away, simply state exactly what you need. Examples: “use timestamps”, “only scenes with German language” for multilingual videos, or “separated by person” if you want the transcript split by speaker.
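If you reuse the same refinements often, you can assemble the prompt programmatically. The following is a minimal sketch; `build_prompt` is a hypothetical helper of our own, not part of Google AI Studio or any Gemini library. It simply appends each refinement as its own sentence to a base prompt.

```python
def build_prompt(base: str, refinements: list[str]) -> str:
    """Append refinement instructions to a base prompt, one sentence each."""
    parts = [base.strip().rstrip(".")]
    for item in refinements:
        cleaned = item.strip().rstrip(".")
        if cleaned:
            # Capitalize only the first character; keep the rest as typed.
            parts.append(cleaned[0].upper() + cleaned[1:])
    return ". ".join(parts) + "."


prompt = build_prompt(
    "Transcribe this video word by word",
    ["use timestamps", "separated by person"],
)
print(prompt)
# Transcribe this video word by word. Use timestamps. Separated by person.
```

The resulting string can be pasted into Google AI Studio as a single prompt.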

Examples of video transcriptions by Google Gemini in Google AI Studio

Using two YouTube videos as examples, we show how much a good prompt matters for getting the desired result.

Simple summary – prompt “Transcribe this video”

The simple prompt “Transcribe this video” produces inconsistent results because it is rather imprecise. Sometimes timestamps are output, sometimes not. It also tends to produce a summary of individual scenes or of the entire video instead of a transcript, as the examples show.

Case 1: Only sections of the video are output, but with timestamps

YouTube transcript with timestamps

Case 2: A summary is created instead of a transcription

YouTube video transcription with Gemini in Google AI Studio

Create good transcriptions – “Transcribe this video word by word” prompt

The more precise “Transcribe this video word by word” prompt generates the desired transcription. These texts can easily be further processed, summarized, translated or reformulated into articles using generative AI.

YouTube transcript with prompt “word by word”
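For further processing it often helps to turn the transcript into structured data first. The following is a minimal sketch under a loud assumption: Gemini's output format is not fixed, so we assume here that you asked for timestamps and that each line starts with `MM:SS` or `HH:MM:SS`, optionally in square brackets. The function name `parse_transcript` and the sample text are our own illustrations.

```python
import re

# Assumed line shape: optional "[", MM:SS or HH:MM:SS, optional "]", then text.
LINE_RE = re.compile(r"^\[?(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\]?\s*[-:]?\s*(.+)$")


def parse_transcript(raw: str) -> list[tuple[int, str]]:
    """Return (start_in_seconds, text) pairs for every timestamped line."""
    segments = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # lines without a leading timestamp are skipped
        hours, minutes, seconds, text = match.groups()
        start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        segments.append((start, text.strip()))
    return segments


sample = """[00:05] Hello and welcome.
[01:15] Today we look at Google AI Studio.
Thanks for watching."""
print(parse_transcript(sample))
# [(5, 'Hello and welcome.'), (75, 'Today we look at Google AI Studio.')]
```

Note that this sketch drops lines without a timestamp. If your transcripts wrap long passages over several lines, you would want to append such lines to the previous segment instead.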

Summaries – prompt “Summarize this video by scene”

With the “Summarize this video” or “Summarize this video by scene” prompt, you can directly create useful summaries of the individual video scenes. This helps, for example, with accessibility summaries or with converting complex videos into plain, easy-to-read language as called for by AAA-level accessibility criteria (WCAG).

Summarizing a video transcript with prompt “summarize”

Summaries – prompt “What’s happening in the video?”

If you want to summarize an entire video, the “What’s happening in the video?” prompt does this easily. Such summaries also support many other forms of content reuse, for example video summaries on websites that make the content easier to find in search engines and AI tools.

Summarizing a YouTube transcript

For developers: Google AI Studio and YouTube transcripts by code

Here are some helpful sources for developers who want to use Google Gemini via code. You can generate YouTube transcripts directly in your own tools or in automation frameworks such as n8n, e.g. to use them in WordPress plugins. Transcription via the API can incur costs that depend on the AI model used and the number of tokens. However, Gemini prices are comparatively low; see the pricing page listed here for current rates.

Google Cloud: Use Gemini to summarize YouTube videos (article)

Google – Gemini API: Pricing

Tina Huang: Google AI Studio in 26 Minutes (Youtube)
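To give an idea of what such a call can look like, here is a minimal sketch that uses only the Python standard library and the Gemini REST endpoint `generateContent`. Several things are assumptions on our part: the model name `gemini-2.5-flash` (use whichever current model AI Studio offers you), the timeout value, and the helper names `build_payload` and `transcribe`. To our knowledge, passing a YouTube URL as `file_uri` works only for public videos and may still be a preview feature, so please check the Gemini API documentation before relying on it.

```python
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-2.5-flash"  # assumption: replace with a model available to you


def build_payload(video_url: str, prompt: str, temperature: float = 0.0) -> dict:
    """Build the JSON body: one text part (prompt) and one video part (URL)."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"file_data": {"file_uri": video_url}},
            ]
        }],
        # A low temperature keeps the transcript close to what is spoken.
        "generationConfig": {"temperature": temperature},
    }


def transcribe(video_url: str, prompt: str, api_key: str) -> str:
    """Send the request and return the concatenated text of the first candidate."""
    request = urllib.request.Request(
        f"{API_ROOT}/models/{MODEL}:generateContent",
        data=json.dumps(build_payload(video_url, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    parts = body["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


# Inspect the request body without sending anything over the network.
print(json.dumps(
    build_payload("https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder ID
                  "Transcribe this video word by word"),
    indent=2,
))
```

To actually run a transcription, call `transcribe(video_url, prompt, api_key)` with the API key you created in Google AI Studio. Keep the key out of your source code, for example by reading it from an environment variable.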



Conclusion: How transcribing YouTube videos with AI helps

Google AI Studio is a new, simple solution for transcribing and summarizing videos quickly and free of charge. As with any generative AI, the same rule applies to Google Gemini: the more precise the prompt, the better the result.

Transcribing YouTube videos offers benefits in many digital use cases, including better accessibility, better SEO, greater reach, easier content reuse and faster absorption of information.

  • Better SEO and findability in AI searches
    Search engines and AI search tools typically do not index the spoken content of a video directly. An additional transcript in the article makes the content searchable, improves the ranking in Google, YouTube and AI searches, and brings more organic reach.

  • Increased reach through translation and subtitles
    Transcripts can be easily translated into different languages, increasing international visibility and accessibility for non-native viewers.

  • Easier content reuse
    The written version of a video can serve as the basis for blog articles, social media posts, e-books or newsletters without the need to rephrase the content.

  • Faster absorption of information
    Many people prefer to skim through content or search for specific information. With a transcript, they can quickly find relevant passages instead of watching the whole video.

  • Better comprehensibility and note-taking
    Technical terms, complicated facts or passages spoken quickly can be understood more easily with a transcript and used for your own notes.

  • Improved accessibility
    People with hearing impairments, and people who prefer reading to watching a video, benefit from a written transcript. A transcript also makes it easier to follow the content on the train or in noisy environments.