I/O 2024: Developer

Year: 2024

Josh Woodward (Vice President, Google Labs)

Segments (6)

00:05 · Introduction: Gemini 1.5 Pro and Flash — Josh Woodward
- Introducing the updated Gemini 1.5 Pro and the brand new, faster Gemini 1.5 Flash, both available globally today.
01:01 · Gemini 1.5 API Features — Josh Woodward
- Announcing new API features including a 2M token context window, video frame extraction, parallel function calling, and context caching.
02:00 · Gemini 1.5 Pricing — Josh Woodward
- Cutting Gemini 1.5 Pro pricing by 50% to $3.50 per 1M tokens and introducing Gemini 1.5 Flash at $0.35 per 1M tokens (both for prompts up to 128K tokens).
02:55 · Demo: AI Studio with Gemini 1.5 Flash — Josh Woodward
- Demonstrating how to quickly process a large document and generate a summary using Gemini 1.5 Flash in Google AI Studio.
05:05 · Gemma Open Models Update — Josh Woodward
- Announcing PaliGemma, Google's first vision-language open model, and the upcoming Gemma 2 with a new 27B parameter size.
07:02 · Developer Story: Navarasa in India — Josh Woodward
- Showcasing how developers in India are using Gemma’s tokenization to create instruction-tuned models for 15 Indic languages.

Products Announced (7)

00:36 · Gemini 1.5 Pro (Updated)
- Series of quality improvements · Natively multimodal · 1M token context window (2M on waitlist)
- Available today globally. Pricing starts at $3.50 per 1M tokens (up to 128K context).
00:43 · Gemini 1.5 Flash (New)
- Optimized for speed and low latency · Natively multimodal · 1M token context window
- Available today globally. Pricing starts at $0.35 per 1M tokens (up to 128K context).
01:26 · Gemini API: Video frame extraction (New Feature)
- Available today.
01:31 · Gemini API: Parallel function calling (New Feature)
- Return more than one function call at a time
- Available today.
01:37 · Gemini API: Context caching (New Feature)
- Send files to the model once to avoid resending · Reduces cost for long context tasks
- Ships next month.
05:46 · PaliGemma (New)
- First vision-language open model from Google · Optimized for image captioning and visual Q&A
- Available now.
06:09 · Gemma 2 (Coming Soon)
- New 27 billion parameter size · Optimized for TPUs and next-gen GPUs · Outperforms models more than twice its size
- Available in June.
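
Parallel function calling, listed above, means one model turn can carry more than one function call. A minimal sketch of how client code might handle such a turn follows. It assumes a hypothetical response shape (a list of dicts with `name` and `args`) rather than the actual Gemini SDK types, and the tools `get_weather` and `get_time` are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local tools the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def get_time(city: str) -> str:
    return f"12:00 in {city}"

TOOLS = {"get_weather": get_weather, "get_time": get_time}

def dispatch_calls(function_calls: list[dict]) -> list[dict]:
    """Run every function call from one model turn; results keep the input order."""
    def run(call: dict) -> dict:
        result = TOOLS[call["name"]](**call["args"])
        return {"name": call["name"], "response": result}

    # The calls are independent of each other, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run, function_calls))

# Simulated model turn containing two calls (assumed shape, not real API output).
simulated_turn = [
    {"name": "get_weather", "args": {"city": "Mumbai"}},
    {"name": "get_time", "args": {"city": "Mumbai"}},
]
results = dispatch_calls(simulated_turn)
```

Returning the results in the same order as the calls keeps it straightforward to pair each response with the request that produced it.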

Benchmarks Shown (1)

06:49 · Gemma 2 (27B) Performance: Outperforms models 2X bigger
- Compared to other models with >54B parameters.

Commitments / Timelines (5)

00:46 (Today) — Gemini 1.5 Pro and Gemini 1.5 Flash are available globally in over 200 countries and territories.
01:15 (Today) — Developers can sign up for the waitlist to try the 2 million token context window for Gemini 1.5 Pro.
01:50 (Next month) — Context caching feature for the Gemini API will ship.
05:51 (Right now) — PaliGemma, Google's first vision-language open model, is available.
06:14 (In June) — Gemma 2, the next generation of open models including a 27B parameter version, will be available.
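
Context caching, which the timeline above lists as shipping next month, is described in the talk as sending files to the model once instead of resending them. The sketch below only counts input tokens transmitted under each approach. It is not a billing model, since the source does not say how cached tokens are priced, and the 50-token question and 10 requests are assumed values.

```python
def tokens_sent(doc_tokens: int, query_tokens: int, requests: int, cached: bool) -> int:
    """Count input tokens transmitted across several requests over one document."""
    if cached:
        # The document is uploaded once; each request carries only its query.
        return doc_tokens + requests * query_tokens
    # Without caching, the full document accompanies every request.
    return requests * (doc_tokens + query_tokens)

# The 93,087-token file from the AI Studio demo, with an assumed
# 50-token question asked 10 times.
without_cache = tokens_sent(93_087, 50, 10, cached=False)
with_cache = tokens_sent(93_087, 50, 10, cached=True)
```

Under these assumptions the uncached run transmits 931,370 tokens and the cached run 93,587, which is why the feature is pitched at long-context tasks.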

Demos (1)

03:06 ✓ · AI Studio with Gemini 1.5 Flash — Josh Woodward
- The speaker showed the Google AI Studio web UI, loaded an HTML file of customer feedback (93,087 tokens according to the on-screen counter), and prompted Gemini 1.5 Flash to generate a briefing document summarizing the feedback. The model quickly streamed a structured response.
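
To put the announced prices in context, the sketch below estimates what the demo prompt would cost at the quoted starting rates. The assumptions: the rates apply to prompt tokens (the talk only says "per 1M tokens"), "128K" is read as 128,000 tokens, and the dictionary keys are labels chosen for this example.

```python
from decimal import Decimal

# Starting prices quoted in the talk, in USD per 1M tokens (prompts up to 128K).
PRICE_PER_MILLION = {
    "gemini-1.5-pro": Decimal("3.50"),
    "gemini-1.5-flash": Decimal("0.35"),
}
TIER_LIMIT = 128_000  # assumed reading of "128K"

def estimate_cost(model: str, tokens: int) -> Decimal:
    """Estimate the prompt cost in USD at the quoted starting rate."""
    if tokens > TIER_LIMIT:
        # The talk does not give prices for longer prompts.
        raise ValueError("quoted starting price only covers prompts up to 128K tokens")
    return PRICE_PER_MILLION[model] * tokens / Decimal(1_000_000)

demo_tokens = 93_087  # token count shown on screen in the AI Studio demo
flash_cost = estimate_cost("gemini-1.5-flash", demo_tokens)
pro_cost = estimate_cost("gemini-1.5-pro", demo_tokens)
```

Under these assumptions the demo prompt comes to roughly 3 cents on Gemini 1.5 Flash and roughly 33 cents on Gemini 1.5 Pro.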

Notable Quotes (5)

00:22 — Josh Woodward:

You all, as developers, can choose the one that works best for you.
01:37 — Josh Woodward:

And my favorite, context caching. So you can send all of your files to the model once and not have to resend them over and over again.
02:23 — Josh Woodward:

And 1.5 Flash will start at 35 cents per 1 million tokens.
06:48 — Josh Woodward:

This quality-to-size ratio is amazing because it’ll outperform models more than twice its size.
08:40 — Harsh Dhand:

We need a technology that will harness AI so that everyone can use it and no one is left behind.

Visual Signals (Beyond the Transcript)

On-Screen Text Moments (11)

00:05 · Introducing Josh Woodward
- Identifies the speaker by name and title.
00:29 · Gemini 1.5
- Establishes the main topic of the segment.
00:47 · 200+ countries and territories
- Highlights the global availability of the new models.
01:15 · 2M context window. Sign up for waitlist at ai.google.dev/gemini-api
- Announces the massive 2M token context window and provides a call to action.
01:24 · New API features: Video frame extraction, Parallel function calling, Context caching
- Lists the new developer-focused features being announced for the Gemini API.
02:16 · Gemini 1.5 Pro: $3.50 per 1M tokens up to 128K*
- Announces a 50% price reduction for Gemini 1.5 Pro on prompts up to 128K tokens.
02:23 · Gemini 1.5 Flash: $0.35 per 1M tokens up to 128K*
- Prices the new speed-optimized model at one tenth of the Gemini 1.5 Pro rate.
05:07 · Gemma
- Signals a shift in topic to Google’s family of open models.
05:46 · PaliGemma
- Announces the new vision-language open model.
06:09 · Gemma 2
- Announces the next generation of Gemma models.
06:27 · Gemma 2: 27B parameters
- Reveals the new, larger size for the Gemma 2 model, a key developer request.

Stage Moments (4)

00:00 · Video opens on a wide shot of the large, outdoor amphitheater packed with an audience for Google I/O.
00:05 · Speaker Josh Woodward walks onto the circular center stage as his name is displayed on the main screen.
00:51 · The audience applauds loudly after the announcement of Gemini 1.5’s availability in 200+ countries.
02:27 · The audience applauds again, reacting to the low price of the new Gemini 1.5 Flash model.

Visual Demos (3)

03:06 · Google AI Studio UI
- A screen recording shows the AI Studio interface. A file named ‘customer-forums.html’ is loaded, showing a token count of 93,087. The model ‘Gemini 1.5 Flash’ is selected. A prompt is entered, and the model streams a structured ‘Briefing Doc’ with bullet points summarizing themes and benefits.
05:59 · PaliGemma capabilities montage
- A fast-paced montage of images (DNA, a dog, flowers, satellite imagery) with icons suggesting image labeling, captioning, and visual Q&A tasks.
07:58 · Gemma Tokenizer visualization
- An animation shows a block of Hindi text being broken down into smaller token blocks, illustrating how the tokenizer processes non-Latin scripts.
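
As background on why tokenizer design matters for non-Latin scripts, the sketch below compares code points and UTF-8 bytes for a Hindi word and an English word. It does not use or reproduce the Gemma tokenizer. It only shows that each Devanagari code point takes three bytes in UTF-8, so a tokenizer that falls back to raw bytes would spend far more tokens on Hindi than on English.

```python
def script_stats(text: str) -> dict:
    """Return the number of Unicode code points and UTF-8 bytes in text."""
    return {"code_points": len(text), "utf8_bytes": len(text.encode("utf-8"))}

# "नमस्ते" (namaste), written with escapes so the code points are unambiguous.
hindi = script_stats("\u0928\u092e\u0938\u094d\u0924\u0947")
english = script_stats("hello")
```

Here the Hindi word has 6 code points but 18 UTF-8 bytes, while "hello" has 5 of each.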

Production Signals (2)

03:06 · Picture-in-picture demo format, with the live speaker on the left and a pre-recorded screen capture of the AI Studio demo on the right.
07:02 · The presentation transitions from the live stage to a fully pre-recorded, cinematic video segment about developers in India, featuring interviews and location footage.

Key Topics

Gemini 1.5 Pro · Gemini 1.5 Flash · AI Models · Developer Tools · API Pricing · Multimodality · Long Context Window · Open Models · Gemma · PaliGemma · Gemma 2 · Google AI Studio · Vertex AI · Function Calling · AI for Developers

Takeaways

Google is competing on price and speed: the new Gemini 1.5 Flash starts at $0.35 per 1M tokens, and Gemini 1.5 Pro drops 50% to $3.50 per 1M tokens (both for prompts up to 128K tokens).
The focus is squarely on developers, with immediate global availability, powerful new API features like context caching, and a simple on-ramp via Google AI Studio.
A 1M-token context window is standard on both models, and a 2M-token window for Gemini 1.5 Pro is available through a waitlist, positioning long-context processing as a key differentiator for the Gemini family.
Google is doubling down on its commitment to open models by expanding the Gemma family with PaliGemma (vision-language) and the upcoming, more powerful Gemma 2 (27B).
There is a strong emphasis on making AI accessible and useful globally, highlighted by the Gemma tokenizer’s efficiency with diverse languages, enabling projects like Navarasa for Indic languages.