I/O 2024: Developer
Year: 2024
Josh Woodward (Vice President, Google Labs)
Segments (6)
- 00:05 · Introduction: Gemini 1.5 Pro and Flash — Josh Woodward
- Introducing the updated Gemini 1.5 Pro and the brand new, faster Gemini 1.5 Flash, both available globally today.
- 01:01 · Gemini 1.5 API Features — Josh Woodward
- Announcing new API features including a 2M token context window, video frame extraction, parallel function calling, and context caching.
- 02:00 · Gemini 1.5 Pricing — Josh Woodward
- Revealing significantly lower pricing for Gemini 1.5 Pro and introducing the highly affordable Gemini 1.5 Flash.
- 02:55 · Demo: AI Studio with Gemini 1.5 Flash — Josh Woodward
- Demonstrating how to quickly process a large document and generate a summary using Gemini 1.5 Flash in Google AI Studio.
- 05:05 · Gemma Open Models Update — Josh Woodward
- Announcing PaliGemma, Google's first vision-language open model, and the upcoming Gemma 2 with a new 27B parameter size.
- 07:02 · Developer Story: Navarasa in India — Josh Woodward
- Showcasing how developers in India are using Gemma’s tokenization to create instruction-tuned models for 15 Indic languages.
Products Announced (7)
- 00:36 · Gemini 1.5 Pro (Updated) - Series of quality improvements · Natively multimodal · 1M token context window (2M on waitlist)
- Available today globally. Pricing starts at $3.50 per 1M tokens (up to 128K context).
- 00:43 · Gemini 1.5 Flash (New) - Optimized for speed and low latency · Natively multimodal · 1M token context window
- Available today globally. Pricing starts at $0.35 per 1M tokens (up to 128K context).
- 01:26 · Gemini API: Video frame extraction (New Feature)
- Available today.
- 01:31 · Gemini API: Parallel function calling (New Feature) - Return more than one function call at a time
- Available today.
- 01:37 · Gemini API: Context caching (New Feature) - Send files to the model once to avoid resending · Reduces cost for long context tasks
- Ships next month.
- 05:46 · PaliGemma (New) - First vision-language open model from Google · Optimized for image captioning and visual Q&A
- Available now.
- 06:09 · Gemma 2 (Coming Soon) - New 27 billion parameter size · Optimized for TPUs and next-gen GPUs · Outperforms models more than twice its size
- Available in June.
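Parallel function calling, as described above, means one model turn can carry more than one function call. A minimal sketch of how a client might dispatch such a turn follows. The tool functions, the `TOOLS` registry, and the list-of-dicts shape of `model_turn` are all hypothetical stand-ins for illustration; the actual Gemini API response format is not shown in the source and may differ.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local tools the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def get_time(city: str) -> str:
    return f"12:00 in {city}"

TOOLS = {"get_weather": get_weather, "get_time": get_time}

# Assumed shape of a single model turn that requests two calls at once.
# This is an illustration, not the real API payload.
model_turn = [
    {"name": "get_weather", "args": {"city": "Paris"}},
    {"name": "get_time", "args": {"city": "Tokyo"}},
]

def run_calls(calls):
    """Run every requested call concurrently; results keep request order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[c["name"]], **c["args"]) for c in calls]
        return [
            {"name": c["name"], "result": f.result()}
            for c, f in zip(calls, futures)
        ]

results = run_calls(model_turn)
for r in results:
    print(r["name"], "->", r["result"])
```

Because both calls arrive in one turn, the client can run them side by side and send both results back together, rather than paying one model round trip per call.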
Benchmarks Shown (1)
- 06:49 · Gemma 2 (27B) Performance: Outperforms models 2X bigger - Compared to other models with >54B parameters.
Commitments / Timelines (5)
- 00:46 (Today) — Gemini 1.5 Pro and Gemini 1.5 Flash are available globally in over 200 countries and territories.
- 01:15 (Today) — Developers can sign up for the waitlist to try the 2 million token context window for Gemini 1.5 Pro.
- 01:50 (Next month) — Context caching feature for the Gemini API will ship.
- 05:51 (Right now) — PaliGemma, Google's first vision-language open model, is available.
- 06:14 (In June) — Gemma 2, the next generation of open models including a 27B parameter version, will be available.
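The context caching commitment above rests on a simple idea: send a large file once, then refer to it by handle on later requests. The sketch below illustrates that idea only. `FakeModelClient`, `CachedContext`, and the word-equals-token counting are hypothetical and do not represent the Gemini API, whose caching interface had not shipped at the time of the talk.

```python
import hashlib

class FakeModelClient:
    """Hypothetical stand-in for a model API; counts tokens uploaded."""

    def __init__(self):
        self.uploaded_tokens = 0
        self._store = {}

    def upload(self, text: str) -> str:
        # Simplification: treat one whitespace-separated word as one token.
        self.uploaded_tokens += len(text.split())
        handle = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._store[handle] = text
        return handle

    def ask(self, handle: str, question: str) -> str:
        words = len(self._store[handle].split())
        return f"answer to {question!r} over {words} cached tokens"

class CachedContext:
    """Upload each distinct document once and reuse its handle afterwards."""

    def __init__(self, client):
        self.client = client
        self._handles = {}

    def ask(self, document: str, question: str) -> str:
        key = hashlib.sha256(document.encode("utf-8")).hexdigest()
        if key not in self._handles:
            self._handles[key] = self.client.upload(document)
        return self.client.ask(self._handles[key], question)

client = FakeModelClient()
ctx = CachedContext(client)
doc = "customer feedback " * 500  # 1,000 pretend tokens

for q in ["Top complaints?", "Top praise?", "Feature requests?"]:
    ctx.ask(doc, q)

print(client.uploaded_tokens)  # 1000, not 3000: the document was sent once
```

Three questions against the same document upload it once, which is the cost saving the talk attributes to context caching for long-context tasks.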
Demos (1)
- 03:06 ✓ · AI Studio with Gemini 1.5 Flash — Josh Woodward
- The speaker showed the Google AI Studio web UI, loaded a 93,000-token HTML file of customer feedback, and used a prompt to ask Gemini 1.5 Flash to generate a briefing document summarizing the feedback. The model successfully and quickly streamed a structured response.
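For a sense of scale, the announced prices can be applied to the demo file. The sketch below uses the on-screen token count from the demo (93,087) and the quoted starting prices. It assumes those prices apply to prompt tokens and that "128K" means 128,000 tokens; the dictionary keys are labels chosen for this example, not confirmed API model identifiers, and output-token pricing is not covered by the source.

```python
# Starting prices quoted in the talk, in USD per 1M tokens (prompts up to 128K).
PRICE_PER_MILLION_USD = {"gemini-1.5-pro": 3.50, "gemini-1.5-flash": 0.35}
TIER_LIMIT_TOKENS = 128_000  # assumed reading of "128K"

def prompt_cost_usd(model: str, tokens: int) -> float:
    """Estimate prompt cost at the quoted starting rate."""
    if tokens > TIER_LIMIT_TOKENS:
        raise ValueError("prompt exceeds the 128K tier; quoted rate does not apply")
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]

demo_tokens = 93_087  # token count shown in AI Studio for the demo file
flash_cost = prompt_cost_usd("gemini-1.5-flash", demo_tokens)
pro_cost = prompt_cost_usd("gemini-1.5-pro", demo_tokens)

print(f"Flash: ${flash_cost:.4f}  Pro: ${pro_cost:.4f}")
# prints "Flash: $0.0326  Pro: $0.3258"
```

Under these assumptions, reading the whole feedback file costs about three cents on Flash and about thirty-three cents on Pro, a tenfold difference that follows directly from the two quoted rates.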
Notable Quotes (5)
- 00:22 — Josh Woodward:
You all, as developers, can choose the one that works best for you.
- 01:37 — Josh Woodward:
And my favorite, context caching. So you can send all of your files to the model once and not have to resend them over and over again.
- 02:23 — Josh Woodward:
And 1.5 Flash will start at 35 cents per 1 million tokens.
- 06:48 — Josh Woodward:
This quality-to-size ratio is amazing because it’ll outperform models more than twice its size.
- 08:40 — Harsh Dhand:
We need a technology that will harness AI so that everyone can use it and no one is left behind.
Visual Signals (Beyond the Transcript)
On-Screen Text Moments (11)
- 00:05 · Introducing Josh Woodward - Identifies the speaker by name and title.
- 00:29 · Gemini 1.5 - Establishes the main topic of the segment.
- 00:47 · 200+ countries and territories - Highlights the global availability of the new models.
- 01:15 · 2M context window. Sign up for waitlist at ai.google.dev/gemini-api - Announces the 2M token context window and provides a call to action.
- 01:24 · New API features: Video frame extraction, Parallel function calling, Context caching - Lists the new developer-focused features being announced for the Gemini API.
- 02:16 · Gemini 1.5 Pro: $3.50 per 1M tokens up to 128K* - Announces a 50% price reduction for the flagship model on common context sizes.
- 02:23 · Gemini 1.5 Flash: $0.35 per 1M tokens up to 128K* - Reveals the low price point for the new speed-optimized model.
- 05:07 · Gemma - Signals a shift in topic to Google’s family of open models.
- 05:46 · PaliGemma - Announces the new vision-language open model.
- 06:09 · Gemma 2 - Announces the next generation of Gemma models.
- 06:27 · Gemma 2: 27B parameters - Reveals the new, larger size for the Gemma 2 model, a key developer request.
Stage Moments (4)
- 00:00 · Video opens on a wide shot of the large, outdoor amphitheater packed with an audience for Google I/O.
- 00:05 · Speaker Josh Woodward walks onto the circular center stage as his name is displayed on the main screen.
- 00:51 · The audience applauds loudly after the announcement of Gemini 1.5’s availability in 200+ countries.
- 02:27 · The audience applauds again, reacting to the low price of the new Gemini 1.5 Flash model.
Visual Demos (3)
- 03:06 · Google AI Studio UI
- A screen recording shows the AI Studio interface. A file named ‘customer-forums.html’ is loaded, showing a token count of 93,087. The model ‘Gemini 1.5 Flash’ is selected. A prompt is entered, and the model streams a structured ‘Briefing Doc’ with bullet points summarizing themes and benefits.
- 05:59 · PaliGemma capabilities montage
- A fast-paced montage of images (DNA, a dog, flowers, satellite imagery) with icons suggesting image labeling, captioning, and visual Q&A tasks.
- 07:58 · Gemma Tokenizer visualization
- An animation shows a block of Hindi text being broken down into smaller token blocks, illustrating how the tokenizer processes non-Latin scripts.
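The source does not give the token boundaries shown in the animation, so the sketch below does not reproduce the Gemma tokenizer. Using only the Python standard library, it shows why tokenizer coverage of non-Latin scripts matters: each Devanagari code point takes three bytes in UTF-8, so a tokenizer that fell back to raw bytes would spend far more units on Hindi text than one whose vocabulary covers the script. The sample word is an illustrative choice, not taken from the video.

```python
import unicodedata

# "namaste" in Devanagari, written with escapes so the code points are explicit.
text = "\u0928\u092e\u0938\u094d\u0924\u0947"

code_points = len(text)
utf8_bytes = len(text.encode("utf-8"))

# List each code point with its Unicode name.
for ch in text:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print(f"{code_points} code points, {utf8_bytes} UTF-8 bytes")
# prints "6 code points, 18 UTF-8 bytes"
```

A six-character word costing 18 units under byte fallback, versus a handful under a script-aware vocabulary, is the kind of gap that makes tokenizer efficiency relevant to projects targeting Indic languages.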
Production Signals (2)
- 03:06 · Picture-in-picture demo format, with the live speaker on the left and a pre-recorded screen capture of the AI Studio demo on the right.
- 07:02 · The presentation transitions from the live stage to a fully pre-recorded, cinematic video segment about developers in India, featuring interviews and location footage.
Key Topics
Gemini 1.5 Pro · Gemini 1.5 Flash · AI Models · Developer Tools · API Pricing · Multimodality · Long Context Window · Open Models · Gemma · PaliGemma · Gemma 2 · Google AI Studio · Vertex AI · Function Calling · AI for Developers
Takeaways
- Google is aggressively competing on price and performance with the introduction of Gemini 1.5 Flash, a very fast and inexpensive model, and a 50% price cut for Gemini 1.5 Pro.
- The focus is squarely on developers, with immediate global availability, powerful new API features like context caching, and a simple on-ramp via Google AI Studio.
- The 1M token context window is standard, with a 2M window for Gemini 1.5 Pro open to developers via waitlist, positioning long-context processing as a key differentiator for the Gemini family.
- Google is doubling down on its commitment to open models by expanding the Gemma family with PaliGemma (vision-language) and the upcoming, more powerful Gemma 2 (27B).
- There is a strong emphasis on making AI accessible and useful globally, highlighted by the Gemma tokenizer’s efficiency with diverse languages, enabling projects like Navarasa for Indic languages.