I/O 2024: Developer

Year: 2024

Josh Woodward (Vice President, Google Labs)

Segments (6)

  • 00:05 · Introduction: Gemini 1.5 Pro and Flash — Josh Woodward
    • Introducing the updated Gemini 1.5 Pro and the brand new, faster Gemini 1.5 Flash, both available globally today.
  • 01:01 · Gemini 1.5 API Features — Josh Woodward
    • Announcing new API features including a 2M token context window, video frame extraction, parallel function calling, and context caching.
  • 02:00 · Gemini 1.5 Pricing — Josh Woodward
    • Revealing significantly lower pricing for Gemini 1.5 Pro and introducing the highly affordable Gemini 1.5 Flash.
  • 02:55 · Demo: AI Studio with Gemini 1.5 Flash — Josh Woodward
    • Demonstrating how to quickly process a large document and generate a summary using Gemini 1.5 Flash in Google AI Studio.
  • 05:05 · Gemma Open Models Update — Josh Woodward
    • Announcing PaliGemma, Google’s first vision-language open model, and the upcoming Gemma 2 with a new 27B parameter size.
  • 07:02 · Developer Story: Navarasa in India — Josh Woodward
    • Showcasing how developers in India are using Gemma’s tokenization to create instruction-tuned models for 15 Indic languages.

Products Announced (7)

  • 00:36 · Gemini 1.5 Pro (Updated)
    • Series of quality improvements · Natively multimodal · 1M token context window (2M on waitlist)
    • Available today globally. Pricing starts at $3.50 per 1M tokens (up to 128K context).
  • 00:43 · Gemini 1.5 Flash (New)
    • Optimized for speed and low latency · Natively multimodal · 1M token context window
    • Available today globally. Pricing starts at $0.35 per 1M tokens (up to 128K context).
  • 01:26 · Gemini API: Video frame extraction (New Feature)
    • Available today.
  • 01:31 · Gemini API: Parallel function calling (New Feature)
    • Return more than one function call at a time
    • Available today.
  • 01:37 · Gemini API: Context caching (New Feature)
    • Send files to the model once to avoid resending · Reduces cost for long context tasks
    • Ships next month.
  • 05:46 · PaliGemma (New)
    • First vision-language open model from Google · Optimized for image captioning and visual Q&A
    • Available now.
  • 06:09 · Gemma 2 (Coming Soon)
    • New 27 billion parameter size · Optimized for TPUs and next-gen GPUs · Outperforms models more than twice its size
    • Available in June.
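
The parallel function calling item above says a single model response can return more than one function call. A minimal sketch of how client code might handle that, assuming a hypothetical response shape (a list of dicts with `name` and `args`) rather than the actual Gemini API types. The tools `get_weather` and `get_time` are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local tools the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def get_time(city: str) -> str:
    return f"12:00 in {city}"

TOOLS = {"get_weather": get_weather, "get_time": get_time}

def run_function_calls(calls):
    """Run every function call from one model turn; results keep the input order."""
    def run_one(call):
        fn = TOOLS[call["name"]]
        return {"name": call["name"], "result": fn(**call["args"])}

    # pool.map preserves the order of `calls` even though work runs concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_one, calls))

# Assumed shape of a single response that carries two calls at once.
model_calls = [
    {"name": "get_weather", "args": {"city": "Mumbai"}},
    {"name": "get_time", "args": {"city": "Mumbai"}},
]
results = run_function_calls(model_calls)
```

Because both calls arrive in the same turn, the client can execute them together and send all results back in one follow-up message instead of making one round trip per call.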

Benchmarks Shown (1)

  • 06:49 · Gemma 2 (27B) Performance: Outperforms models 2X bigger
    • Compared against other models with more than 54B parameters, i.e. over twice Gemma 2’s 27B.

Commitments / Timelines (5)

  • 00:46 (Today) — Gemini 1.5 Pro and Gemini 1.5 Flash are available globally in over 200 countries and territories.
  • 01:15 (Today) — Developers can sign up for the waitlist to try the 2 million token context window for Gemini 1.5 Pro.
  • 01:50 (Next month) — Context caching feature for the Gemini API will ship.
  • 05:51 (Right now) — PaliGemma, Google’s first vision-language open model, is available.
  • 06:14 (In June) — Gemma 2, the next generation of open models including a 27B parameter version, will be available.
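
The context caching commitment above refers to the feature the talk describes as sending files to the model once instead of resending them with every request. A rough sketch of the input-token arithmetic, assuming a hypothetical workload (the file size, question size, and request count below are made-up illustration values) and ignoring any caching fees, which the talk does not state:

```python
def input_tokens_without_cache(file_tokens: int, question_tokens: int, requests: int) -> int:
    # Every request resends the full file plus the question.
    return requests * (file_tokens + question_tokens)

def input_tokens_with_cache(file_tokens: int, question_tokens: int, requests: int) -> int:
    # The file is sent once; each request then sends only the question.
    return file_tokens + requests * question_tokens

# Hypothetical workload, not taken from the talk.
FILE_TOKENS = 500_000
QUESTION_TOKENS = 200
REQUESTS = 20

resent = input_tokens_without_cache(FILE_TOKENS, QUESTION_TOKENS, REQUESTS)
cached = input_tokens_with_cache(FILE_TOKENS, QUESTION_TOKENS, REQUESTS)
print(f"without caching: {resent:,} input tokens")
print(f"with caching:    {cached:,} input tokens")
```

Under these assumptions the saving grows with the number of requests made against the same files, which is why the feature is pitched at long-context tasks.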

Demos (1)

  • 03:06 ✓ · AI Studio with Gemini 1.5 Flash — Josh Woodward
    • The speaker showed the Google AI Studio web UI, loaded a 93,000-token HTML file of customer feedback, and used a prompt to ask Gemini 1.5 Flash to generate a briefing document summarizing the feedback. The model successfully and quickly streamed a structured response.
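
For a sense of scale, the prompt cost of a file the size of the one in this demo can be estimated from the prices quoted in the talk ($3.50 and $0.35 per 1M tokens, up to 128K context). This is only a sketch: it assumes the quoted rate applies to prompt tokens, ignores output tokens, and ignores whatever the asterisk on the pricing slides refers to.

```python
from decimal import Decimal

# Prices quoted in the talk, in USD per 1M tokens, for prompts up to 128K tokens.
PRICE_PER_MILLION = {
    "gemini-1.5-pro": Decimal("3.50"),
    "gemini-1.5-flash": Decimal("0.35"),
}
TIER_LIMIT_TOKENS = 128_000

def prompt_cost_usd(model: str, prompt_tokens: int) -> Decimal:
    """Estimate the prompt cost at the quoted sub-128K rate."""
    if prompt_tokens > TIER_LIMIT_TOKENS:
        # The talk only quotes prices for the tier up to 128K tokens.
        raise ValueError("prompt exceeds the 128K tier quoted in the talk")
    return PRICE_PER_MILLION[model] * prompt_tokens / Decimal(1_000_000)

DEMO_PROMPT_TOKENS = 93_000  # approximate size of the demo's HTML file

pro_cost = prompt_cost_usd("gemini-1.5-pro", DEMO_PROMPT_TOKENS)
flash_cost = prompt_cost_usd("gemini-1.5-flash", DEMO_PROMPT_TOKENS)
print(f"Pro:   ${pro_cost}")
print(f"Flash: ${flash_cost}")
```

At these rates the same prompt costs about 33 cents on 1.5 Pro and about 3 cents on 1.5 Flash, a tenfold difference.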

Notable Quotes (5)

  • 00:22 — Josh Woodward:

    You all, as developers, can choose the one that works best for you.

  • 01:37 — Josh Woodward:

    And my favorite, context caching. So you can send all of your files to the model once and not have to resend them over and over again.

  • 02:23 — Josh Woodward:

    And 1.5 Flash will start at 35 cents per 1 million tokens.

  • 06:48 — Josh Woodward:

    This quality-to-size ratio is amazing because it’ll outperform models more than twice its size.

  • 08:40 — Harsh Dhand:

    We need a technology that will harness AI so that everyone can use it and no one is left behind.

Visual Signals (Beyond the Transcript)

On-Screen Text Moments (11)

  • 00:05 · Introducing Josh Woodward
    • Identifies the speaker by name and title.
  • 00:29 · Gemini 1.5
    • Establishes the main topic of the segment.
  • 00:47 · 200+ countries and territories
    • Highlights the global availability of the new models.
  • 01:15 · 2M context window. Sign up for waitlist at ai.google.dev/gemini-api
    • Announces the massive 2M token context window and provides a call to action.
  • 01:24 · New API features: Video frame extraction, Parallel function calling, Context caching
    • Lists the new developer-focused features being announced for the Gemini API.
  • 02:16 · Gemini 1.5 Pro: $3.50 per 1M tokens up to 128K*
    • Announces a 50% price reduction for the flagship model on common context sizes.
  • 02:23 · Gemini 1.5 Flash: $0.35 per 1M tokens up to 128K*
    • Reveals the extremely low price point for the new speed-optimized model.
  • 05:07 · Gemma
    • Signals a shift in topic to Google’s family of open models.
  • 05:46 · PaliGemma
    • Announces the new vision-language open model.
  • 06:09 · Gemma 2
    • Announces the next generation of Gemma models.
  • 06:27 · Gemma 2: 27B parameters
    • Reveals the new, larger size for the Gemma 2 model, a key developer request.

Stage Moments (4)

  • 00:00 · Video opens on a wide shot of the large, outdoor amphitheater packed with an audience for Google I/O.
  • 00:05 · Speaker Josh Woodward walks onto the circular center stage as his name is displayed on the main screen.
  • 00:51 · The audience applauds loudly after the announcement of Gemini 1.5’s availability in 200+ countries.
  • 02:27 · The audience applauds again, reacting to the low price of the new Gemini 1.5 Flash model.

Visual Demos (3)

  • 03:06 · Google AI Studio UI
    • A screen recording shows the AI Studio interface. A file named ‘customer-forums.html’ is loaded, showing a token count of 93,087. The model ‘Gemini 1.5 Flash’ is selected. A prompt is entered, and the model streams a structured ‘Briefing Doc’ with bullet points summarizing themes and benefits.
  • 05:59 · PaliGemma capabilities montage
    • A fast-paced montage of images (DNA, a dog, flowers, satellite imagery) with icons suggesting image labeling, captioning, and visual Q&A tasks.
  • 07:58 · Gemma Tokenizer visualization
    • An animation shows a block of Hindi text being broken down into smaller token blocks, illustrating how the tokenizer processes non-Latin scripts.

Production Signals (2)

  • 03:06 · Picture-in-picture demo format, with the live speaker on the left and a pre-recorded screen capture of the AI Studio demo on the right.
  • 07:02 · The presentation transitions from the live stage to a fully pre-recorded, cinematic video segment about developers in India, featuring interviews and location footage.

Key Topics

Gemini 1.5 Pro · Gemini 1.5 Flash · AI Models · Developer Tools · API Pricing · Multimodality · Long Context Window · Open Models · Gemma · PaliGemma · Gemma 2 · Google AI Studio · Vertex AI · Function Calling · AI for Developers

Takeaways

  • Google is aggressively competing on price and performance with the introduction of Gemini 1.5 Flash, a very fast and inexpensive model, and a 50% price cut for Gemini 1.5 Pro.
  • The focus is squarely on developers, with immediate global availability, powerful new API features like context caching, and a simple on-ramp via Google AI Studio.
  • The 1M token context window is standard, and a 2M window is already open to developers through a waitlist, positioning long-context processing as a key differentiator for the Gemini family.
  • Google is doubling down on its commitment to open models by expanding the Gemma family with PaliGemma (vision-language) and the upcoming, more powerful Gemma 2 (27B).
  • There is a strong emphasis on making AI accessible and useful globally, highlighted by the Gemma tokenizer’s efficiency with diverse languages, enabling projects like Navarasa for Indic languages.