I/O 2024: Gemini

Year: 2024

Sissie Hsiao (VP/GM, Gemini Experiences and Google Assistant)

Segments (7)

  • 00:00:12 · Introduction: The Vision for the Gemini App — Sissie Hsiao
    • Introducing the vision for the Gemini app to be the most helpful personal AI assistant by providing direct access to Google’s latest AI models.
  • 00:00:48 · Introducing Gemini Live — Sissie Hsiao
    • Announcing Gemini Live, a new conversational experience using voice that allows for natural, fluid conversation and interruptions.
  • 00:01:50 · Introducing Gems — Sissie Hsiao
    • Unveiling Gems, a feature that allows users to create customized, personal expert versions of Gemini for specific, recurring tasks.
  • 00:03:05 · AI as an Agent: Trip Planning in Gemini Advanced — Sissie Hsiao
    • Demonstrating how Gemini Advanced can act as an agent to plan a complex, personalized trip by reasoning over multiple variables and constraints.
  • 00:05:38 · Gemini 1.5 Pro and 1M Token Context Window — Sissie Hsiao
    • Announcing that Gemini 1.5 Pro with a 1 million token context window is now available to Gemini Advanced subscribers.
  • 00:06:33 · Long Context Use Cases: Thesis and Data Analysis — Sissie Hsiao
    • Showcasing practical applications of the large context window, including analyzing a full thesis and performing data analysis on multiple spreadsheets.
  • 00:08:41 · The Prompt: A Gemini Musical — Sissie Hsiao
    • A pre-recorded musical ad illustrating the ease and versatility of using Gemini for a wide range of everyday prompts.

Products Announced (7)

  • 00:01:15 · Gemini Live (New Experience)
    • In-depth voice conversation · Ability to interrupt the AI · Adapts to speech patterns
    • Coming this summer
  • 00:01:35 · Video Understanding in Gemini App (Upcoming Feature)
    • Based on Project Astra · Use camera to have Gemini see and respond to surroundings · Real-time visual conversation
    • Later this year
  • 00:02:03 · Gems (New Feature)
    • Create personalized AI experts · Customize Gemini for specific needs · Save instructions for recurring tasks
    • Rolling out in the coming months
  • 00:03:35 · Trip Planning in Gemini Advanced (New Experience)
    • Acts as an agent to plan complex tasks · Integrates with Google apps like Gmail and Maps · Creates dynamic, customizable itineraries
    • Coming this summer
  • 00:05:45 · Gemini 1.5 Pro in Gemini Advanced (Now Available)
    • 1 million token context window · Process large documents, codebases, and videos · Longest context window of any consumer chatbot
    • Available today for Gemini Advanced subscribers
  • 00:07:09 · Data Analysis in Gemini Advanced (Upcoming Feature)
    • Upload and analyze spreadsheets (e.g., from Google Sheets) · Generate custom Python code for analysis · Create visualizations and charts from data
    • Launching in the coming weeks
  • 00:07:56 · 2M Token Context Window (Upcoming Upgrade)
    • Doubling the context window for Gemini Advanced · Process even larger amounts of information · Further extends multimodal reasoning capabilities
    • Later this year

Benchmarks Shown (1)

  • 00:05:51 · Context Window Size: 1M
    • Compared to Gemini app (32K), GPT-4 (128K), and Claude 3 (200K).
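
The size gap in this comparison is easier to judge as ratios. The sketch below is a rough illustration only: it assumes "K" means 1,000 tokens and "M" means 1,000,000 tokens, and the dictionary keys and the ratio_to helper are names chosen for this example.

```python
# Context window sizes in tokens, taken from the comparison above.
# Assumption: "K" = 1,000 and "M" = 1,000,000 (decimal units).
CONTEXT_WINDOWS = {
    "Gemini app": 32_000,
    "GPT-4": 128_000,
    "Claude 3": 200_000,
    "Gemini Advanced (1.5 Pro)": 1_000_000,
}


def ratio_to(baseline: str, target: str = "Gemini Advanced (1.5 Pro)") -> float:
    """Return how many times larger the target window is than the baseline."""
    return CONTEXT_WINDOWS[target] / CONTEXT_WINDOWS[baseline]


for name in ("Gemini app", "GPT-4", "Claude 3"):
    print(f"1M window vs {name}: {ratio_to(name):.2f}x larger")
```

Under these assumptions the 1M window is roughly 31x, 8x, and 5x the size of the three windows it is compared against.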

Commitments / Timelines (8)

  • 00:01:08 (This summer) — Have an in-depth conversation with Gemini using your voice (Gemini Live)
  • 00:01:35 (Later this year) — Bring speed gains and video understanding from Project Astra to the Gemini app
  • 00:02:47 (In the coming months) — Gems will roll out
  • 00:05:24 (This summer) — New trip planning experience will be rolling out to Gemini Advanced
  • 00:05:43 (Starting today) — Gemini Advanced subscribers get access to Gemini 1.5 Pro
  • 00:07:11 (In the coming weeks) — New data analysis feature launching
  • 00:07:56 (Later this year) — Doubling the long context window to 2 million tokens
  • 00:08:28 (Available today) — Expanding Gemini Advanced to over 35 supported languages

Demos (4)

  • 00:02:16 ✓ · Creating a Gem — Sissie Hsiao
    • A user creates a ‘Cliffhanger Curator’ Gem by providing instructions for it to act as a storyteller specializing in mysterious twists, using drafts from Google Drive.
  • 00:03:38 ✓ · Gemini Advanced Trip Planning — Sissie Hsiao
    • Gemini plans a Labor Day weekend trip to Miami based on family preferences and flight/hotel info from Gmail, creating a dynamic, editable itinerary.
  • 00:06:37 ✓ · Thesis Analysis with Long Context — Sissie Hsiao
    • A user uploads their entire thesis, research, and sources, and Gemini acts as a professor on the thesis committee, providing challenging questions to help the user prepare.
  • 00:07:14 ✓ · Data Analysis of Spreadsheets — Sissie Hsiao
    • A user uploads multiple sales spreadsheets from a side hustle, and Gemini analyzes the data, writes Python code, and generates a chart visualizing profit over time by product.

Notable Quotes (6)

  • 00:00:12 — Sissie Hsiao:

    Our vision for the Gemini app is to be the most helpful personal AI assistant.

  • 00:01:13 — Sissie Hsiao:

    We’re calling this new experience Live.

  • 00:02:02 — Sissie Hsiao:

    We’re calling these Gems.

  • 00:03:06 — Sissie Hsiao:

    Next, I’ll show you how Gemini is taking a step closer to being a true AI assistant by planning and taking actions for you.

  • 00:05:52 — Sissie Hsiao:

    That is the longest context window of any chatbot in the world.

  • 00:07:54 — Sissie Hsiao:

    Oh, and just one more thing. Later this year, we’ll be doubling the long context window to 2 million tokens.

Visual Signals (Beyond the Transcript)

On-Screen Text Moments (9)

  • 00:00:05 · Introducing Sissie Hsiao
    • Introduces the speaker by name and title as she walks on stage.
  • 00:01:16 · Gemini Live
    • Brands the new conversational voice feature.
  • 00:02:04 · Gems
    • Brands the new personalization feature.
  • 00:05:25 · Coming this summer
    • Provides the release timeline for the trip planning feature.
  • 00:05:45 · Gemini 1.5 Pro
    • Announces the specific model being integrated into Gemini Advanced.
  • 00:05:51 · A bar chart comparing token context windows: Gemini app (32K), GPT-4 (128K), Claude 3 (200K), Gemini
    • Visually demonstrates Google’s claimed leadership in context window size.
  • 00:05:55 · Longest context window in the world
    • A direct and bold competitive claim.
  • 00:07:59 · 2M Gemini Advanced
    • Announces a future doubling of the context window, reinforcing their focus on this capability.
  • 00:08:31 · Gemini Advanced 35+ languages
    • Highlights the global expansion and availability of the premium product.

Stage Moments (4)

  • 00:00:03 · Sissie Hsiao walks onto the stage to applause from a large, live audience.
  • 00:02:07 · The audience applauds enthusiastically at the announcement of the ‘Gems’ feature.
  • 00:05:31 · The audience applauds after the trip planning demo and announcement.
  • 00:08:49 · The presentation transitions to a pre-recorded, high-production-value musical advertisement for Gemini.

Visual Demos (7)

  • 00:00:23 · Conceptual UI of Gemini
    • A clean interface shows prompts for explaining atoms, generating an image of a cat playing a guitar, and a mobile UI with suggestions.
  • 00:00:43 · Gemini Advanced UI
    • A dark-themed UI for Gemini Advanced with prompts for generating images, writing code, and creating color palettes.
  • 00:01:35 · Live video analysis concept
    • A conceptual UI shows a phone’s camera feed of radishes being analyzed in real time by Gemini Live.
  • 00:03:53 · Trip planning reasoning graph
    • A dynamic mind map visualizes how Gemini connects different variables (Miami, art, seafood, Gmail) and pulls information from various sources to build a plan.
  • 00:04:19 · Generated trip itinerary
    • A detailed, day-by-day itinerary is shown in a dynamic UI, complete with flight times, hotel info, restaurant suggestions, and activities.
  • 00:07:29 · Generated data analysis chart
    • A line chart titled ‘Profit Over Time by Product’ is generated, showing different colored lines for various products like bracelets, earrings, and phone cases.
  • 00:07:45 · Generated Python code
    • The underlying Python code using the pandas library, which Gemini wrote to perform the data analysis, is displayed.
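
The code shown on screen is not reproduced in this summary. As a rough sketch of the kind of aggregation behind a "Profit Over Time by Product" chart, the example below totals profit per product per month. The demo used pandas; this version uses only the Python standard library so it runs without extra dependencies. The sample rows, the column layout, and the function name are hypothetical, and only the product names come from the chart description above.

```python
from collections import defaultdict

# Hypothetical sample rows: (month, product, revenue, cost).
# These numbers are made up for illustration; they are not from the demo.
SALES = [
    ("2024-01", "bracelets", 120.0, 45.0),
    ("2024-01", "earrings", 80.0, 30.0),
    ("2024-02", "bracelets", 150.0, 50.0),
    ("2024-02", "phone cases", 200.0, 110.0),
    ("2024-02", "earrings", 60.0, 25.0),
    ("2024-01", "bracelets", 30.0, 10.0),
]


def profit_by_product_over_time(rows):
    """Return {product: {month: total_profit}} with products and months sorted."""
    totals = defaultdict(lambda: defaultdict(float))
    for month, product, revenue, cost in rows:
        # Profit for a row is revenue minus cost; sum it per product and month.
        totals[product][month] += revenue - cost
    return {p: dict(sorted(m.items())) for p, m in sorted(totals.items())}


series = profit_by_product_over_time(SALES)
for product, by_month in series.items():
    print(product, by_month)
```

Each product's month-to-profit mapping corresponds to one line on such a chart. A plotting library would draw the lines, but that step is left out here.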

Production Signals (3)

  • 00:00:00 · Live on-stage presentation with a large audience.
  • 00:00:22 · Cut to pre-recorded, animated UI demonstrations on the main screen while the speaker narrates.
  • 00:08:49 · Transition to a fully pre-recorded musical ad, indicating the end of the live segment.

Key Topics

AI Assistants · Gemini · Gemini Advanced · Long Context Window · Multimodality · Personalization · Agentic AI · Conversational AI · Data Analysis · Code Generation · Trip Planning · Generative AI · Google I/O

Takeaways

  • Google is aggressively pushing Gemini as a deeply integrated, multimodal, and highly personalized AI assistant, moving beyond simple chat.
  • The 1M (and, later this year, 2M) token context window in Gemini Advanced is positioned as a key differentiator, enabling document-heavy tasks, such as analyzing a full thesis with its sources, that Google presents as out of reach for chatbots with smaller windows.
  • Gemini is becoming more ‘agentic,’ capable of not just providing information but actively planning and executing multi-step tasks by integrating with other Google services.
  • Personalization is a major theme, with ‘Gems’ allowing users to tailor Gemini’s behavior for their specific, recurring needs, making it a more efficient tool.
  • Google is rapidly rolling out its most advanced models (like Gemini 1.5 Pro) to paying subscribers, signaling a clear strategy to monetize its cutting-edge AI capabilities.
  • The user experience is being refined to be more conversational and natural, with features like ‘Live’ voice chat and the ability to interrupt the AI, mimicking human interaction.
  • Data analysis and visualization are becoming accessible to non-experts: with the data analysis feature launching in the coming weeks, Gemini will ingest raw data from spreadsheets and produce insights and charts from natural language prompts.