I/O 2024: Gemini
Year: 2024
Sissie Hsiao (VP/GM, Gemini Experiences and Google Assistant)
Segments (7)
- 00:00:12 · Introduction: The Vision for the Gemini App — Sissie Hsiao
- Introducing the vision for the Gemini app to be the most helpful personal AI assistant by providing direct access to Google’s latest AI models.
- 00:00:48 · Introducing Gemini Live — Sissie Hsiao
- Announcing Gemini Live, a new conversational experience using voice that allows for natural, fluid conversation and interruptions.
- 00:01:50 · Introducing Gems — Sissie Hsiao
- Unveiling Gems, a feature that allows users to create customized, personal expert versions of Gemini for specific, recurring tasks.
- 00:03:05 · AI as an Agent: Trip Planning in Gemini Advanced — Sissie Hsiao
- Demonstrating how Gemini Advanced can act as an agent to plan a complex, personalized trip by reasoning over multiple variables and constraints.
- 00:05:38 · Gemini 1.5 Pro and 1M Token Context Window — Sissie Hsiao
- Announcing that Gemini 1.5 Pro with a 1 million token context window is now available to Gemini Advanced subscribers.
- 00:06:33 · Long Context Use Cases: Thesis and Data Analysis — Sissie Hsiao
- Showcasing practical applications of the large context window, including analyzing a full thesis and performing data analysis on multiple spreadsheets.
- 00:08:41 · The Prompt: A Gemini Musical — Sissie Hsiao
- A pre-recorded musical ad illustrating the ease and versatility of using Gemini for a wide range of everyday prompts.
Products Announced (7)
- 00:01:15 · Gemini Live (New Experience)
- In-depth voice conversation · Ability to interrupt the AI · Adapts to speech patterns
- Coming this summer
- 00:01:35 · Video Understanding in Gemini App (Upcoming Feature)
- Based on Project Astra · Use camera to have Gemini see and respond to surroundings · Real-time visual conversation
- Later this year
- 00:02:03 · Gems (New Feature)
- Create personalized AI experts · Customize Gemini for specific needs · Save instructions for recurring tasks
- Rolling out in the coming months
- 00:03:35 · Trip Planning in Gemini Advanced (New Experience)
- Acts as an agent to plan complex tasks · Integrates with Google apps like Gmail and Maps · Creates dynamic, customizable itineraries
- Coming this summer
- 00:05:45 · Gemini 1.5 Pro in Gemini Advanced (Now Available)
- 1 million token context window · Process large documents, codebases, and videos · Longest context window of any consumer chatbot
- Available today for Gemini Advanced subscribers
- 00:07:09 · Data Analysis in Gemini Advanced (Upcoming Feature)
- Upload and analyze spreadsheets (e.g., from Google Sheets) · Generate custom Python code for analysis · Create visualizations and charts from data
- Launching in the coming weeks
- 00:07:56 · 2M Token Context Window (Upcoming Upgrade)
- Doubling the context window for Gemini Advanced · Process even larger amounts of information · Further extends multi-modal reasoning capabilities
- Later this year
Benchmarks Shown (1)
- 00:05:51 · Context Window Size: 1M
- Compared to the Gemini app (32K), GPT-4 (128K), and Claude 3 (200K).
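The comparison on that slide reduces to simple ratios. The sketch below is an illustration only and was not shown in the talk. It assumes "K" means 1,000 tokens and "1M" means 1,000,000 tokens; the names `CONTEXT_WINDOWS` and `window_ratios` are hypothetical.

```python
# Context window sizes in tokens, read off the 00:05:51 slide.
# Assumption: "K" = 1,000 tokens and "1M" = 1,000,000 tokens.
CONTEXT_WINDOWS = {
    "Gemini app": 32_000,
    "GPT-4": 128_000,
    "Claude 3": 200_000,
    "Gemini Advanced": 1_000_000,
}


def window_ratios(windows, baseline):
    """Return how many times larger the baseline window is than each other entry."""
    base = windows[baseline]
    return {name: base / size for name, size in windows.items() if name != baseline}


ratios = window_ratios(CONTEXT_WINDOWS, "Gemini Advanced")
for name, ratio in ratios.items():
    print(f"Gemini Advanced holds {ratio:g}x the tokens of {name}")
```

Under these assumptions the 1M window is about 31 times the earlier Gemini app window, about 8 times GPT-4's, and 5 times Claude 3's.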
Commitments / Timelines (8)
- 00:01:08 (This summer) — Have an in-depth conversation with Gemini using your voice (Gemini Live)
- 00:01:35 (Later this year) — Bring speed gains and video understanding from Project Astra to the Gemini app
- 00:02:47 (In the coming months) — Gems will roll out
- 00:05:24 (This summer) — New trip planning experience will be rolling out to Gemini Advanced
- 00:05:43 (Starting today) — Gemini Advanced subscribers get access to Gemini 1.5 Pro
- 00:07:11 (In the coming weeks) — New data analysis feature launching
- 00:07:56 (Later this year) — Doubling the long context window to 2 million tokens
- 00:08:28 (Available today) — Expanding Gemini Advanced to over 35 supported languages
Demos (4)
- 00:02:16 ✓ · Creating a Gem — Sissie Hsiao
- A user creates a ‘Cliffhanger Curator’ Gem by providing instructions for it to act as a storyteller specializing in mysterious twists, using drafts from Google Drive.
- 00:03:38 ✓ · Gemini Advanced Trip Planning — Sissie Hsiao
- Gemini plans a Labor Day weekend trip to Miami based on family preferences and flight/hotel info from Gmail, creating a dynamic, editable itinerary.
- 00:06:37 ✓ · Thesis Analysis with Long Context — Sissie Hsiao
- A user uploads their entire thesis, research, and sources, and Gemini acts as a professor on the thesis committee, providing challenging questions to help the user prepare.
- 00:07:14 ✓ · Data Analysis of Spreadsheets — Sissie Hsiao
- A user uploads multiple sales spreadsheets from a side hustle, and Gemini analyzes the data, writes Python code, and generates a chart visualizing profit over time by product.
Notable Quotes (6)
- 00:00:12 — Sissie Hsiao:
Our vision for the Gemini app is to be the most helpful personal AI assistant.
- 00:01:13 — Sissie Hsiao:
We’re calling this new experience Live.
- 00:02:02 — Sissie Hsiao:
We’re calling these Gems.
- 00:03:06 — Sissie Hsiao:
Next, I’ll show you how Gemini is taking a step closer to being a true AI assistant by planning and taking actions for you.
- 00:05:52 — Sissie Hsiao:
That is the longest context window of any chatbot in the world.
- 00:07:54 — Sissie Hsiao:
Oh, and just one more thing. Later this year, we’ll be doubling the long context window to 2 million tokens.
Visual Signals (Beyond the Transcript)
On-Screen Text Moments (9)
- 00:00:05 · Introducing Sissie Hsiao
- Introduces the speaker by name and title as she walks on stage.
- 00:01:16 · Gemini Live
- Brands the new conversational voice feature.
- 00:02:04 · Gems
- Brands the new personalization feature.
- 00:05:25 · Coming this summer
- Provides the release timeline for the trip planning feature.
- 00:05:45 · Gemini 1.5 Pro
- Announces the specific model being integrated into Gemini Advanced.
- 00:05:51 · A bar chart comparing token context windows: Gemini app (32K), GPT-4 (128K), Claude 3 (200K), Gemini
- Visually demonstrates Google’s claimed leadership in context window size.
- 00:05:55 · Longest context window in the world
- A direct and bold competitive claim.
- 00:07:59 · 2M Gemini Advanced
- Announces a future doubling of the context window, reinforcing their focus on this capability.
- 00:08:31 · Gemini Advanced 35+ languages
- Highlights the global expansion and availability of the premium product.
Stage Moments (4)
- 00:00:03 · Sissie Hsiao walks onto the stage to applause from a large, live audience.
- 00:02:07 · The audience applauds enthusiastically at the announcement of the ‘Gems’ feature.
- 00:05:31 · The audience applauds after the trip planning demo and announcement.
- 00:08:49 · The presentation transitions to a pre-recorded, high-production-value musical advertisement for Gemini.
Visual Demos (7)
- 00:00:23 · Conceptual UI of Gemini
- A clean interface shows prompts for explaining atoms, generating an image of a cat playing a guitar, and a mobile UI with suggestions.
- 00:00:43 · Gemini Advanced UI
- A dark-themed UI for Gemini Advanced with prompts for generating images, writing code, and creating color palettes.
- 00:01:35 · Live video analysis concept
- A conceptual UI shows a phone’s camera feed of radishes being analyzed in real time by Gemini Live.
- 00:03:53 · Trip planning reasoning graph
- A dynamic mind map visualizes how Gemini connects different variables (Miami, art, seafood, Gmail) and pulls information from various sources to build a plan.
- 00:04:19 · Generated trip itinerary
- A detailed, day-by-day itinerary is shown in a dynamic UI, complete with flight times, hotel info, restaurant suggestions, and activities.
- 00:07:29 · Generated data analysis chart
- A line chart titled ‘Profit Over Time by Product’ is generated, showing different colored lines for various products like bracelets, earrings, and phone cases.
- 00:07:45 · Generated Python code
- The underlying Python code using the pandas library, which Gemini wrote to perform the data analysis, is displayed.
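The on-screen code itself is not reproduced in this summary. As a rough idea of the aggregation behind a "Profit Over Time by Product" chart, here is a minimal sketch. It is not the code Gemini generated: the demo used pandas, while this version uses only the standard library to stay dependency-free, and the sample rows, column layout, and function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (month, product, revenue, cost).
# The real spreadsheets from the demo are not available.
SALES = [
    ("2024-01", "bracelets", 120.0, 45.0),
    ("2024-01", "earrings", 80.0, 30.0),
    ("2024-02", "bracelets", 150.0, 50.0),
    ("2024-02", "earrings", 60.0, 25.0),
    ("2024-02", "phone cases", 200.0, 110.0),
]


def profit_by_product_over_time(rows):
    """Return {product: {month: profit}}, where profit is revenue minus cost."""
    table = defaultdict(lambda: defaultdict(float))
    for month, product, revenue, cost in rows:
        table[product][month] += revenue - cost
    # Sort products and months so each series is ready to plot as one line.
    return {
        product: dict(sorted(months.items()))
        for product, months in sorted(table.items())
    }


series = profit_by_product_over_time(SALES)
for product, months in series.items():
    print(product, months)
```

Each key in `series` corresponds to one line on the chart, with months on the x-axis and profit on the y-axis.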
Production Signals (3)
- 00:00:00 · Live on-stage presentation with a large audience.
- 00:00:22 · Cut to pre-recorded, animated UI demonstrations on the main screen while the speaker narrates.
- 00:08:49 · Transition to a fully pre-recorded musical ad, indicating the end of the live segment.
Key Topics
AI Assistants · Gemini · Gemini Advanced · Long Context Window · Multimodality · Personalization · Agentic AI · Conversational AI · Data Analysis · Code Generation · Trip Planning · Generative AI · Google I/O
Takeaways
- Google is aggressively pushing Gemini as a deeply integrated, multi-modal, and highly personalized AI assistant, moving beyond simple chat.
- The 1M (and soon 2M) token context window in Gemini Advanced is positioned as a key differentiator, enabling document-heavy tasks that consumer chatbots with smaller windows could not fit into a single prompt.
- Gemini is becoming more ‘agentic,’ capable of not just providing information but actively planning and executing multi-step tasks by integrating with other Google services.
- Personalization is a major theme, with ‘Gems’ allowing users to tailor Gemini’s behavior for their specific, recurring needs, making it a more efficient tool.
- Google is rapidly rolling out its most advanced models (like Gemini 1.5 Pro) to paying subscribers, signaling a clear strategy to monetize its cutting-edge AI capabilities.
- The user experience is being refined to be more conversational and natural, with features like ‘Live’ voice chat and the ability to interrupt the AI, mimicking human interaction.
- Data analysis and visualization are becoming accessible to non-experts: once the data analysis feature launches, Gemini Advanced will ingest raw spreadsheet data and produce insights and charts from natural language prompts.