Google Keynote (I/O ‘24) — full

Year: 2024

Sundar Pichai (CEO, Google) · Liz Reid (VP, Search) · Rose Yao (VP of Product, Google Search) · Aparna Pappu (VP & GM, Google Workspace) · Sissie Hsiao (VP & GM, Gemini Experiences and Google Assistant) · Sameer Samat (President, Android Ecosystem) · Dave Burke (VP, Engineering, Android) · Josh Woodward (VP, Google Labs) · Demis Hassabis (CEO, Google DeepMind) · Doug Eck (Senior Research Director) · James Manyika (SVP, Research, Technology & Society) · Tony Vincent (Director of Product Management)

Segments (17)

00:00:00 · The Year in AI & The Gemini Era — Sundar Pichai
- Sundar Pichai opens the keynote by recapping the rapid progress in AI over the past year and introduces the ‘Gemini Era’ as the central theme for Google’s AI-first approach.
00:05:59 · Gemini in Google Photos — Sundar Pichai
- A demonstration of the new ‘Ask Photos’ feature, powered by Gemini, which allows users to ask natural language questions about their photo library.
00:07:00 · Ask Photos Demo — Sundar Pichai
- Sundar Pichai shows how Ask Photos can find a license plate number and track a child’s swimming progress by analyzing years of photos.
01:00:11 · AI Teammate in Google Workspace — Aparna Pappu
- Aparna Pappu introduces a virtual AI teammate named ‘Chip’ that can be added to Google Chat to monitor projects, provide context, and facilitate collaboration.
01:08:25 · Gemini App and Live Experience — Sissie Hsiao
- Sissie Hsiao details the vision for the Gemini app as a personal AI assistant, introducing Gemini Live for conversational voice interactions and Gems for creating personalized experts.
01:18:25 · Android and AI at the Core — Sameer Samat
- Sameer Samat explains how Android is being reimagined with AI at its core, highlighting Circle to Search and the integration of on-device Gemini Nano.
01:22:17 · Gemini on Android and Context Awareness — Dave Burke
- Dave Burke demonstrates how Gemini on Android will become context-aware, providing helpful suggestions by understanding the content on the screen, such as analyzing a YouTube video or a PDF.
01:30:52 · Gemini 1.5 and Developer Tools — Josh Woodward
- Josh Woodward announces updates to the Gemini 1.5 family (Pro and Flash), new API features like context caching, and introduces the Gemma open model family, including the new PaliGemma.
01:40:11 · Responsible AI and Safety — James Manyika
- James Manyika discusses Google’s approach to responsible AI, including AI-assisted red teaming and the expansion of SynthID watermarking to text and video.
01:44:43 · AI for Social Good and Learning — James Manyika
- Manyika highlights how AI is being used for societal benefit, from scientific research with AlphaFold to education, and introduces LearnLM, a new family of models for learning.
01:49:18 · Closing Remarks and the AI Count — Sundar Pichai
- Sundar Pichai summarizes Google’s full-stack AI strategy, from research and infrastructure to products and platforms, ending with a humorous tally of how many times ‘AI’ was said during the keynote.
02:09:08 · Google DeepMind and Project Astra — Demis Hassabis
- Demis Hassabis introduces Google DeepMind’s vision for universal AI agents and unveils Project Astra, a real-time, multimodal AI assistant prototype.
02:21:21 · Generative Media Models: Imagen 3, Music AI Sandbox, Veo — Demis Hassabis
- Hassabis announces a suite of new generative media models, including Imagen 3 for images, Music AI Sandbox for music creation, and Veo for high-quality video generation.
02:51:37 · Google Search in the Gemini Era — Liz Reid
- Liz Reid explains how generative AI is transforming Google Search, introducing AI Overviews, multi-step reasoning, and AI-organized results pages to handle complex queries.
03:04:04 · Search with Video Demo — Rose Yao
- Rose Yao demonstrates a new capability to ask questions in Google Search using video, troubleshooting a broken record player in real time.
03:31:30 · Music AI Sandbox and Artist Collaborations — Doug Eck
- Doug Eck showcases the Music AI Sandbox, a suite of tools for artists, and features collaborations with musicians like Wyclef Jean and Marc Rebillet.
03:55:31 · Gemini for Workspace — Aparna Pappu
- Aparna Pappu announces the general availability of the Gemini-powered side panel in Workspace and demonstrates new capabilities in Gmail for summarizing and Q&A.

Products Announced (36)

01:00:30 · AlphaCode 2 (Research)
- Cracks competitive coding problems · Advanced problem-solving capabilities
- Research phase
01:15:40 · AI (General Theme)
- Integrated across all Google products · Multimodal capabilities · Agentive experiences
- Throughout the presentation
01:31:31 · Google I/O (2024)
- Annual developer conference · Showcases latest Google technology · Announcements on AI, Android, Search, etc.
- Event occurred in May 2024
01:46:48 · Sundar Pichai (Speaker)
- CEO of Google · Delivered opening and closing remarks · Outlined the ‘Gemini Era’ vision
- Presenter at the event
02:22:00 · The Gemini Era (Strategic Initiative)
- Natively multimodal models · Long context window capabilities · Foundation for agentive AI
- Underlying theme of the keynote
02:55:40 · Gemini (AI Model Family)
- Natively multimodal (text, image, video, code) · Powers numerous Google products · Available in Pro, Flash, and Nano sizes
- Various availability across products and APIs
03:37:38 · AI Overviews (Feature in Google Search)
- Generates AI-powered summaries for search queries · Synthesizes information from multiple sources · Handles complex, multi-step questions
- Rolling out in the US this week, to over 1B people by end of year.
06:02:00 · Ask Photos (Feature in Google Photos)
- Uses Gemini to answer natural language questions about your photo library · Can find specific information within photos (e.g., license plates) · Summarizes themes and progress over time
- Rolling out this summer
08:42:00 · Gemini 1.5 Pro (Updated Model)
- 1 million token context window available to consumers in Gemini Advanced · Improved quality in translation, coding, and reasoning · 2 million token context window available in private preview for developers
- Available to developers globally today; 1M tokens in Gemini Advanced today.
11:27:00 · Gemini 1.5 Flash (New Model)
- Lighter-weight and faster model · Optimized for speed and efficiency at scale · Retains multimodal reasoning and long context
- Available today in Google AI Studio and Vertex AI.
12:48:00 · AI Agent Vision (Future Concept)
- Intelligent systems that show reasoning, planning, and memory · Work across software and systems on your behalf · Operate under user supervision
- Future development
14:22:00 · NotebookLM with Audio Overviews (New Feature)
- Generates a conversational audio discussion from source materials · Allows users to join the conversation and ask questions · Powered by Gemini 1.5 Pro
- Prototype shown
24:51:00 · Project Astra (Prototype)
- Real-time, multimodal AI agent · Understands and responds to video and speech input conversationally · Can identify objects, code, and remember context (e.g., where glasses were left)
- Capabilities coming to Google products later this year.
29:13:00 · Generative Media Tools (Suite of Models)
- Imagen 3 for photorealistic image generation · Music AI Sandbox for music creation · Veo for high-quality 1080p video generation
- Various, with some available to select creators via waitlist.
29:51:00 · Imagen 3 (New Image Model)
- Highest quality text-to-image model from Google yet · More photorealistic with fewer artifacts · Improved understanding of natural language prompts and text rendering
- Sign-ups open in ImageFX today.
31:31:00 · Music AI Sandbox (Creative Tool Suite)
- Suite of professional music AI tools · Can create new instrumental sections from scratch · Allows style transfer between tracks
- In development with artists like Wyclef Jean and Marc Rebillet.
34:05:00 · Veo (New Video Model)
- Generates high-quality 1080p videos from text, image, and video prompts · Can create videos longer than a minute · Understands cinematic terms like ‘aerial shot’ or ‘timelapse’
- Available to select creators through VideoFX via waitlist.
39:00:00 · Trillium (6th Generation TPUs)
- 4.7x improvement in compute performance per chip · Google’s most efficient and performant TPU yet
- Available to Cloud customers late 2024.
39:30:00 · Axion Processors (New Hardware)
- Google’s first custom Arm-based CPU · Industry-leading performance and energy efficiency
- Announced last month.
41:36:00 · Google Search (Gemini Era Update)
- AI Overviews for summarized answers · Multi-step reasoning for complex queries · AI-organized results pages for brainstorming
- Rolling out starting today.
45:16:00 · Multi-step reasoning in Search (New Capability)
- Breaks down complex questions into smaller parts · Can research and synthesize information for planning (e.g., trips, meals) · Acts as an AI agent to perform research on the user’s behalf
- Coming soon to Search.
50:26:00 · Search with video (New Capability)
- Allows users to record a video to ask a question · AI analyzes the video to understand the problem (e.g., broken record player) · Provides troubleshooting steps in an AI Overview
- Coming soon.
55:52:00 · Gemini for Workspace Side Panel (General Availability)
- Integrates Gemini into the side of Workspace apps (Gmail, Docs, etc.) · Provides summaries, Q&A, and contextual actions · Powered by Gemini 1.5 Pro
- Generally available next month.
59:28:00 · Gmail features (Summarize, Q&A, Contextual Smart Reply) (New Features)
- Summarize long email threads · Ask questions about the content of your inbox · Contextual Smart Reply that understands the full conversation
- Rolling out to Labs users this month (Summarize) and July (Q&A, Smart Reply).
01:03:02 · AI Workflows in Workspace (New Capability)
- Automates multi-step processes across Workspace apps · Example: Organizing receipts from Gmail into a Drive folder and a Sheets tracker · Users can trigger and customize these workflows
- Rolling out to Labs users this September.
01:04:27 · AI Teammate (Prototype)
- A virtual, Gemini-powered teammate with its own identity and Workspace account · Can be assigned tasks and roles within a team · Monitors projects, provides context, and facilitates collaboration in apps like Google Chat
- Prototype for 2025 and beyond.
01:09:28 · Gemini Live (New Experience)
- Enables in-depth, conversational voice interactions with Gemini · Allows users to interrupt and ask follow-up questions naturally · Will later incorporate video understanding from Project Astra
- Coming this summer.
01:10:17 · Gems (New Feature in Gemini)
- Allows users to create customized versions of Gemini · Acts as a personal expert on any specified topic (e.g., writing coach, yoga bestie) · Saves instructions for repeated use
- Rolling out in the coming months.
01:20:27 · Circle to Search (Homework Help) (New Capability)
- Solves complex math and physics word problems · Provides step-by-step instructions, not just the answer · Will handle more complex problems with symbolic formulas, diagrams, and graphs
- Available today, with more complex problem-solving later this year.
01:28:08 · TalkBack with Gemini Nano (Update)
- Provides richer, clearer descriptions for unlabeled images · Uses on-device Gemini Nano with multimodality · Works offline
- Coming later this year.
01:29:05 · On-device Scam Detection (New Feature)
- Uses Gemini Nano to listen for scam patterns during phone calls in real time · Provides a real-time alert if a conversation seems suspicious · All processing is done on-device for privacy
- In testing, with more updates later this year.
01:36:01 · Gemma (Open Model Family)
- Lightweight, state-of-the-art open models · Built from the same research as Gemini · Includes 2B and 7B parameter sizes
- Available now.
01:36:39 · PaliGemma (New Open Model)
- Google’s first vision-language open model · Optimized for image captioning and visual Q&A · Based on the PaLI-3 architecture
- Available right now.
01:37:02 · Gemma 2 (New Open Model)
- New 27 billion parameter model · Optimized for running on TPUs and next-gen GPUs · Outperforms models twice its size
- Available in June.
01:43:27 · SynthID (Expanded Capability)
- Imperceptible watermarking for AI-generated content · Now expanding to text and video modalities · Helps identify AI-generated content to combat misinformation
- Text watermarking to be open-sourced in the coming months.
01:45:53 · LearnLM (New Model Family)
- Family of models based on Gemini, fine-tuned for learning · Grounded in educational research · Powers features like Learning Coach in Gemini and conversational tutoring in YouTube
- Integrated into various Google products.

Commitments / Timelines (23)

03:37:00 (This week) — AI Overviews will begin launching to everyone in the US this week.
03:42:00 (Soon) — AI Overviews will be brought to more countries soon.
03:50:00 (By end of year) — AI Overviews will come to over 1B people by the end of the year.
07:38:00 (This summer) — Ask Photos will be rolling out this summer.
11:27:00 (Today) — Improved version of Gemini 1.5 Pro is being brought to all developers globally.
11:42:00 (Today) — Gemini 1.5 Pro with 1M context is now directly available for consumers in Gemini Advanced.
12:04:00 (Today (waitlist)) — The context window is being expanded to 2 million tokens, available for developers in private preview.
13:54:00 (Today) — Gemini 1.5 Pro is available today in Workspace Labs.
28:55:00 (Later this year) — Some Project Astra agent capabilities will come to Google products.
30:01:00 (Today) — Sign up to try Imagen 3 in ImageFX.
37:10:00 (Today (waitlist)) — Select creators can access Veo through VideoFX via a waitlist.
39:19:00 (Late 2024) — Trillium TPUs will be available to Cloud customers.
39:49:00 (Early 2025) — NVIDIA’s Blackwell GPUs will be available in Google Cloud.
49:52:00 (Soon) — AI-organized search results pages will come to movies, music, books, hotels, shopping, and more.
55:52:00 (Next month) — The new Gemini-powered side panel in Workspace will be generally available.
59:28:00 (This month / July) — New Gemini capabilities in Gmail (Summarize, Q&A, Contextual Smart Reply) will roll out to Labs users.
01:03:02 (This September) — AI Workflows in Workspace will be available to Labs users.
01:09:28 (This summer) — Gemini Live will be coming.
01:10:17 (In the coming months) — Gems will roll out in Gemini.
01:21:25 (Later this year) — Circle to Search will be able to tackle more complex problems involving symbolic formulas, diagrams, and graphs.
01:28:08 (Later this year) — Improvements to TalkBack with Gemini Nano are coming.
01:37:02 (In June) — Gemma 2 will be available.
01:44:02 (In the coming months) — SynthID text watermarking will be open-sourced.

Demos (11)

06:26:00 ✓ · Ask Photos - License Plate — Sundar Pichai (narrating)
- A user asks Google Photos ‘what’s my license plate number again’ and the app identifies the correct car and displays the license plate number from a photo.
06:56:00 ✓ · Ask Photos - Swimming Progress — Sundar Pichai (narrating)
- A user asks about their daughter’s swimming progress, and Gemini analyzes photos over time, including swimming certificates, to create a summarized answer.
12:55:00 ✓ · Gemini in Gmail - Summarize & Q&A — Aparna Pappu (narrating)
- A user summarizes a long email thread about school activities, then asks a follow-up question to compare roofing bids from different emails, which Gemini answers in a structured table.
13:25:00 ✓ · Gemini in Google Drive - Meeting Summary — Aparna Pappu (narrating)
- A user who missed a PTA meeting asks Gemini to summarize the hour-long recording stored in Google Drive, and it provides the key highlights.
14:22:00 ✓ · NotebookLM - Audio Overviews — Josh Woodward
- Josh Woodward shows how NotebookLM can take science materials and generate a conversational audio podcast between two hosts, which he then joins to ask a clarifying question.
26:20:00 ✓ · Project Astra - Real-time Multimodal Agent — Unnamed Google employee
- A user points their phone camera around a room, and the AI agent identifies objects (speaker, tweeter), provides creative alliteration for crayons, explains a line of code on a monitor, identifies the London neighborhood from a window view, and remembers where the user’s glasses were placed.
50:45:00 ✓ · Search with Video - Record Player Troubleshooting — Rose Yao
- Rose Yao records a video of a record player’s tonearm not staying in place and asks Google Search why. Search identifies the make and model, diagnoses the problem as an imbalance, and provides troubleshooting steps in an AI Overview.
01:04:49 ✓ · AI Teammate ‘Chip’ — Tony Vincent
- Tony Vincent demonstrates ‘Chip,’ an AI teammate in Google Chat, which summarizes project status, identifies conflicting decisions, and creates a document to help resolve an issue when prompted by a human team member.
01:20:40 ✓ · Circle to Search - Homework Help — Sameer Samat (narrating)
- A user circles a physics word problem on their phone, and Circle to Search provides a step-by-step solution for calculating the car’s acceleration.
01:22:47 ✓ · Gemini on Android - Contextual Awareness — Dave Burke
- Dave Burke shows Gemini as an overlay on Android, where it analyzes a YouTube video to answer a question about pickleball rules and analyzes a PDF to answer a question about spin serves, demonstrating its ability to understand on-screen content.
01:29:05 ✓ · On-device Scam Detection — Dave Burke
- A simulated phone call shows a user receiving a suspicious call from their ‘bank.’ Gemini Nano, running on-device, detects scam-like language (e.g., asking to transfer money) and displays a real-time ‘Likely scam’ alert.

Notable Quotes (8)

02:18:00 — Sundar Pichai:

At Google though, we are fully in our Gemini era. You’ll hear a lot about that today.
03:18:00 — Sundar Pichai:

It’s a big step in turning any input into any output. An I/O for a new generation.
12:04:00 — Sundar Pichai:

So today, we are expanding the context window to 2 million tokens.
25:00:00 — Demis Hassabis:

For a long time, we’ve wanted to build a universal AI agent that can be truly helpful in everyday life.
43:28:00 — Liz Reid:

And Google will do the Googling for you.
01:19:00 — Sameer Samat:

This is a once-in-a-generation moment to reinvent what phones can do.
01:49:46 — Sundar Pichai:

How many times have we mentioned AI today? And since a big theme today has been letting Google do the work for you, we went ahead and counted so that you don’t have to.
01:50:15 — Sundar Pichai:

That might be a record in how many times someone has said AI. 121.

Visual Signals (Beyond the Transcript)

On-Screen Text Moments (12)

00:00:10 · Google IO '23 on a calendar
- Establishes the one-year timeframe of progress being highlighted in the opening montage.
01:00:50 · Alpha Code 2 - Cracking Competitive Coding
- Title card for a specific AI research achievement.
02:55:40 · The Gemini Era
- Key branding for the entire presentation, signifying Google’s strategic focus.
03:57:00 · Gemini 1.5 Pro - 1M tokens
- Highlights the massive context window of the model being discussed.
05:37:00 · AI Overviews - Rolling out in US and more countries soon
- Announces the official name and rollout plan for the new search experience.
07:39:00 · Ask Photos with Gemini
- Brands the new feature being introduced for Google Photos.
12:09:00 · Gemini 1.5 Pro - 2M tokens
- Major announcement doubling the context window, met with audience applause.
24:51:00 · Project Astra
- Introduces the name of Google’s new AI agent prototype.
30:06:00 · Imagen 3
- Announces the next generation of Google’s image generation model.
34:05:00 · Veo
- Announces Google’s new generative video model.
01:10:14 · Gems
- Introduces the branding for customizable Gemini experts.
01:50:00 · 120 - AI count
- A humorous, self-aware moment acknowledging the heavy use of the term ‘AI’.

Stage Moments (7)

00:00:00 · A fast-paced, highly produced montage showcases AI progress and news headlines from the past year.
01:30:00 · Sundar Pichai walks out onto the large outdoor stage at the Shoreline Amphitheatre to a cheering, packed audience.
02:51:00 · Sundar Pichai transitions to Liz Reid to discuss the future of Google Search.
12:08:00 · The audience applauds loudly for the announcement of the 2 million token context window.
20:51:00 · Sundar Pichai introduces Demis Hassabis, CEO of Google DeepMind, to the stage for the first time at I/O.
01:49:18 · Sundar Pichai returns to the stage for closing remarks.
01:50:00 · A slide reveals the ‘AI count’ is 120, which Sundar then updates to 121, drawing laughter from the audience.
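
A tally like the "AI count" in the closing stage moment can be approximated from a transcript with a whole-word match. This is only a minimal sketch: this summary does not say how Google produced its count, and the function name `count_mentions` and the sample sentence are hypothetical.

```python
import re


def count_mentions(transcript: str, term: str = "AI") -> int:
    """Count case-sensitive, whole-word occurrences of `term`.

    Assumption: a "mention" is the exact token "AI" bounded by non-word
    characters, so "AI-first" counts but "AIR" and "said" do not.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b")
    return len(pattern.findall(transcript))


if __name__ == "__main__":
    sample = "AI is here. We said AI-first, not AIR."  # hypothetical text
    print(count_mentions(sample))
```

A different definition of "mention" (for example, counting "Gemini" or lowercase variants) would give a different number, so any such tally depends on the matching rule chosen.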

Visual Demos (7)

06:26:00 · Google Photos UI on a smartphone.
- The user types a natural language query into the search bar. The app displays a photo of a car’s rear, highlighting the license plate with the number typed out as the answer.
26:20:00 · Project Astra prototype on a smartphone.
- A first-person view through the phone’s camera shows the AI identifying objects in real time, explaining code on a screen, and recalling the location of a pair of glasses seen earlier.
30:13:00 · Images generated by Imagen 3.
- A photorealistic wolf, a detailed landscape, an origami owl, and the word ‘LIGHT’ rendered from rainbow feathers were displayed to showcase the model’s quality and text rendering.
35:52:00 · Videos generated by Veo.
- A variety of high-quality, cinematic video clips were shown, including a car driving through a futuristic city, a sailboat on the ocean, a woman in Nairobi, and a volcanic crater at sunrise.
50:45:00 · Google Search with video input on a smartphone.
- A user records their malfunctioning record player. The search results page shows an AI Overview identifying the specific model (Audio-Technica LP120) and providing steps to fix the unbalanced tonearm.
01:00:40 · AI Workflows in Gmail and Google Sheets.
- The Gemini side panel in Gmail offers to organize receipts. It then creates a new Drive folder and a Google Sheet, automatically populating it with extracted data like vendor, date, and cost from multiple emails.
01:04:49 · AI Teammate ‘Chip’ in Google Chat.
- The UI shows ‘Chip’ as a member of a chat room. When asked, it synthesizes information from the chat and linked documents to provide a project timeline and flag potential issues.

Production Signals (6)

00:00:00 · Pre-recorded, highly edited opening montage.
01:30:00 · Live on-stage presentation at an outdoor amphitheater.
08:41:00 · Pre-recorded segment with developer testimonials for Gemini 1.5.
26:20:00 · Pre-recorded, single-take demonstration of the Project Astra prototype.
31:58:00 · Pre-recorded segment featuring musicians collaborating with Music AI Sandbox.
35:37:00 · Pre-recorded segment with Donald Glover and his studio, Gilga, collaborating with Veo.

Key Topics

Generative AI · Gemini Model · Multimodality · Long Context Window · AI Agents · Google Search · Android AI · Developer Tools · AI Safety · Creative Tools · Google Workspace · On-device AI · Open Models · AI Infrastructure

Takeaways

Google is positioning the ‘Gemini Era’ as a fundamental transformation, integrating its most advanced AI across its entire product ecosystem, from Search and Android to Workspace.
The future of AI interaction is agentive and multimodal; Google’s vision is for AI to proactively ‘do the work for you’ by understanding complex, multi-step tasks across different apps and data types (text, image, video, audio).
Massive increases in context window size, now up to 2 million tokens in preview, are a key technical differentiator, enabling deep analysis of large documents, codebases, and hours of video.
Google is building a full-stack AI platform, from custom Trillium TPUs and liquid-cooled data centers to a family of models (Gemini Pro, Flash, Nano) and open models (Gemma), aiming to provide the best tools for any developer or enterprise workload.
New generative media models (Imagen 3, Veo, Music AI Sandbox) demonstrate significant leaps in quality and creative control, signaling a major push into AI-powered content creation.
On-device AI (Gemini Nano) is a strategic priority for Android, enabling fast, private, and context-aware features like real-time scam detection and enhanced accessibility without needing a network connection.
Google Search is being completely reimagined, moving beyond a list of links to an AI-powered answer engine that can synthesize information, handle complex planning, and organize results for brainstorming.
Alongside pushing capabilities, Google is emphasizing its commitment to responsible AI through technical solutions like SynthID watermarking for text and video and collaborative efforts on safety and standards.
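
To make the context-window takeaway concrete, the sketch below estimates whether a set of texts would fit in the 1 million or 2 million token windows mentioned in the keynote. It rests on a loud assumption: roughly 4 characters per token is a crude rule of thumb, not the behavior of Gemini's actual tokenizer, and the names `CHARS_PER_TOKEN`, `estimate_tokens`, and `fits` are hypothetical.

```python
# Rough check of whether some texts fit in a given context window.
# ASSUMPTION: ~4 characters per token is a crude heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4  # hypothetical average

# Window sizes taken from the keynote announcements summarized above.
CONTEXT_WINDOWS = {
    "1M (Gemini Advanced)": 1_000_000,
    "2M (private preview)": 2_000_000,
}


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate: ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(texts: list[str], window_tokens: int) -> bool:
    """Return True if the estimated total token count is within the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total <= window_tokens


if __name__ == "__main__":
    docs = ["x" * 6_000_000]  # hypothetical 6-million-character corpus
    for name, size in CONTEXT_WINDOWS.items():
        print(name, fits(docs, size))
```

For a real workload, the estimate should be replaced with a token count from the model's own tokenizer, since actual ratios vary by language and content type.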