I/O 2024: Google DeepMind

Year: 2024

Demis Hassabis (Co-Founder & CEO) · Doug Eck (Senior Research Director, AI)

Segments (8)

00:00 · Introduction to Google DeepMind — Demis Hassabis
- Demis Hassabis introduces the mission of Google DeepMind: to build AGI responsibly to benefit humanity, highlighting recent breakthroughs.
02:20 · Introducing Gemini 1.5 Flash — Demis Hassabis
- A new, lighter-weight Gemini model is announced, designed for speed, efficiency, and lower cost at scale.
03:50 · Introducing Project Astra — Demis Hassabis
- The vision for a universal, multimodal AI agent is unveiled, capable of conversational, real-time understanding and interaction with the world.
08:17 · Generative Media: Bringing Creative Ideas to Life — Doug Eck
- Doug Eck takes the stage to introduce a series of updates to Google’s generative media tools for image, music, and video.
09:08 · Introducing Imagen 3 — Doug Eck
- Imagen 3 is announced as Google’s most capable text-to-image model, with improved photorealism, detail, and text rendering.
10:17 · Music AI Sandbox — Doug Eck
- A suite of AI tools for musicians is showcased through a video featuring artists like Wyclef Jean and Marc Rebillet.
12:56 · Introducing Veo — Demis Hassabis
- Demis Hassabis announces Veo, Google’s most capable generative video model, for creating high-quality 1080p video from text, image, and video prompts.
14:29 · Veo Collaboration with Donald Glover — Demis Hassabis
- A short film is presented, created in collaboration with Donald Glover and his studio Gilga, demonstrating Veo’s filmmaking capabilities.

Products Announced (5)

02:20 · Gemini 1.5 Flash (New Model)
- Lighter-weight and cost-efficient · Optimized for speed and low latency · Maintains multimodal reasoning and long context window
- Available today in Google AI Studio and Vertex AI.
03:55 · Project Astra (Vision / Project)
- Universal AI agent · Real-time, multimodal understanding (vision and speech) · Conversational and context-aware
- Capabilities coming to Google products later this year.
09:11 · Imagen 3 (New Model)
- Improved photorealism and detail · Better understanding of natural language prompts · Advanced text rendering capabilities
- Sign-ups open today for private preview in ImageFX.
10:35 · Music AI Sandbox (Tool Suite)
- Create new instrumental sections from scratch · Transfer styles between tracks · Collaborative tool for artists
- In development with YouTube and artists.
13:09 · Veo (New Model)
- Generates high-quality 1080p video over a minute long · Understands cinematic terms and visual styles · Maintains consistency of subjects across shots
- Waitlist open now for VideoFX; available to select creators in the coming weeks.

Commitments / Timelines (6)

02:49 (today) — Gemini 1.5 Flash and 1.5 Pro are available with up to 1 million tokens in Google AI Studio and Vertex AI.
02:58 (today) — Developers can sign up to try a 2 million token context window.
07:59 (later this year) — Some Project Astra agent capabilities will come to Google products like the Gemini app.
10:05 (today) — Sign-ups are open to try Imagen 3 in ImageFX.
16:13 (over the coming weeks) — Some Veo features will be available to select creators through VideoFX.
16:20 (now) — The waitlist for VideoFX with Veo is open.

Demos (3)

05:23 ✓ · Project Astra — Unnamed Google employee (in video)
- A pre-recorded, first-person demo of the AI agent identifying objects, explaining code, remembering previous context (location of glasses), and engaging in creative tasks on a phone and through smart glasses.
11:02 ✓ · Music AI Sandbox — Wyclef Jean, Marc Rebillet (in video)
- A pre-recorded video of professional musicians using the AI tools to generate, sample, and modify musical loops and tracks in a studio setting.
14:30 ✓ · Veo Filmmaking — Donald Glover and his team (in video)
- A pre-recorded video showcasing a creative team using Veo to generate various video clips from text prompts to brainstorm and create a short film.

Notable Quotes (8)

00:37 — Demis Hassabis:

I co-founded DeepMind in 2010 with the goal of one day building AGI, artificial general intelligence.
02:20 — Demis Hassabis:

So today, we’re introducing Gemini 1.5 Flash.
04:02 — Demis Hassabis:

For a long time, we’ve wanted to build a universal AI agent that can be truly helpful in everyday life.
09:09 — Doug Eck:

Today, I’m so excited to introduce Imagen 3, our most capable image generation model yet.
13:04 — Demis Hassabis:

Today, I’m excited to announce our newest, most capable generative video model, called Veo.
15:39 — Donald Glover:

Everybody’s going to become a director, and everybody should be a director.
15:44 — Donald Glover:

Because at the heart of all of this is just storytelling.
16:50 — Demis Hassabis:

We knew that one day it would change everything. Now that time is here.

Visual Signals (Beyond the Transcript)

On-Screen Text Moments (9)

00:00 · Google DeepMind
- Sets the topic for the entire presentation segment.
02:22 · Gemini 1.5 Flash
- Official branding for the newly announced model.
02:51 · Available in Google AI Studio and Vertex AI / 1M tokens
- Key availability and capability announcement for developers.
03:55 · Project Astra
- Official name for Google’s AI agent vision.
04:03 · A universal AI agent helpful in everyday life
- The core mission statement for Project Astra.
09:11 · Imagen 3
- Official branding for the new text-to-image model.
10:36 · Music AI Sandbox
- Official name for the suite of music creation tools.
13:09 · Veo
- Official branding for the new text-to-video model.
16:02 · A collaboration between Google DeepMind, Donald Glover, and Gilga. Coming soon.
- Credits the high-profile collaboration for the Veo demo.

Stage Moments (5)

00:01 · Demis Hassabis walks on stage and shakes hands with Sundar Pichai, who is exiting.
02:27 · The audience applauds the announcement of Gemini 1.5 Flash.
08:40 · Demis Hassabis introduces Doug Eck, who walks on stage to take over the presentation.
12:43 · Demis Hassabis returns to the stage after the Music AI Sandbox video.
16:07 · The audience gives a strong round of applause following the Veo demo video with Donald Glover.

Visual Demos (5)

05:23 · Project Astra Demo
- A first-person view from a phone camera, where the AI identifies objects in an office, explains code, and remembers the location of glasses. The demo transitions to a view through smart glasses.
09:16 · Imagen 3 Examples
- A series of high-quality, photorealistic and artistic images generated by Imagen 3, including a wolf, people laughing in sunlight, a landscape, and the word ‘LIGHT’ made of feathers.
11:02 · Music AI Sandbox Demo
- Musicians Wyclef Jean and Marc Rebillet in a studio interacting with a UI to generate and combine musical elements, showing prompts and resulting audio waveforms.
13:15 · Veo Examples
- A montage of diverse, high-quality, 1080p video clips generated by Veo, including a dog in a bathtub, an aerial shot of a lighthouse, a blooming sunflower, and a car driving through a city.
14:30 · Veo Filmmaking Demo with Donald Glover
- Donald Glover and his creative team using a text-prompt interface to generate various video shots (a car driving to a palace, a sailboat, a jungle trail) for a short film project.

Production Signals (3)

05:23 · Pre-recorded demo segment for Project Astra, labeled ‘Prototype shown’.
11:02 · Pre-recorded video segment showcasing the Music AI Sandbox with artists.
14:30 · Pre-recorded video segment showcasing Veo in collaboration with Donald Glover.

Key Topics

Artificial General Intelligence (AGI) · Multimodal AI · AI Agents · Generative AI · Text-to-Video Generation · Text-to-Image Generation · AI for Creativity · Music Generation · Google DeepMind · Project Astra · Gemini Models · Veo · Imagen 3 · AI Responsibility

Takeaways

Google DeepMind is positioned as the core engine driving Google’s most ambitious AI research, with a clear long-term goal of achieving AGI.
The Gemini model family is diversifying to serve different needs: Gemini 1.5 Pro for peak capability and the new Gemini 1.5 Flash for speed and cost-efficiency at scale.
Project Astra represents Google’s vision for the future of AI assistants: a proactive, conversational, and multimodal agent that understands the world through sight and sound in real time.
Google is making a major push into generative media, launching powerful new tools for creators across video (Veo), image (Imagen 3), and music (Music AI Sandbox).
High-profile collaborations with creators like Donald Glover and Wyclef Jean are a key part of Google’s strategy to develop and validate its creative AI tools.
The core technical challenge being addressed is reducing latency and improving contextual memory to make AI interactions feel natural and truly helpful in everyday life.
Google is advancing the state of the art in generative video with Veo, focusing on high-resolution output, longer clip duration, and visual consistency across shots.