I/O 2024: Google DeepMind

Year: 2024

Demis Hassabis (Co-Founder & CEO) · Doug Eck (Senior Research Director, AI)

Segments (8)

  • 00:00 · Introduction to Google DeepMind — Demis Hassabis
    • Demis Hassabis introduces the mission of Google DeepMind, which is to build AGI responsibly for the benefit of humanity, and highlights recent breakthroughs.
  • 02:20 · Introducing Gemini 1.5 Flash — Demis Hassabis
    • A new, lighter-weight Gemini model is announced, designed for speed, efficiency, and lower cost at scale.
  • 03:50 · Introducing Project Astra — Demis Hassabis
    • The vision for a universal, multimodal AI agent is unveiled, capable of conversational, real-time understanding and interaction with the world.
  • 08:17 · Generative Media: Bringing Creative Ideas to Life — Doug Eck
    • Doug Eck takes the stage to introduce a series of updates to Google’s generative media tools for image, music, and video.
  • 09:08 · Introducing Imagen 3 — Doug Eck
    • The announcement of Imagen 3, Google’s most capable text-to-image model, with improved photorealism, detail, and text rendering.
  • 10:17 · Music AI Sandbox — Doug Eck
    • A suite of AI tools for musicians is showcased through a video featuring artists like Wyclef Jean and Marc Rebillet.
  • 12:56 · Introducing Veo — Demis Hassabis
    • Demis Hassabis announces Veo, Google’s most capable generative video model, for creating high-quality 1080p video from various prompts.
  • 14:29 · Veo Collaboration with Donald Glover — Demis Hassabis
    • A short film is presented, created in collaboration with Donald Glover and his studio Gilga, demonstrating Veo’s filmmaking capabilities.

Products Announced (5)

  • 02:20 · Gemini 1.5 Flash (New Model)
    • Lighter-weight and cost-efficient · Optimized for speed and low latency · Maintains multimodal reasoning and long context window
    • Available today in Google AI Studio and Vertex AI.
  • 03:55 · Project Astra (Vision / Project)
    • Universal AI agent · Real-time, multimodal understanding (vision and speech) · Conversational and context-aware
    • Capabilities coming to Google products later this year.
  • 09:11 · Imagen 3 (New Model)
    • Improved photorealism and detail · Better understanding of natural language prompts · Advanced text rendering capabilities
    • Sign-ups open today for private preview in ImageFX.
  • 10:35 · Music AI Sandbox (Tool Suite)
    • Create new instrumental sections from scratch · Transfer styles between tracks · Collaborative tool for artists
    • In development with YouTube and artists.
  • 13:09 · Veo (New Model)
    • Generates high-quality 1080p video over a minute long · Understands cinematic terms and visual styles · Maintains consistency of subjects across shots
    • Waitlist open now for VideoFX; available to select creators in the coming weeks.

Commitments / Timelines (6)

  • 02:49 (today) — Gemini 1.5 Flash and 1.5 Pro are available with a context window of up to 1 million tokens in Google AI Studio and Vertex AI.
  • 02:58 (today) — Developers can sign up to try a 2 million token context window.
  • 07:59 (later this year) — Some Project Astra agent capabilities will come to Google products like the Gemini app.
  • 10:05 (today) — Sign-ups are open to try Imagen 3 in ImageFX.
  • 16:13 (over the coming weeks) — Some Veo features will be available to select creators through VideoFX.
  • 16:20 (now) — The waitlist for VideoFX with Veo is open.

Demos (3)

  • 05:23 ✓ · Project Astra — Unnamed Google employee (in video)
    • A pre-recorded, first-person demo of the AI agent identifying objects, explaining code, recalling earlier context (where the user left their glasses), and handling creative tasks, first on a phone and then through smart glasses.
  • 11:02 ✓ · Music AI Sandbox — Wyclef Jean, Marc Rebillet (in video)
    • A pre-recorded video of professional musicians using the AI tools to generate, sample, and modify musical loops and tracks in a studio setting.
  • 14:30 ✓ · Veo Filmmaking — Donald Glover and his team (in video)
    • A pre-recorded video showcasing a creative team using Veo to generate various video clips from text prompts to brainstorm and create a short film.

Notable Quotes (8)

  • 00:37 — Demis Hassabis:

    I co-founded DeepMind in 2010 with the goal of one day building AGI, artificial general intelligence.

  • 02:20 — Demis Hassabis:

    So today, we’re introducing Gemini 1.5 Flash.

  • 04:02 — Demis Hassabis:

    For a long time, we’ve wanted to build a universal AI agent that can be truly helpful in everyday life.

  • 09:09 — Doug Eck:

    Today, I’m so excited to introduce Imagen 3, our most capable image generation model yet.

  • 13:04 — Demis Hassabis:

    Today, I’m excited to announce our newest, most capable generative video model, called Veo.

  • 15:39 — Donald Glover:

    Everybody’s going to become a director, and everybody should be a director.

  • 15:44 — Donald Glover:

    Because at the heart of all of this is just storytelling.

  • 16:50 — Demis Hassabis:

    We knew that one day it would change everything. Now that time is here.

Visual Signals (Beyond the Transcript)

On-Screen Text Moments (9)

  • 00:00 · Google DeepMind
    • Sets the topic for the entire presentation segment.
  • 02:22 · Gemini 1.5 Flash
    • Official branding for the newly announced model.
  • 02:51 · Available in Google AI Studio and Vertex AI / 1M tokens
    • Key availability and capability announcement for developers.
  • 03:55 · Project Astra
    • Official name for Google’s AI agent vision.
  • 04:03 · A universal AI agent helpful in everyday life
    • The core mission statement for Project Astra.
  • 09:11 · Imagen 3
    • Official branding for the new text-to-image model.
  • 10:36 · Music AI Sandbox
    • Official name for the suite of music creation tools.
  • 13:09 · Veo
    • Official branding for the new text-to-video model.
  • 16:02 · A collaboration between Google DeepMind, Donald Glover, and Gilga. Coming soon.
    • Credits the high-profile collaboration for the Veo demo.

Stage Moments (5)

  • 00:01 · Demis Hassabis walks on stage, shaking hands with Sundar Pichai who is exiting.
  • 02:27 · The audience applauds the announcement of Gemini 1.5 Flash.
  • 08:40 · Demis Hassabis introduces Doug Eck, who walks on stage to take over the presentation.
  • 12:43 · Demis Hassabis returns to the stage after the Music AI Sandbox video.
  • 16:07 · The audience gives a strong round of applause following the Veo demo video with Donald Glover.

Visual Demos (5)

  • 05:23 · Project Astra Demo
    • A first-person view from a phone camera, where the AI identifies objects in an office, explains code, and remembers the location of glasses. The demo transitions to a view through smart glasses.
  • 09:16 · Imagen 3 Examples
    • A series of high-quality, photorealistic and artistic images generated by Imagen 3, including a wolf, people laughing in sunlight, a landscape, and the word ‘LIGHT’ made of feathers.
  • 11:02 · Music AI Sandbox Demo
    • Musicians Wyclef Jean and Marc Rebillet in a studio interacting with a UI to generate and combine musical elements, showing prompts and resulting audio waveforms.
  • 13:15 · Veo Examples
    • A montage of diverse, high-quality, 1080p video clips generated by Veo, including a dog in a bathtub, an aerial shot of a lighthouse, a blooming sunflower, and a car driving through a city.
  • 14:30 · Veo Filmmaking Demo with Donald Glover
    • Donald Glover and his creative team using a text-prompt interface to generate various video shots (a car driving to a palace, a sailboat, a jungle trail) for a short film project.

Production Signals (3)

  • 05:23 · Pre-recorded demo segment for Project Astra, labeled ‘Prototype shown’.
  • 11:02 · Pre-recorded video segment showcasing the Music AI Sandbox with artists.
  • 14:30 · Pre-recorded video segment showcasing Veo in collaboration with Donald Glover.

Key Topics

Artificial General Intelligence (AGI) · Multimodal AI · AI Agents · Generative AI · Text-to-Video Generation · Text-to-Image Generation · AI for Creativity · Music Generation · Google DeepMind · Project Astra · Gemini Models · Veo · Imagen 3 · AI Responsibility

Takeaways

  • Google DeepMind is positioned as the core engine driving Google’s most ambitious AI research, with a clear long-term goal of achieving AGI.
  • The Gemini model family is diversifying to serve different needs: Gemini 1.5 Pro for peak capability and the new Gemini 1.5 Flash for speed and cost-efficiency at scale.
  • Project Astra represents Google’s vision for the future of AI assistants: a proactive, conversational, and multimodal agent that understands the world through sight and sound in real-time.
  • Google is making a major push into generative media, launching powerful new tools for creators across video (Veo), image (Imagen 3), and music (Music AI Sandbox).
  • High-profile collaborations with creators such as Donald Glover and Wyclef Jean appear to be a key part of Google’s strategy for developing and validating its creative AI tools.
  • For Project Astra, the core technical challenge is reducing latency and improving contextual memory so that AI interactions feel natural and are genuinely helpful in everyday life.
  • Google is advancing the state-of-the-art in generative video with Veo, focusing on high-resolution output, longer clip duration, and maintaining visual consistency.