Nano Banana, Veo, Lyria: gen media stack

Year: 2026

Stephanie Wong (Global Lead, Developer Programs) · Khulan Davaajav (Product Marketing Manager, Genmedia)

Segments (9)

00:00:00 · Introduction — Stephanie Wong
- Stephanie Wong introduces Khulan Davaajav to discuss the latest in Google’s Generative Media models.
00:00:21 · Generative Media Landscape Overview — Khulan Davaajav
- Khulan defines ‘Generative Media’ as an umbrella term for Google’s models covering image, video, audio, and music generation.
00:01:53 · Demo: ‘WFH: Working From Hunger’ — Khulan Davaajav
- A short animated film created entirely with Google’s Generative Media models is shown, telling a story about working from home.
00:03:17 · Image Generation with Nano Banana 2 — Khulan Davaajav
- Explains how the keyframes for the animation were created using Nano Banana 2 with highly specific prompts for artistic style and camera effects.
00:05:53 · Video Generation with Veo 3.1 Lite — Khulan Davaajav
- Demonstrates how Veo 3.1 Lite was used to animate the static keyframes, highlighting its cost-effectiveness and speed.
00:08:51 · Music & Sound with Lyria 3 Pro — Khulan Davaajav
- Shows how Lyria 3 Pro can generate background music and sound effects with precise timing using timestamp-based prompting.
00:11:42 · Expressive Voiceover with Gemini 3.1 Flash TTS — Khulan Davaajav
- Details how Gemini 3.1 Flash TTS allows for highly expressive, human-like voiceovers by using tags to control emotion and style.
00:14:29 · Demo: Gemini 3.1 Flash Live & Live Avatar — Khulan Davaajav
- A live demo of an interactive AI avatar powered by Gemini 3.1 Flash Live, which can answer questions using real-time data from Google Search.
00:17:30 · Future of Creative AI — Khulan Davaajav
- Khulan expresses excitement for future World Models (like Genie) and lower latency in generation, which will further empower creatives.
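
The timestamp-based prompting described in the Lyria 3 Pro segment can be pictured as a list of time-ranged cues rendered into a single prompt. The following is a minimal sketch in Python, assuming a hypothetical `[MM:SS-MM:SS] description` line format; this summary does not give the exact syntax Lyria accepts. `Cue`, `fmt_ts`, and `render_cues` are illustrative names, not part of any Google SDK, and the cue text is made up to follow the film's arc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One timed instruction: what the audio should do from start to end (seconds)."""
    start: int
    end: int
    description: str


def fmt_ts(seconds: int) -> str:
    """Format a second count as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def render_cues(cues: list[Cue]) -> str:
    """Render cues one per line, rejecting empty or overlapping time ranges."""
    ordered = sorted(cues, key=lambda c: c.start)
    previous_end = 0
    lines = []
    for cue in ordered:
        if cue.end <= cue.start:
            raise ValueError(f"empty range for cue: {cue.description!r}")
        if cue.start < previous_end:
            raise ValueError(f"cue overlaps the previous one: {cue.description!r}")
        # Assumed line format; the real prompt syntax may differ.
        lines.append(f"[{fmt_ts(cue.start)}-{fmt_ts(cue.end)}] {cue.description}")
        previous_end = cue.end
    return "\n".join(lines)


# Hypothetical cues loosely following the film's arc (work, sugar rush, crash).
example_cues = [
    Cue(0, 20, "calm, light background music while the character works"),
    Cue(20, 40, "upbeat disco as the sugar rush kicks in"),
    Cue(40, 60, "music slows and fades out for the crash"),
]
prompt_text = render_cues(example_cues)
```

Validating the ranges before sending anything keeps timing mistakes (a cue that ends before it starts, two cues claiming the same seconds) out of the prompt, which matters when the music has to line up with an already edited video.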

Products Announced (6)

00:00:58 · Generative Media Models (Suite)
- Nano Banana for image generation · Veo for video generation · Gemini Audio and Lyria for audio/music
- Available on Google Cloud
00:03:26 · Nano Banana 2 (Launched)
- High-fidelity image generation · Detailed prompt control over artistic style · Control over camera effects like film type and lighting
- Available
00:06:07 · Veo 3.1 Lite (Launched)
- Cost-effective video generation · Fast generation speed (under 60 seconds per frame) · Image-to-video and first/last frame animation
- Available
00:09:07 · Lyria 3 Pro (Launched)
- Timestamp-based prompting for precise music timing · Generation of both instrumental music and sound effects · Understands musical composition and can include vocals
- Available
00:11:53 · Gemini 3.1 Flash TTS (Launched)
- Expressive speech with over 200 control tags (e.g., [panicked], [laughs]) · Control over voice style (e.g., comedy narrator) · Control over accent and language
- Available
00:14:29 · Gemini 3.1 Flash Live & Live Avatar (Preview)
- Real-time, audio-to-audio conversation · Connects to Google Search for live data retrieval · Powers interactive, lip-synced AI avatars
- In Preview
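
The control tags listed for Gemini 3.1 Flash TTS sit inline in the script, in square brackets. As a rough sketch of how a script could be checked before it is sent for synthesis, the Python below extracts tags and flags any that are not on an allow-list. The allow-list holds only the three tags named in this summary; the full set is said to exceed 200 and is not reproduced here. `find_tags`, `unknown_tags`, and `strip_tags` are hypothetical helpers, and the sample script is invented.

```python
import re

# Only the tags named in this summary; the full tag set is not listed here.
KNOWN_TAGS = {"panicked", "laughs", "positive"}

# Assumed tag shape: lowercase words (spaces or hyphens allowed) in square brackets.
TAG_PATTERN = re.compile(r"\[([a-z][a-z \-]*)\]")


def find_tags(script: str) -> list[str]:
    """Return every bracketed tag in the order it appears."""
    return TAG_PATTERN.findall(script)


def unknown_tags(script: str, known: set[str] = KNOWN_TAGS) -> list[str]:
    """Return tags that are not on the allow-list (likely typos)."""
    return [tag for tag in find_tags(script) if tag not in known]


def strip_tags(script: str) -> str:
    """Remove tags and collapse leftover whitespace, e.g. for captions."""
    return " ".join(TAG_PATTERN.sub(" ", script).split())


script = (
    "[positive] Working from home is great. "
    "[panicked] Wait, where did the snacks go? [laughs]"
)
```

With this sample, `find_tags(script)` yields the three tags in order, `unknown_tags(script)` is empty, and `strip_tags(script)` gives the plain spoken text for subtitles.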

Demos (2)

00:02:09 ✓ · WFH: Working From Hunger — Khulan Davaajav
- A short, 3D-animated film about a person working from home, snacking, getting a sugar rush, and then crashing. The entire film, including images, video, music, sound effects, and voiceover, was created using Google’s Generative Media models.
00:14:57 ✓ · Gemini 3.1 Flash Live & Live Avatar — Khulan Davaajav
- Khulan held a live, spoken conversation with an AI avatar. She asked for the current weather in Las Vegas, and the avatar retrieved the day's high and low temperatures from Google Search in real time and stated them correctly.

Notable Quotes (4)

00:00:42 — Khulan Davaajav:

Especially at Google we say Gen Media, Generative Media, and our customers and developers are like, ‘What is Gen Media?’
00:04:18 — Khulan Davaajav:

The one amazing thing about Nano Banana is that you can really control it with your artistic kind of decision you want to make.
00:17:00 — Khulan Davaajav:

The highlight of Gemini 3.1 Flash Live, it’s audio-to-audio.
00:18:10 — Khulan Davaajav:

In the world model, you are the camera operator just moving around, which is amazing.

Visual Signals

On-screen (9)

00:00:05 · Lower third: 'Stephanie Wong, Global Lead, Developer Programs, Google Cloud'
- Identifies the host and her role.
00:00:39 · Lower third: 'Khulan Davaajav, Product Marketing Manager, Genmedia, Google Cloud'
- Identifies the guest speaker and her role.
00:00:56 · Title card: 'Generative Media' with 'Nano Banana, Veo, Gemini Audio, Lyria' listed below.
- Visually introduces the suite of models being discussed.
00:03:26 · Title card: 'Nano Banana 2' with a grid of 9 keyframe images from the animation.
- Shows the static images that were generated before being animated.
00:03:49 · Prompt for Nano Banana 2: '3D render, Memphis Design style, smooth soft-touch silicone textures...'
- Reveals the level of detail and specific terminology used to achieve the desired visual style.
00:06:09 · Title card: 'Veo 3.1 Lite' showing 'First frame' and 'Last frame' inputs and the resulting animated video.
- Demonstrates the image-to-video capability of the Veo model.
00:09:21 · Title card: 'Lyria 3 Pro' with timestamped prompts for background music.
- Illustrates how users can precisely control music changes and sound effects over time.
00:12:11 · Title card: 'Gemini 3.1 Flash TTS' with a script containing expressive speech tags like '[positive]'.
- Shows the syntax for controlling the emotion and delivery of the text-to-speech voice.
00:14:29 · Title card: 'Gemini 3.1 Flash Live & Live Avatar'
- Introduces the live, interactive avatar feature.
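
The on-screen Nano Banana 2 prompt reads as a comma-separated run of descriptors for medium, style, and texture. One way to keep that structure consistent across a set of keyframes is to build each prompt from named parts, as in the Python sketch below. The first three descriptor values come from the on-screen prompt; the rest of that prompt is elided in this summary and is not reconstructed here. The field names (`medium`, `style`, `texture`, `camera`, `lighting`) and the subject strings are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ImagePrompt:
    """Named parts of an image prompt; empty parts are skipped when rendering."""
    subject: str
    medium: str = ""
    style: str = ""
    texture: str = ""
    camera: str = ""
    lighting: str = ""

    def render(self) -> str:
        # Assumed ordering: look first, then subject, then camera and lighting.
        parts = [self.medium, self.style, self.texture,
                 self.subject, self.camera, self.lighting]
        return ", ".join(p.strip() for p in parts if p.strip())


# Shared look, taken from the descriptors visible on screen.
base = ImagePrompt(
    subject="a character working on a laptop at home",  # hypothetical subject
    medium="3D render",
    style="Memphis Design style",
    texture="smooth soft-touch silicone textures",
)

# A second keyframe reuses the look and changes only the subject.
second = replace(base, subject="the same character dancing under a disco ball")
```

Holding the style fields fixed and varying only `subject` is one plausible way to get keyframes that look like they belong to the same film.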

Stage (1)

00:00:00 · The segment is filmed in a studio setting at Google Cloud Next, with two speakers sitting at a desk with microphones and laptops.

Visual demos (2)

00:02:09 · A 3D animated short film titled ‘WFH: Working From Hunger’.
- A claymation-style character working on a laptop, eating snacks, getting a sugar rush and dancing under a disco ball, and finally crashing on a couch.
00:14:57 · A live conversation with a digital avatar.
- An animated male avatar on a screen, with a ‘Connected’ status. The avatar’s lips move in sync with its speech, and it displays subtle facial expressions and head movements.

Key Topics

Generative AI · Multimodal AI · Generative Media · Text-to-Image · Text-to-Video · Text-to-Music · Text-to-Speech (TTS) · AI Avatars · Creative AI · Prompt Engineering · Google Cloud · Gemini · Veo · Lyria · Nano Banana

Takeaways

Google is unifying its creative AI tools under the ‘Generative Media’ umbrella, which includes models for image (Nano Banana), video (Veo), audio (Gemini Audio), and music (Lyria).
The latest models offer a high degree of creative control through detailed prompting, allowing users to specify artistic styles, camera techniques, lighting, and even film stock.
Combining models is a powerful workflow: use Gemini to brainstorm and refine prompts, Nano Banana to create keyframes, Veo to animate them, Lyria for timed music, and Gemini TTS for expressive voiceover.
Lyria 3 Pro enables precise audio-video synchronization through timestamp-based prompting, allowing for dynamic changes in music and sound effects at specific moments.
Gemini 3.1 Flash TTS moves beyond robotic voices by using expressive tags (e.g., [laughs], [panicked]) to generate human-like, emotional speech.
The new Gemini 3.1 Flash Live & Live Avatar feature enables real-time, audio-to-audio conversations with an AI that can pull live data from Google Search, opening up use cases in education, customer service, and interactive entertainment.
The future of creative AI at Google is focused on creating immersive ‘World Models’ and reducing latency to keep creatives in their ‘flow state’.
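
The combined workflow in the takeaways (Gemini for prompts, Nano Banana for keyframes, Veo for animation, Lyria for timed music, Gemini TTS for voiceover) is in effect a small dependency graph. The Python sketch below orders the stages with the standard library's `graphlib`. It makes no model calls; the stage names and the dependency edges are one reading of the takeaway, and the assumption that music is timed against the finished animation is not stated in the talk summary.

```python
from graphlib import TopologicalSorter

# stage -> stages that must finish first (edges are assumptions, see above)
DEPENDENCIES = {
    "refine_prompts": set(),                      # Gemini
    "keyframes": {"refine_prompts"},              # Nano Banana
    "animation": {"keyframes"},                   # Veo
    "music_and_sfx": {"animation"},               # Lyria, timed to the cut
    "voiceover": {"refine_prompts"},              # Gemini TTS
    "final_edit": {"animation", "music_and_sfx", "voiceover"},
}


def plan(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises CycleError on circular deps."""
    return list(TopologicalSorter(dependencies).static_order())


order = plan(DEPENDENCIES)
```

Several orders are valid (the voiceover can run any time after the prompts are refined), so callers should rely only on the relative ordering of dependent stages, not on exact positions.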