Automating Creativity: Gen Media Agents with ADK and MCP

Year: 2026

Stephanie Wong (Global Lead, Developer Programs) · Katie Nguyen (Developer Relations Engineer)

Segments (10)

00:00:00 · Introduction — Stephanie Wong
- Stephanie Wong introduces Katie Nguyen to discuss automating generative media workflows.
00:00:24 · The Programmatic Side of Gen Media — Katie Nguyen
- Katie explains how agents can automate creative workflows, handle consistency, and assist with ideation.
00:01:25 · Live Demo: The Character Story Agent — Katie Nguyen
- Katie starts a live demo using the Agent Development Kit (ADK) to generate a story about a mischievous dog named Lulu.
00:05:05 · Code Walkthrough: Agent & Tool Implementation — Katie Nguyen
- While the agent generates media, Katie walks through the Python code, showing how tools for Veo, Gemini, Lyria, and Nano Banana are defined and used.
00:07:14 · Agentic Benefits & Character Consistency — Katie Nguyen
- The agent maintains character and story consistency by using the initial generated image as a reference for all subsequent media.
00:08:38 · Demo Result: The Final Video — Katie Nguyen
- Katie plays the final video of ‘Lulu’s Home Alone Story,’ which was fully generated by the agent, including scenes, narration, and music.
00:09:53 · Iterating with the Agent — Katie Nguyen
- Katie demonstrates how to iterate on the result by asking the agent in natural language to make the background music louder.
00:10:20 · Using Agent Skills for Robustness — Katie Nguyen
- She explains how ‘Skills’ (pre-defined instruction sets) can be used to give the agent more detailed capabilities, like a ‘Voice Director’ skill for TTS.
00:11:20 · Evaluation in an Agentic Loop — Katie Nguyen
- Katie discusses using an ‘evaluator agent’ to act as a judge, automatically checking the generated media for quality and adherence to the prompt.
00:12:28 · Conclusion — Stephanie Wong
- Stephanie wraps up the session, highlighting the power of using agents for complex creative tasks.
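
The reference-image approach described in the 00:07:14 segment could look roughly like the sketch below. It uses stand-in functions only and calls no real model. `StorySession`, `generate_character_image`, and `generate_scene_clip` are hypothetical names chosen for illustration; they are not taken from the demo's agent.py.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StorySession:
    """Assets produced so far in one story (hypothetical structure)."""
    reference_image: Optional[str] = None
    clips: list = field(default_factory=list)


def generate_character_image(session: StorySession, description: str) -> str:
    # Stand-in for an image-model call: records a fixed placeholder path.
    session.reference_image = "character.png"
    return session.reference_image


def generate_scene_clip(session: StorySession, scene_prompt: str) -> dict:
    # Every clip request carries the same reference image, so the
    # character's appearance is anchored to the first generated asset.
    if session.reference_image is None:
        raise ValueError("generate the character image before any scene")
    clip = {
        "prompt": scene_prompt,
        "reference_image": session.reference_image,
        "path": f"scene_{len(session.clips) + 1}.mp4",
    }
    session.clips.append(clip)
    return clip


session = StorySession()
generate_character_image(session, "a mischievous dog named Lulu")
for scene in ["The Kitchen Caper", "The Shredding Symphony"]:
    generate_scene_clip(session, scene)
```

Keeping the reference in session state, rather than asking the user to re-supply it, is one way the agent's memory can carry consistency across tool calls.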

Products Announced (5)

00:01:39 · Agent Development Kit (ADK) (1.31.0)
- Framework for building AI agents. · Provides a web UI for testing and interaction. · Integrates with various tools and models.
- Shown in demo.
00:03:19 · Nano Banana 2 (Preview)
- Image generation model. · Preserves details from natural language prompts. · Used for creating the initial character image.
- Used in demo.
00:05:26 · Veo (3.1-lite-generate-001)
- Text-to-video and image-to-video generation. · High-quality, consistent video creation. · Configurable parameters like duration and resolution.
- Used in demo.
00:06:58 · Lyria (3-clip-preview)
- Generative music model. · Creates background music based on the mood of scenes. · Accessible as a tool within the agent framework.
- Used in demo.
00:06:52 · Gemini Text-to-Speech (TTS) (3.1-flash-tts-preview)
- Generates voiceover narration. · Supports expressive audio tags for nuanced delivery. · Offers a variety of pre-built voices.
- Used in demo.
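
To illustrate what it means for a model to be "accessible as a tool", the sketch below derives a tool description from an ordinary Python function's signature and docstring. It is a standard-library approximation, not ADK's actual mechanism. `generate_music`, its parameters, and `describe_tool` are assumptions made for this example.

```python
import inspect


def generate_music(mood: str, duration_seconds: int = 30) -> dict:
    """Create background music matching the mood of a scene."""
    # Placeholder result; a real tool would call a music model here.
    return {
        "status": "success",
        "path": "music.wav",
        "mood": mood,
        "duration_seconds": duration_seconds,
    }


def describe_tool(fn) -> dict:
    """Build a simple description a model could use to decide when to call fn."""
    params = {}
    for name, param in inspect.signature(fn).parameters.items():
        params[name] = {
            "type": getattr(param.annotation, "__name__", str(param.annotation)),
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": params,
    }


tool_spec = describe_tool(generate_music)
```

The point of the sketch is that clear parameter names, type hints, and a docstring are what the reasoning model sees when it chooses a tool.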

Demos (1)

00:01:25 ✓ · Character Story Agent — Katie Nguyen
- A live demo of an AI agent built with the Agent Development Kit (ADK) that takes a simple prompt (‘a story about a dog home alone’), interacts with the user for details, and then autonomously generates a multi-scene story complete with consistent character images (Nano Banana), animated video clips (Veo), voice narration (Gemini TTS), and background music (Lyria). The agent then combines all assets into a final video file and even performs edits based on natural language feedback.
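
The summary does not say how the agent combines assets or applies the "louder music" edit. One plausible approach, assumed here, is to shell out to ffmpeg. The sketch below only builds the argument list and does not run it. `build_mix_command` and the file names are hypothetical.

```python
def build_mix_command(video, narration, music, output, music_volume=0.3):
    """Return an ffmpeg argument list that mixes narration and music under a video."""
    # Input 0 is the video, input 1 the narration, input 2 the music.
    # The music is scaled by music_volume, then mixed with the narration.
    filter_graph = (
        f"[2:a]volume={music_volume}[music];"
        "[1:a][music]amix=inputs=2:duration=longest[mixed]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", narration, "-i", music,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-map", "[mixed]",
        "-c:v", "copy",
        "-shortest",
        output,
    ]


first_cut = build_mix_command("story.mp4", "narration.wav", "music.wav", "final.mp4")
# A request such as "make the background music louder" could map to a
# second call with a higher volume value.
louder_cut = build_mix_command(
    "story.mp4", "narration.wav", "music.wav", "final_v2.mp4", music_volume=0.6
)
```

Exposing the volume as a parameter is what lets a natural-language edit become a small change to one tool call instead of a full regeneration.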

Notable Quotes (3)

00:01:08 — Katie Nguyen:

Gemini is going to keep track of all of this, and the agent has that memory. It’s going to be able to reference and use the previous assets that it created to really create a whole story and make sure it’s really cohesive.
00:07:20 — Stephanie Wong:

From the prompting perspective, you really didn’t have to provide much detail, but you’ve already set up basically the logic and the structure that you need for the agent to go kick off this automated process from image to video and include sound.
00:11:20 — Katie Nguyen:

You can create like an image evaluator agent… and have it take in the media, compare it against the original prompt… and use LLM as a judge in a way.

Visual Signals

On-screen (9)

00:00:07 · Lower third: 'Stephanie Wong, Global Lead, Developer Programs, Google Cloud'
- Identifies the host and her role.
00:00:43 · Lower third: 'Katie Nguyen, Developer Relations Engineer, Google Cloud'
- Identifies the speaker and her role.
00:01:33 · Terminal command: `adk web --sdk-web-port=8024`
- Shows the command to start the Agent Development Kit’s local web server for testing.
00:01:50 · Agent Development Kit Web UI
- The primary interface for interacting with and debugging the AI agent during the demo.
00:02:24 · User prompt in chat: 'Let's generate a story about a dog that's home alone and getting into trouble'
- The initial high-level instruction given to the agent.
00:04:04 · Agent response with a 3-scene story arc: 1. The Kitchen Caper, 2. The Shredding Symphony, 3. The Sle
- Shows the agent’s ability to structure a narrative from a simple prompt.
00:06:11 · Python code in VS Code (agent.py)
- Reveals the underlying implementation of the agent, including tool definitions and SDK calls.
00:10:40 · Markdown file (SKILL.md) for 'GenMedia Voice Director'
- Demonstrates the concept of an agent ‘skill’ which provides detailed, reusable instructions to the LLM for a specific task.
00:11:28 · Final generated video playing in QuickTime Player
- The culmination of the agent’s automated workflow, showing the combined video, audio, and narration.
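
As a rough illustration of the SKILL.md idea shown at 00:10:40, the sketch below appends a Markdown skill to an agent's base instruction. The skill text is invented for this example, and `skill_title` and `apply_skill` are hypothetical helpers. How ADK actually loads skills is not shown in this summary.

```python
BASE_INSTRUCTION = "You are a story agent that creates narrated videos."

# Hypothetical skill content; the real SKILL.md from the demo is not reproduced here.
VOICE_DIRECTOR_SKILL = """\
# GenMedia Voice Director

- Match the narration tone to the mood of each scene.
- Keep each narration line short enough to fit its clip.
"""


def skill_title(skill_markdown: str) -> str:
    """Return the text of the first top-level Markdown heading."""
    for line in skill_markdown.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return "Untitled skill"


def apply_skill(base_instruction: str, skill_markdown: str) -> str:
    """Append the skill under a labelled section of the instruction."""
    title = skill_title(skill_markdown)
    return f"{base_instruction}\n\n## Skill: {title}\n{skill_markdown.strip()}"


instruction = apply_skill(BASE_INSTRUCTION, VOICE_DIRECTOR_SKILL)
```

Keeping the skill in its own file means the same instructions can be reused across agents without editing each agent's base prompt.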

Stage (1)

00:00:00 · Two speakers, Stephanie Wong and Katie Nguyen, are seated at a desk in a studio setting at Google Cloud Next, with microphones and laptops.

Visual demos (1)

00:01:33 · A live screen-sharing demo.
- Katie starts in a terminal and runs a command to launch the ADK web UI, then switches to the browser showing the chat interface. She types prompts, and the agent responds with text and function calls. She navigates her local file system to show the generated image and video files. She also opens a code editor to show the Python source code for the agent and a Markdown file for an agent skill.

Key Topics

Generative AI · AI Agents · Multimodality · Agentic Workflows · Video Generation · Music Generation · Text-to-Speech · Agent Development Kit (ADK) · Google Cloud · Gemini · Veo · Lyria · Creative Automation · Developer Tools

Takeaways

AI agents can automate complex, multi-step creative workflows, such as generating a complete story with images, video, and audio from a single prompt.
Google’s Agent Development Kit (ADK) provides a framework and UI for building, testing, and interacting with these agents.
Agentic workflows excel at maintaining consistency across different media types (e.g., character appearance) by using previously generated assets as references.
The agent leverages a suite of generative models as ‘tools’: Nano Banana for images, Veo for video, Lyria for music, and Gemini for narration (TTS) and overall reasoning.
Users can interact with the agent using natural language to provide initial ideas and make iterative changes, acting as a creative collaborator rather than a micro-manager.
The concept of ‘Skills’ allows developers to provide agents with detailed, reusable instruction sets for complex tasks, making the agent more robust and capable.
Evaluation can also be integrated into the agentic loop, where an LLM acts as a ‘judge’ to assess the quality and prompt-adherence of the generated content.
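
The evaluation takeaway can be sketched as a generate, judge, and retry loop. The version below is a minimal illustration with deterministic stand-ins in place of real model calls. `generate_with_review`, `fake_generate`, `fake_judge`, the score threshold, and the "golden fur" check are all assumptions made for the example.

```python
def generate_with_review(prompt, generate, judge, max_attempts=3, threshold=0.8):
    """Regenerate until the judge's score meets the threshold or attempts run out."""
    feedback = ""
    best = None
    for attempt in range(1, max_attempts + 1):
        asset = generate(prompt, feedback)
        score, feedback = judge(prompt, asset)
        if best is None or score > best["score"]:
            best = {"asset": asset, "score": score, "attempt": attempt}
        if score >= threshold:
            break
    return best


def fake_generate(prompt, feedback):
    # Stand-in for a media-generation tool; folds feedback into the result.
    return f"image of {prompt} ({feedback})" if feedback else f"image of {prompt}"


def fake_judge(prompt, asset):
    # Stand-in for an LLM judge comparing the asset against the prompt.
    if "golden fur" in asset:
        return 0.9, ""
    return 0.4, "keep golden fur"


result = generate_with_review("Lulu the dog", fake_generate, fake_judge)
```

Capping the number of attempts keeps the loop from running indefinitely when the judge never approves, and returning the best-scoring asset gives the user something to review either way.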