Running Gemma 4 on smartphones to Raspberry Pi

Year: 2026

Muhammad Farooq (Google Developer Expert, Cloud & AI) · Omar Sanseviero (Lead AI Developer Experience)

Segments (10)

00:00:00 · Introduction — Muhammad Farooq
- Muhammad introduces Omar from Google DeepMind to discuss the Gemma family of models.
00:00:11 · Gemma Launch Success — Omar Sanseviero
- Omar describes the Gemma launch as Google’s largest open model release ever, with over 40 million downloads in three weeks.
00:00:33 · The Gemma Model Family — Omar Sanseviero
- The models range from 2B to 31B parameters, designed to be developer-friendly and efficient enough to run on consumer hardware.
00:01:17 · Model Capabilities — Omar Sanseviero
- Gemma models are multimodal (text, audio, video, image) and multilingual (trained on over 140 languages); capabilities vary with model size.
00:01:58 · The ‘Gemmaverse’ Ecosystem — Omar Sanseviero
- Omar introduces the ‘Gemmaverse’, where the community builds upon Gemma, citing an example of fine-tuning for the Quechua language.
00:02:41 · Early Applications and Agentic Use — Omar Sanseviero
- Developers are using Gemma as an agentic model for function calling and running it on hardware-constrained devices like Raspberry Pi.
00:03:41 · Hybrid Inference and Routing — Omar Sanseviero
- Omar discusses using small local models like Gemma as routers to decide whether to handle a task locally or send it to a larger cloud-based model.
00:05:37 · Use Cases for Local and Open Models — Omar Sanseviero
- Key use cases include privacy-first applications (healthcare), sovereign AI, offline scenarios, and fine-tuning for specialized domains.
00:07:13 · The Shift to Apache 2.0 License — Omar Sanseviero
- Based on community feedback to reduce adoption friction, Google moved Gemma to the well-understood Apache 2.0 license.
00:08:44 · On-Device Agentic Capabilities — Omar Sanseviero
- Small agentic models are capable of on-device tool use, such as controlling phone functions, as demonstrated in the Android AI Edge Gallery.

Products Announced (1)

00:00:05 · Gemma (Recently Launched)
- Open-access family of models (2B to 31B parameters) · Multimodal (text, audio, video, image) and multilingual capabilities · Designed for on-device and developer-friendly use
- Available for download, Apache 2.0 license

Customer Stories (1)

00:04:35 · Cactus Compute — A YC startup developing ‘hybrid inference’ to route prompts between local and cloud-based models.
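The routing idea can be illustrated with a minimal sketch. It is not Cactus Compute's implementation or API. Every name in it (`Route`, `route_prompt`, `toy_classifier`) is hypothetical, and the word-count heuristic only stands in for a small local model acting as the router.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    target: str  # "local" or "cloud"
    reason: str


def route_prompt(prompt: str, classify: Callable[[str], str]) -> Route:
    """Ask a small on-device classifier where the prompt should run.

    `classify` is a hypothetical callable that returns "local" or "cloud".
    In a real system it would wrap a small local model.
    """
    label = classify(prompt).strip().lower()
    if label == "local":
        return Route("local", "router judged the task simple enough for the device")
    # Any other label, including an unrecognized one, goes to the cloud model.
    return Route("cloud", "router did not mark the task as local")


def toy_classifier(prompt: str) -> str:
    """Stand-in for a local router model: short prompts stay on the device."""
    return "local" if len(prompt.split()) <= 12 else "cloud"


if __name__ == "__main__":
    print(route_prompt("Turn on the flashlight", toy_classifier))
```

Sending unrecognized labels to the cloud is one possible default. A privacy-first application might prefer to keep them on the device instead.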

Benchmarks Shown (1)

00:00:27 · Gemma Launch Adoption: Over 40 million downloads
- Reached in the first three weeks after launch.

Demos (1)

00:09:17 · Android AI Edge Gallery On-Device Agent — Omar Sanseviero
- Omar describes a demo in which Gemma runs locally on an Android phone and acts as an agent, performing tasks such as controlling the flashlight or drafting an email.
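On-device tool use of this kind can be sketched as a small dispatcher. This is not the AI Edge Gallery API. It assumes the model emits a function call as JSON with `name` and `arguments` fields, and the tool names (`set_flashlight`, `draft_email`) are hypothetical stand-ins for phone functions.

```python
import json
from typing import Callable, Dict

# Registry of callable tools, keyed by function name (hypothetical tools).
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def set_flashlight(on: bool) -> str:
    return "flashlight on" if on else "flashlight off"


@tool
def draft_email(to: str, subject: str) -> str:
    return f"draft to {to}: {subject}"


def dispatch(model_output: str) -> str:
    """Parse an assumed JSON function call and run the matching tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: expected a JSON object"
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return "error: unknown tool"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"
    try:
        return fn(**args)
    except TypeError:
        # Raised when the model supplies missing or unexpected arguments.
        return "error: bad arguments"


if __name__ == "__main__":
    print(dispatch('{"name": "set_flashlight", "arguments": {"on": true}}'))
```

Returning error strings rather than raising lets the application feed the failure back to the model as a tool result, which is one common way to let an agent retry.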

Notable Quotes (4)

00:00:18 — Omar Sanseviero:

This has been our largest open model release ever.
00:01:08 — Omar Sanseviero:

The largest ones are like super good, like the most intelligent per parameter, per watt that you can get.
00:02:08 — Omar Sanseviero:

It’s not just a model for the US, it’s a model for the whole world.
00:04:36 — Omar Sanseviero:

Cactus Compute is doing this thing which is called hybrid inference.

Visual Signals

On-screen (4)

00:00:24 · Omar Sanseviero Lead AI Developer Experience, Google DeepMind
- Identifies the speaker and their role.
00:01:24 · Muhammad Farooq Google Developer Expert, Cloud & AI
- Identifies the interviewer and their role.
00:10:15 · Google logo
- Brand identification at the end of the video.
00:10:18 · Google Cloud Next '26 logo
- Event branding at the end of the video.

Stage (1)

00:00:00 · Two speakers are seated at a desk in a podcast-style setup on a busy conference floor. The desk is branded ‘Google Cloud Next’ and the microphone boxes are branded ‘Google Cloud Next '26’.

Key Topics

Gemma · Google DeepMind · Open-Source AI · Large Language Models (LLMs) · On-Device AI · Multimodal Models · Agentic AI · Fine-Tuning · AI Developer Experience · Hybrid Inference · AI Licensing · Apache 2.0 · Google Cloud Next · AI Ecosystem · Multilingual AI

Takeaways

Gemma is a family of open-access models from Google, designed to be developer-friendly and run efficiently on consumer hardware.
The models are multimodal (supporting text, audio, video, images) and multilingual (trained on over 140 languages), aiming for global accessibility.
The ‘Gemmaverse’ community is actively fine-tuning Gemma for specialized tasks, demonstrating the power of open models for niche applications.
Gemma models possess agentic capabilities for tool use and function calling, making them suitable for on-device assistants and hybrid inference routing.
Key use cases for on-device models like Gemma include privacy-sensitive applications, sovereign AI, and offline scenarios without internet access.
Google moved Gemma to the permissive Apache 2.0 license to reduce friction and encourage wider commercial and enterprise adoption.
The launch was highly successful, achieving over 40 million downloads in the first three weeks, indicating strong developer interest.