Running Gemma 4 on smartphones to Raspberry Pi

Year: 2026

Muhammad Farooq (Google Developer Expert, Cloud & AI) · Omar Sanseviero (Lead AI Developer Experience)

Segments (10)

  • 00:00:00 · Introduction — Muhammad Farooq
    • Muhammad introduces Omar from Google DeepMind to discuss the Gemma family of models.
  • 00:00:11 · Gemma Launch Success — Omar Sanseviero
    • Omar describes the Gemma launch as Google’s largest open model release ever, with over 40 million downloads in three weeks.
  • 00:00:33 · The Gemma Model Family — Omar Sanseviero
    • The models range from 2B to 31B parameters and are designed to be developer-friendly and efficient enough to run on consumer hardware.
  • 00:01:17 · Model Capabilities — Omar Sanseviero
    • Gemma models are multimodal (audio, video, image) and multilingual (trained on over 140 languages), with different capabilities based on size.
  • 00:01:58 · The ‘Gemmaverse’ Ecosystem — Omar Sanseviero
    • Omar introduces the ‘Gemmaverse’, where the community builds upon Gemma, citing an example of fine-tuning for the Quechua language.
  • 00:02:41 · Early Applications and Agentic Use — Omar Sanseviero
    • Developers are using Gemma as an agentic model for function calling and running it on hardware-constrained devices like Raspberry Pi.
  • 00:03:41 · Hybrid Inference and Routing — Omar Sanseviero
    • Omar discusses using small local models like Gemma as routers to decide whether to handle a task locally or send it to a larger cloud-based model.
  • 00:05:37 · Use Cases for Local and Open Models — Omar Sanseviero
    • Key use cases include privacy-first applications (healthcare), sovereign AI, offline scenarios, and fine-tuning for specialized domains.
  • 00:07:13 · The Shift to Apache 2.0 License — Omar Sanseviero
    • In response to community feedback, Google moved Gemma to the well-understood Apache 2.0 license to reduce adoption friction.
  • 00:08:44 · On-Device Agentic Capabilities — Omar Sanseviero
    • Small agentic models are capable of on-device tool use, such as controlling phone functions, as demonstrated in the Android AI Edge Gallery.

Products Announced (1)

  • 00:00:05 · Gemma (Recently Launched)
    • Open-access family of models (2B to 31B parameters) · Multimodal (text, audio, video, image) and multilingual capabilities · Designed for on-device and developer-friendly use
    • Available for download, Apache 2.0 license

Customer Stories (1)

  • 00:04:35 · Cactus Compute — A YC startup developing ‘hybrid inference’ to route prompts between local and cloud-based models.
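
The routing idea behind hybrid inference can be sketched in a few lines of Python. This is only an illustration of the pattern described in the talk, not Cactus Compute's implementation. The names `keyword_router`, `hybrid_answer`, `fake_local`, and `fake_cloud` are hypothetical, and the keyword heuristic is a placeholder for the decision that a small local model such as Gemma would make in a real system.

```python
from typing import Callable, Tuple

# A "model" here is any function from prompt text to response text.
Model = Callable[[str], str]


def keyword_router(prompt: str) -> str:
    """Hypothetical stand-in for the small local model's routing decision.

    Returns "local" or "cloud". A real router would ask the local model itself.
    """
    hard_words = {"prove", "refactor", "analyze"}
    words = prompt.lower().split()
    if len(words) > 200 or hard_words.intersection(words):
        return "cloud"
    return "local"


def hybrid_answer(prompt: str, router: Model,
                  local_model: Model, cloud_model: Model) -> Tuple[str, str]:
    """Route a prompt, then run it on the chosen model."""
    choice = router(prompt).strip().lower()
    if choice not in ("local", "cloud"):
        # Assumption: unrecognized router output falls back to the cloud model.
        choice = "cloud"
    model = local_model if choice == "local" else cloud_model
    return choice, model(prompt)


# Stub models so the sketch runs without any model weights or network access.
def fake_local(prompt: str) -> str:
    return f"[local] {prompt}"


def fake_cloud(prompt: str) -> str:
    return f"[cloud] {prompt}"


if __name__ == "__main__":
    print(hybrid_answer("turn on the flashlight", keyword_router, fake_local, fake_cloud))
    print(hybrid_answer("prove this theorem", keyword_router, fake_local, fake_cloud))
```

The fallback direction is a design choice: a privacy-first application might prefer to fall back to the local model so that no prompt leaves the device by default.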

Benchmarks Shown (1)

  • 00:00:27 · Gemma Launch Adoption: Over 40 million downloads
    • In the first three weeks after launch.

Demos (1)

  • 00:09:17 ✓ · Android AI Edge Gallery On-Device Agent — Omar Sanseviero
    • Omar described a demo where Gemma runs locally on an Android phone, acting as an agent to perform tasks like controlling the flashlight or drafting an email.
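
On-device tool use of this kind usually comes down to the model emitting a structured function call that the app then dispatches to real device code. The sketch below shows that dispatch step under stated assumptions: the JSON shape (`name` plus `arguments`), the tool names `set_flashlight` and `draft_email`, and the `dispatch` helper are all hypothetical and are not taken from the Android AI Edge Gallery or any Gemma API.

```python
import json
from typing import Callable, Dict

# Registry of tools the app is willing to expose to the model.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def set_flashlight(on: bool) -> str:
    # Placeholder: a real app would call the platform's flashlight API here.
    return "flashlight on" if on else "flashlight off"


@tool
def draft_email(to: str, subject: str) -> str:
    # Placeholder: a real app would open a compose screen with these fields.
    return f"draft to {to}: {subject}"


def dispatch(model_output: str) -> str:
    """Parse an assumed JSON function call and run the matching tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: expected a JSON object"
    name = call.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        return f"error: unknown tool {name!r}"
    try:
        return fn(**call.get("arguments", {}))
    except TypeError as exc:
        # Raised when arguments are missing, unexpected, or not an object.
        return f"error: bad arguments: {exc}"


if __name__ == "__main__":
    print(dispatch('{"name": "set_flashlight", "arguments": {"on": true}}'))
```

Restricting calls to an explicit registry means the model can only trigger actions the app has chosen to expose, and malformed output is returned as an error string instead of raising.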

Notable Quotes (4)

  • 00:00:18 — Omar Sanseviero:

    This has been our largest open model release ever.

  • 00:01:08 — Omar Sanseviero:

    The largest ones are like super good, like the most intelligent per parameter, per watt that you can get.

  • 00:02:08 — Omar Sanseviero:

    It’s not just a model for the US, it’s a model for the whole world.

  • 00:04:36 — Omar Sanseviero:

    Cactus Compute is doing this thing which is called hybrid inference.

Visual Signals

On-screen (4)

  • 00:00:24 · Omar Sanseviero Lead AI Developer Experience, Google DeepMind
    • Identifies the speaker and their role.
  • 00:01:24 · Muhammad Farooq Google Developer Expert, Cloud & AI
    • Identifies the interviewer and their role.
  • 00:10:15 · Google logo
    • Brand identification at the end of the video.
  • 00:10:18 · Google Cloud Next '26 logo
    • Event branding at the end of the video.

Stage (1)

  • 00:00:00 · Two speakers are seated at a desk in a podcast-style setup on a busy conference floor. The desk is branded ‘Google Cloud Next’ and the microphone boxes are branded ‘Google Cloud Next ’26’.

Key Topics

Gemma · Google DeepMind · Open-Source AI · Large Language Models (LLMs) · On-Device AI · Multimodal Models · Agentic AI · Fine-Tuning · AI Developer Experience · Hybrid Inference · AI Licensing · Apache 2.0 · Google Cloud Next · AI Ecosystem · Multilingual AI

Takeaways

  • Gemma is a family of open-access models from Google, designed to be developer-friendly and run efficiently on consumer hardware.
  • The models are multimodal (supporting text, audio, video, images) and multilingual (trained on over 140 languages), aiming for global accessibility.
  • The ‘Gemmaverse’ community is actively fine-tuning Gemma for specialized tasks, demonstrating the power of open models for niche applications.
  • Gemma models possess agentic capabilities for tool use and function calling, making them suitable for on-device assistants and hybrid inference routing.
  • Key use cases for on-device models like Gemma include privacy-sensitive applications, sovereign AI, and offline scenarios without internet access.
  • Google released Gemma under the permissive Apache 2.0 license to reduce friction and encourage wider commercial and enterprise adoption.
  • The launch was highly successful, achieving over 40 million downloads in the first three weeks, indicating strong developer interest.