MineDojo: Framework for Generally Capable Agents & Voyager: An Open-Ended Embodied Agent with Large Language Models

Event: CVPR 2025 · Duration: 32 min

Abstract

This presentation introduces MineDojo, an open-source framework for developing generally capable AI agents in the open-ended Minecraft environment. MineDojo offers a versatile simulator, over 3000 diverse tasks (both programmatic and creative), and an internet-scale knowledge base compiled from YouTube videos, Minecraft wiki pages, and Reddit discussions. A key component is MineCLIP, a contrastive video-language foundation model that learns reward functions from time-aligned video and text. Building on this foundation, the talk then presents Voyager, an LLM-powered lifelong learning agent. Voyager uses GPT-4 to generate JavaScript code for agent control, and refines each program through an iterative prompting mechanism that feeds back execution results and self-reflection. It also maintains a persistent skill library and an automatic curriculum that proposes novel tasks, enabling continuous exploration and skill acquisition in Minecraft.
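
To make the idea of a contrastive video-language reward concrete, here is a minimal sketch in Python. It assumes the video clip and the text prompts have already been embedded into a shared vector space; the toy vectors, the function names, the temperature value, and the use of negative prompts are all illustrative assumptions, not MineCLIP's actual encoders or reward formula.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_reward(video_embedding, task_embedding, negative_embeddings,
                      temperature=0.1):
    """Softmax probability that the video matches the task prompt
    rather than any of the negative prompts (hypothetical reward shape)."""
    prompts = [task_embedding] + list(negative_embeddings)
    logits = [cosine_similarity(video_embedding, p) / temperature
              for p in prompts]
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    return exps[0] / sum(exps)


# Toy embeddings (assumed): the clip is close to the task prompt
# and far from the single negative prompt.
video = [1.0, 0.0]
task = [0.9, 0.1]
negatives = [[0.0, 1.0]]
reward = clip_style_reward(video, task, negatives)
```

In this sketch the reward approaches 1 when the clip is much closer to the task prompt than to the negatives, and falls toward 0 when a negative prompt matches better.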

Speakers

  • Guanzhi Wang — NVIDIA, Caltech
  • Yuqi Xie — NVIDIA, UT Austin
  • Yunfan Jiang — NVIDIA, Stanford
  • Ajay Mandlekar — NVIDIA
  • Chaowei Xiao — NVIDIA, ASU
  • Yuke Zhu — UT Austin
  • Linxi “Jim” Fan — NVIDIA
  • Anima Anandkumar — NVIDIA, Caltech

Talks (2)

  • 00:00:00 — Guanzhi Wang et al.: MineDojo: Framework for Generally Capable Agents
    • MineDojo is an open-source framework for AI agents in Minecraft, featuring an open-ended environment, internet-scale knowledge base, 3000+ tasks, and MineCLIP for learning reward functions from multimodal data.
  • 01:53:34 — Guanzhi Wang et al.: Voyager: An Open-Ended Embodied Agent with Large Language Models
    • Voyager is an LLM-powered lifelong learning agent for Minecraft that uses GPT-4 to generate and refine JavaScript code for actions, maintains a persistent skill library, and employs an automatic curriculum for continuous exploration and skill acquisition.
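
The generate-and-refine loop described for Voyager can be sketched roughly as follows. This is a toy under stated assumptions: Voyager generates JavaScript and calls GPT-4, whereas this sketch runs Python snippets with `exec` and uses a hypothetical `stub_llm` in place of the model. The inventory dictionary and the `verify` callback are also illustrative stand-ins for environment state and self-verification.

```python
def execute(program, inventory):
    """Run generated code against a copy of the inventory.

    Returns (new_inventory, error); error is None on success.
    """
    scope = {"inventory": dict(inventory)}
    try:
        exec(program, scope)
    except Exception as exc:
        # On failure, keep the original inventory and report the error text.
        return inventory, f"{type(exc).__name__}: {exc}"
    return scope["inventory"], None


def refine_until_success(task, generate_code, verify, inventory, max_rounds=4):
    """Ask for code, run it, and feed errors or state back until verified."""
    feedback = ""
    for _ in range(max_rounds):
        program = generate_code(task, feedback)
        new_inventory, error = execute(program, inventory)
        if error is None and verify(new_inventory):
            return program, new_inventory
        if error is not None:
            feedback = f"Execution error: {error}"
        else:
            feedback = f"Task not complete. Inventory: {new_inventory}"
    return None, inventory


def stub_llm(task, feedback):
    """Hypothetical model: first attempt is buggy, the retry is fixed."""
    if not feedback:
        return "inventory['plank'] += 4"  # raises KeyError on an empty inventory
    return "inventory['plank'] = inventory.get('plank', 0) + 4"


program, final_inventory = refine_until_success(
    "craft planks",
    stub_llm,
    verify=lambda inv: inv.get("plank", 0) >= 4,
    inventory={},
)
```

The first round fails with a `KeyError`, the error text becomes the feedback for the second round, and the second program passes verification.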

Key Takeaways

  • MineDojo provides a comprehensive platform for embodied AI research in Minecraft, offering a rich environment and vast multimodal datasets.
  • Voyager demonstrates the potential of LLMs (specifically GPT-4) to create agents that can learn continuously and perform complex, open-ended tasks by generating and refining code.
  • Iterative prompting with various feedback types (execution errors, environment state, self-reflection) is crucial for robust code generation and agent improvement.
  • A persistent skill library lets agents build on previously acquired skills, enabling increasingly complex behaviors.
  • Automatic curriculum generation, driven by maximizing novel item acquisition, facilitates autonomous exploration and discovery of new capabilities in dynamic environments.
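
The skill library and automatic curriculum from the takeaways above might be sketched as below. Everything here is a simplified assumption: retrieval uses word overlap instead of learned embeddings, the `PREREQUISITES` table is a hypothetical hand-written stand-in for GPT-4's task proposals, and the class and function names are invented for illustration.

```python
class SkillLibrary:
    """Stores verified programs keyed by a natural-language description."""

    def __init__(self):
        self._skills = {}  # description -> program source

    def add(self, description, program):
        self._skills[description] = program

    def retrieve(self, query, top_k=2):
        """Return the top_k (description, program) pairs by word overlap."""
        query_words = set(query.lower().split())

        def overlap(description):
            words = set(description.lower().split())
            union = query_words | words
            # Jaccard similarity; 0.0 if both sides are empty.
            return len(query_words & words) / len(union) if union else 0.0

        ranked = sorted(self._skills, key=overlap, reverse=True)
        return [(d, self._skills[d]) for d in ranked[:top_k]]


# Hypothetical prerequisite table used only for this sketch.
PREREQUISITES = {
    "log": [],
    "plank": ["log"],
    "crafting_table": ["plank"],
    "wooden_pickaxe": ["plank", "crafting_table"],
}


def propose_next_task(discovered):
    """Pick the first undiscovered item whose prerequisites are all met."""
    for item, needs in PREREQUISITES.items():
        if item not in discovered and all(n in discovered for n in needs):
            return f"obtain {item}"
    return None  # nothing new left to discover in this toy table


library = SkillLibrary()
library.add("mine a log", "# program that mines a log")
library.add("craft plank from log", "# program that crafts planks")
best_match = library.retrieve("craft a plank", top_k=1)
next_task = propose_next_task({"log"})
```

Proposing only items that are new to the agent mirrors the novelty-driven curriculum, and retrieving stored skills by description is what lets later tasks reuse earlier programs.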

Methods / Models / Datasets Mentioned

  • MineDojo
  • Voyager
  • Minecraft
  • MineCLIP
  • OpenAI GPT
  • GPT-4
  • Mineflayer
  • ReAct
  • Reflexion
  • AutoGPT

Topics

Embodied AI · Large Language Models (LLMs) · Minecraft · Open-ended environments · Lifelong learning · Automatic curriculum · Skill acquisition · Multimodal learning · Self-reflection · Code generation · Reinforcement Learning


Notes

Open for commentary — connections to other work, critiques, follow-up reading.