MineDojo: Framework for Generally Capable Agents & Voyager: An Open-Ended Embodied Agent with Large Language Models
Event: CVPR 2025 · Duration: 32 min
Abstract
This presentation introduces MineDojo, an open-source framework for developing generally capable AI agents in the open-ended Minecraft environment. MineDojo provides a simulator, more than 3000 programmatic and creative tasks, and an internet-scale knowledge base compiled from YouTube videos, Minecraft wiki pages, and Reddit discussions. A key component is MineCLIP, a contrastive video-language foundation model that learns reward functions from time-aligned video and text. Building on this foundation, the talk then presents Voyager, an LLM-powered lifelong learning agent. Voyager uses GPT-4 to generate JavaScript code that controls the agent, and it refines each program through iterative prompting that feeds back execution results and self-reflection. It also maintains a persistent skill library and an automatic curriculum that proposes novel tasks, enabling continuous exploration and skill acquisition in Minecraft.
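To make the MineCLIP idea concrete, the sketch below shows one common way a contrastive video-language model can double as a reward function: embed the recent video clip and the task prompt, then use their cosine similarity as the reward. This is an illustrative assumption, not MineCLIP's actual code. The function names (`cosine_similarity`, `similarity_reward`, `info_nce_loss`), the InfoNCE-style loss form, the temperature value, and the hand-written toy embeddings are all hypothetical; a real system would obtain the embeddings from learned video and text encoders.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def similarity_reward(video_embedding: Sequence[float],
                      text_embedding: Sequence[float]) -> float:
    """Hypothetical reward: how well the recent clip matches the task prompt."""
    return cosine_similarity(video_embedding, text_embedding)


def info_nce_loss(video_embs: Sequence[Sequence[float]],
                  text_embs: Sequence[Sequence[float]],
                  temperature: float = 0.1) -> float:
    """Video-to-text contrastive loss over a batch of time-aligned pairs.

    Pair i (video_embs[i], text_embs[i]) is the positive; every other text
    in the batch is treated as a negative for video i.
    """
    n = len(video_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine_similarity(video_embs[i], t) / temperature
                  for t in text_embs]
        # Numerically stable log-sum-exp over the row of logits.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / n


# Toy embeddings (made up for illustration only).
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
```

With these toy vectors, the loss is lower for `aligned_texts` than for `swapped_texts`, and `similarity_reward` is highest when the clip embedding points in the same direction as the prompt embedding.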
Speakers
- Guanzhi Wang — NVIDIA, Caltech
- Yuqi Xie — NVIDIA, UT Austin
- Yunfan Jiang — NVIDIA, Stanford
- Ajay Mandlekar — NVIDIA
- Chaowei Xiao — NVIDIA, ASU
- Yuke Zhu — UT Austin
- Linxi “Jim” Fan — NVIDIA
- Anima Anandkumar — NVIDIA, Caltech
Talks (2)
- 00:00:00 — Guanzhi Wang et al.: MineDojo: Framework for Generally Capable Agents
  - MineDojo is an open-source framework for AI agents in Minecraft, featuring an open-ended environment, internet-scale knowledge base, 3000+ tasks, and MineCLIP for learning reward functions from multimodal data.
- 01:53:34 — Guanzhi Wang et al.: Voyager: An Open-Ended Embodied Agent with Large Language Models
  - Voyager is an LLM-powered lifelong learning agent for Minecraft that uses GPT-4 to generate and refine JavaScript code for actions, maintains a persistent skill library, and employs an automatic curriculum for continuous exploration and skill acquisition.
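The generate-and-refine loop described for Voyager can be sketched as follows. This is a minimal illustration under stated assumptions, not Voyager's implementation: `refine_until_success`, the `Feedback` fields, and the three stub callables are hypothetical, the language model is replaced by a canned stub, and `mineBlock` is an invented helper name inside the program strings. The point is only the control flow: generate a program, run it, collect execution errors, environment state, and a self-reflection critique, then prompt again with that feedback.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    execution_error: Optional[str]  # runtime error text, if the program crashed
    environment_state: str          # e.g. an inventory summary after execution
    critique: Optional[str]         # self-reflection on why the task is unfinished


def refine_until_success(
    task: str,
    generate: Callable[[str, Optional[Feedback]], str],
    execute: Callable[[str], Tuple[Optional[str], str]],
    reflect: Callable[[str, str], Optional[str]],
    max_rounds: int = 4,
) -> Tuple[Optional[str], int]:
    """Return (program, rounds_used), or (None, max_rounds) if never verified."""
    feedback: Optional[Feedback] = None
    for round_index in range(1, max_rounds + 1):
        program = generate(task, feedback)
        error, state = execute(program)
        # Only ask for a critique when the program ran without crashing.
        critique = reflect(task, state) if error is None else None
        if error is None and critique is None:
            return program, round_index
        feedback = Feedback(error, state, critique)
    return None, max_rounds


# --- Stubs standing in for the LLM and the game (assumptions, not real APIs) ---
def fake_generate(task: str, feedback: Optional[Feedback]) -> str:
    if feedback is None:
        return "mine(log)"                      # first attempt has a bug
    if feedback.execution_error:
        return "mineBlock('log', 1)"            # fixes the crash, too few logs
    return "mineBlock('log', 3)"                # responds to the critique


def fake_execute(program: str) -> Tuple[Optional[str], str]:
    if program == "mine(log)":
        return "ReferenceError: log is not defined", "inventory: empty"
    count = 1 if "1" in program else 3
    return None, f"inventory: log x{count}"


def fake_reflect(task: str, state: str) -> Optional[str]:
    return None if state.endswith("x3") else "need 3 logs, keep mining"


final_program, rounds_used = refine_until_success(
    "collect 3 logs", fake_generate, fake_execute, fake_reflect
)
```

In this toy run the first round fails with an execution error, the second runs but is critiqued for collecting too few logs, and the third is accepted.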
Key Takeaways
- MineDojo provides a comprehensive platform for embodied AI research in Minecraft, offering a rich environment and vast multimodal datasets.
- Voyager demonstrates the potential of LLMs (specifically GPT-4) to create agents that can learn continuously and perform complex, open-ended tasks by generating and refining code.
- Iterative prompting with various feedback types (execution errors, environment state, self-reflection) is crucial for robust code generation and agent improvement.
- The concept of a persistent skill library allows agents to build upon previously acquired knowledge, enabling the development of increasingly complex behaviors.
- Automatic curriculum generation, driven by maximizing novel item acquisition, facilitates autonomous exploration and discovery of new capabilities in dynamic environments.
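The skill library and the novelty-driven curriculum from the takeaways above can be illustrated with a small sketch. Everything here is a simplifying assumption: `SkillLibrary`, `propose_next_task`, and the `RECIPES` table are hypothetical, retrieval uses word overlap (Jaccard similarity) as a stand-in for whatever retrieval the real system uses, and the curriculum is a fixed rule rather than an LLM proposing tasks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


@dataclass
class SkillLibrary:
    """Persistent store of verified programs, keyed by a text description."""
    skills: Dict[str, str] = field(default_factory=dict)

    def add(self, description: str, program: str) -> None:
        self.skills[description] = program

    def retrieve(self, query: str, top_k: int = 1) -> List[Tuple[str, str]]:
        """Return the top_k skills whose descriptions best overlap the query."""
        q = _tokens(query)

        def score(description: str) -> float:
            d = _tokens(description)
            union = q | d
            return len(q & d) / len(union) if union else 0.0

        ranked = sorted(self.skills, key=score, reverse=True)
        return [(d, self.skills[d]) for d in ranked[:top_k]]


def propose_next_task(acquired: Set[str],
                      recipes: Dict[str, Set[str]]) -> Optional[str]:
    """Pick an item never obtained before whose ingredients are all acquired.

    Ties are broken by fewest ingredients, then alphabetically.
    """
    candidates = [item for item, needs in recipes.items()
                  if item not in acquired and needs <= acquired]
    if not candidates:
        return None
    return min(candidates, key=lambda item: (len(recipes[item]), item))


# Hypothetical, heavily simplified dependency table (not real game data).
RECIPES: Dict[str, Set[str]] = {
    "log": set(),
    "planks": {"log"},
    "stick": {"planks"},
    "crafting_table": {"planks"},
    "wooden_pickaxe": {"planks", "stick"},
}

library = SkillLibrary()
library.add("mine wood log", "/* program source */")
library.add("craft wooden pickaxe", "/* program source */")
best_match = library.retrieve("craft a stone pickaxe")[0][0]
```

Here `best_match` is the "craft wooden pickaxe" skill, showing how an earlier skill can be reused as context for a harder task, while `propose_next_task` always targets an item the agent has not obtained yet.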
Methods / Models / Datasets Mentioned
MineDojo · Voyager · Minecraft · MineCLIP · OpenAI GPT · GPT-4 · Mineflayer · ReAct · Reflexion · AutoGPT
Topics
Embodied AI · Large Language Models (LLMs) · Minecraft · Open-ended environments · Lifelong learning · Automatic curriculum · Skill acquisition · Multimodal learning · Self-reflection · Code generation · Reinforcement Learning