Human Motion Generation (HuMoGen) Workshop

Event: CVPR Workshop 2024 · Duration: 207 min

Abstract

This workshop explores the latest advancements and open challenges in Human Motion Generation (HuMoGen). Speakers from academia and industry present diverse approaches to synthesizing realistic, controllable, and expressive human motion for applications ranging from video games and robotics to healthcare and virtual reality. Key themes include overcoming computational and memory constraints, achieving high-quality and diverse motion, developing robust data collection methods for embodied intelligence, and leveraging novel techniques like diffusion models and egocentric synthetic data generation to push the boundaries of human motion synthesis. The discussions highlight the importance of understanding indirect control, addressing the idiosyncratic nature of gestures, and developing unified frameworks for motion optimization and learning.

Speakers

Guy Tevet — Tel-Aviv University
Daniel Holden — Epic Games
Michael Neff — University of California, Davis
C. Karen Liu — Stanford University
Siyu Tang — ETH Zurich

Talks (5)

00:00:00 — Guy Tevet: Welcome to the first workshop on Human Motion Generation (HuMoGen)
- Introduction to the Human Motion Generation (HuMoGen) workshop, outlining the organizing team, scope, and agenda for the day, including speakers and accepted papers.
00:05:18 — Daniel Holden: Human Motion Generation for Video Games
- Discusses the constraints and motivations for human motion generation in video games, focusing on real-time, online neural network inference for character animation, and highlighting challenges in computational budgets, quality, and responsiveness.
00:15:15 — Michael Neff: The Challenge of Gesture Synthesis
- Explores the complexities of gesture synthesis, emphasizing that human gestures are idiosyncratic, multimodal, and governed by a many-to-many mapping, and presents a data-driven approach using an adversarial loss based on gesture phases.
00:23:45 — C. Karen Liu: New Challenges in 3D Human Motion Synthesis
- Discusses new challenges in 3D human motion synthesis, focusing on collecting embodied data, predicting external forces with diffusion models, and generating egocentric synthetic data for tasks like human mesh recovery and AR mapping.
00:38:30 — Siyu Tang: Controllable Human Motion Synthesis
- Presents work on controllable human motion synthesis, introducing DART for real-time text-driven control and EgoGen for egocentric synthetic data generation, emphasizing generative, efficient, diverse, and controllable models for realistic virtual humans in simulators.

Key Takeaways

Human motion generation for video games faces significant challenges including strict computational/memory budgets, high quality demands, and the need for highly responsive and controllable character animation.
Gesture synthesis is a complex problem due to the idiosyncratic nature of human gestures, requiring robust multimodal data collection and methods to capture subtle nuances and semantic content.
Data collection systems like DexCap and synthetic data generators like EgoGen are crucial for acquiring embodied data in diverse, contextualized, and egocentric environments, enabling the training of more realistic and generalizable motion models.
Diffusion models offer a promising unified framework for various motion tasks (editing, in-betweening, denoising) and can be optimized using loss guidance or noise optimization to achieve controllable and high-quality motion synthesis.
Future work in human motion generation will focus on leveraging synthetic and real-world egocentric data, grounding models in conversational context, and integrating large language models (LLMs) to create more intelligent and interactive virtual humans.
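The noise-optimization idea mentioned in the takeaways can be illustrated with a toy sketch: keep a generator frozen and run gradient descent on its input noise until the generated motion satisfies a constraint (here, one frame of a 1-D trajectory hitting a target value). Everything in this example is an assumption made for illustration: the linear `make_toy_generator` stand-in, the function names, and the hyperparameters are hypothetical and are not taken from DNO or any talk. A real system would differentiate through a diffusion sampler rather than a fixed matrix.

```python
import random

def make_toy_generator(num_frames):
    """Build a fixed lower-triangular matrix W (hypothetical stand-in for a frozen model).

    Row t sums the first t+1 noise entries, so generate(W, z) is a random-walk-like
    trajectory. Scaling by 1/sqrt(num_frames) keeps every row norm <= 1, which
    keeps the plain gradient step below stable for lr < 1.
    """
    scale = num_frames ** -0.5
    return [[scale if j <= t else 0.0 for j in range(num_frames)]
            for t in range(num_frames)]

def generate(W, z):
    """Map a noise vector z to a 1-D trajectory (matrix-vector product)."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]

def optimize_noise(W, target_frame, target_value, steps=200, lr=0.1, seed=0):
    """Gradient descent on the noise z; the generator W is never modified."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(steps):
        err = generate(W, z)[target_frame] - target_value
        # loss = err**2, so d(loss)/dz = 2 * err * W[target_frame]
        z = [zi - lr * 2.0 * err * w for zi, w in zip(z, W[target_frame])]
    return z

# Usage: ask the last frame of a 30-frame trajectory to reach the value 2.0.
W = make_toy_generator(30)
z = optimize_noise(W, target_frame=29, target_value=2.0)
motion = generate(W, z)
print(round(motion[29], 3))
```

The design point this sketch tries to convey is that control comes from the loss alone: changing the constraint (a different frame, several keyframes, a smoothness term) only changes the gradient, while the generator itself stays untouched.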

Methods / Models / Datasets Mentioned

Local Motion Phases [Starke et al. 2020]
Motorica [Alexanderson et al. 2022]
Motion Matching [Holden et al. 2020]
Phase-Functioned Neural Networks [Holden et al. 2017]
Diffusion Noise Optimization (DNO)
DART architecture
Motion Primitive VAE
CLIP text encoder
CLIP visual encoder
Aesthetic Scoring Model
EDICT Generative Process [Wallace et al. 2022]
PhysicsVAE [Won et al. SIGGRAPH 2022]
FlowMDM [Barquero et al. CVPR 2024]
DexCap
SLAM+IMU pose tracking
DiffIP motion reconstruction
Inverse Kinematics
Torque Inverse Dynamics
Muscle Reconstruction
DART (Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control)
EgoGen (An Egocentric Synthetic Data Generator)
HOOD [Grigorev et al. CVPR 2023]
PPO algorithm
GAMMA [Zhang et al. CVPR 2022]

Topics

Human Motion Generation · Motion Synthesis · Character Animation · Video Games · Robotics · Healthcare · Virtual Reality · Embodied Intelligence · Data Collection · Diffusion Models · Generative Models · Controllable Motion · Real-time Inference · Motion Matching · Gesture Synthesis · Egocentric Perception · Dynamic Simulation · Data Sharing · Optimization

Notes

Open for commentary — connections to other work, critiques, follow-up reading.