Lifting Monocular Events to 3D Human Poses
Event: CVPR 2025 · Duration: 3 min
Abstract
This paper presents the first events-only monocular approach for 3D human pose estimation (HPE). A deep learning backbone processes event-based frames into marginal heatmaps, from which 3D joint positions are triangulated. A new synthetic dataset, Event-Human3.6m, derived from the standard Human3.6m dataset, is introduced to support research in event-based HPE. Extensive ablation studies on the DHP19 and Event-Human3.6m datasets show that the constant-count event representation outperforms the spatio-temporal voxel-grid and that ImageNet pretraining significantly improves performance. The work also identifies static movements and occluded body parts as the primary challenges for event-based pose estimation.
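The step from marginal heatmaps to a 3D joint can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes two marginal heatmaps per joint (an x-y plane and a z-y plane), a soft-argmax to read a coordinate from each heatmap, and simple averaging of the shared y coordinate. The function names `soft_argmax_2d` and `lift_joint`, the choice of planes, and the averaging rule are all assumptions made for the example.

```python
import math


def soft_argmax_2d(heatmap):
    """Return the expected (column, row) position of a 2D heatmap.

    The heatmap values are turned into a softmax distribution, and the
    result is the probability-weighted mean of the pixel indices.
    """
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    col = sum(w * c for row in weights for c, w in enumerate(row)) / total
    row_pos = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return col, row_pos


def lift_joint(hm_xy, hm_zy):
    """Combine two marginal heatmaps into one (x, y, z) estimate.

    Assumed layout (hypothetical): hm_xy has columns = x and rows = y,
    hm_zy has columns = z and rows = y. The y coordinate is visible in
    both planes, so the two estimates are averaged.
    """
    x, y_from_xy = soft_argmax_2d(hm_xy)
    z, y_from_zy = soft_argmax_2d(hm_zy)
    return x, (y_from_xy + y_from_zy) / 2.0, z
```

The result is in heatmap pixel units. Mapping it to metric 3D coordinates would additionally need the camera parameters, which this sketch leaves out.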
Speakers
- Gianluca Scarpellini — Istituto Italiano di Tecnologia (IIT) - PAVIS
- Pietro Morerio — Istituto Italiano di Tecnologia (IIT) - PAVIS
- Alessio Del Bue — Istituto Italiano di Tecnologia (IIT) - VGM
Talks (1)
- 00:00:00 — Gianluca Scarpellini: Lifting Monocular Events to 3D Human Poses
- A novel approach for 3D human pose estimation using only monocular event camera data, including a new synthetic dataset and ablation studies on event representations and backbones.
Key Takeaways
- Introduced the first events-only monocular approach for 3D Human Pose Estimation.
- Developed a novel synthetic dataset, Event-Human3.6m, for event-based HPE.
- Demonstrated that the constant-count event representation yields better results than the spatio-temporal voxel-grid.
- Showed that ImageNet pretraining significantly improves the performance of event-based HPE models.
- Identified static movements and occluded body parts as key limitations for event-based 3D human pose estimation.
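To make the constant-count representation mentioned above concrete, here is a minimal sketch, assuming events arrive as time-ordered `(x, y, t, polarity)` tuples. Each frame accumulates a fixed number of events, so frames cover variable time spans. A voxel-grid would instead split the stream into fixed time bins. The function name `constant_count_frames`, the tuple layout, and the decision to ignore polarity are assumptions for illustration, not details taken from the talk.

```python
def constant_count_frames(events, height, width, count):
    """Split an event stream into frames of exactly `count` events each.

    `events` is an iterable of (x, y, t, polarity) tuples sorted by time.
    Each frame is a height x width grid of per-pixel event counts.
    Timestamps and polarity are ignored here (a simplification), and a
    trailing batch with fewer than `count` events is dropped.
    """
    frames = []
    frame = [[0] * width for _ in range(height)]
    seen = 0
    for x, y, _t, _polarity in events:
        frame[y][x] += 1
        seen += 1
        if seen == count:
            # The frame is full: store it and start a fresh one.
            frames.append(frame)
            frame = [[0] * width for _ in range(height)]
            seen = 0
    return frames
```

Because every frame holds the same number of events, fast motion produces many short-duration frames and slow motion produces few long-duration ones.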
Methods / Models / Datasets Mentioned
DHP19 · Human3.6m · Event-Human3.6m · ResNet-34 · ResNet-50 · ImageNet · Constant-count · Voxel-grid · Stacked Hourglass · MPJPE · Calabrese et al. [5] · Mehta et al. [38] · Kanazawa et al. [22] · Nibali et al. [43] · Pavlakos et al. [44] · Luvizon et al. [33] · Cheng et al. [9]
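MPJPE (mean per-joint position error) in the list above is the average Euclidean distance between predicted and ground-truth 3D joints. A minimal sketch follows, assuming each pose is a list of `(x, y, z)` tuples in the same joint order and the same units. The function name `mpjpe` and the input layout are illustrative choices, and any alignment step applied before evaluation is not modeled.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between matching 3D joints.

    Both arguments are sequences of (x, y, z) tuples in the same joint
    order. The result is in the same units as the inputs.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    distances = [math.dist(p, q) for p, q in zip(predicted, target)]
    return sum(distances) / len(distances)
```

For example, `mpjpe([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)])` averages one joint error of 5 and one of 0.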
Topics
3D Human Pose Estimation · Event Cameras · Monocular Vision · Deep Learning · Synthetic Datasets · Marginal Heatmaps · Event-based Vision · Human Motion Analysis · ResNet · ImageNet Pretraining