Lifting Monocular Events to 3D Human Poses

Event: CVPR 2025 · Duration: 3 min

Abstract

This paper presents the first events-only monocular approach for 3D human pose estimation (HPE). The methodology leverages marginal heatmaps, generated from event-based frames processed by a deep learning backbone, to triangulate 3D joint positions. A novel synthetic dataset, Event-Human3.6m, is introduced, derived from the standard Human3.6m dataset, to facilitate research in event-based HPE. Extensive ablation studies on DHP19 and Event-Human3.6m datasets demonstrate that the constant-count event representation outperforms the spatio-temporal voxel-grid, and ImageNet pretraining significantly enhances performance. The work also identifies static movements and occluded body parts as primary challenges for event-based pose estimation.
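To make the marginal-heatmap step concrete, here is a minimal sketch of reading one joint's coordinates from two marginal heatmaps with a soft-argmax (the expected pixel position under a softmax). The function names, the choice of an xy and a zy plane, and the averaging of the shared y coordinate are illustrative assumptions, not the authors' implementation; the summary does not say how heatmap coordinates are mapped to metric 3D space, so that step is omitted.

```python
import numpy as np

def soft_argmax_2d(logits):
    """Return the expected (col, row) position of a 2D heatmap, in pixel units."""
    h, w = logits.shape
    # Softmax over all pixels turns raw scores into a probability map.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Marginalize over rows / columns, then take the expectation of each index.
    exp_col = (p.sum(axis=0) * np.arange(w)).sum()
    exp_row = (p.sum(axis=1) * np.arange(h)).sum()
    return exp_col, exp_row

def joint_from_marginals(xy_logits, zy_logits):
    """Combine an xy and a zy marginal heatmap into one (x, y, z) estimate.

    Assumption (hypothetical layout): columns index x (or z), rows index y.
    """
    x, y_a = soft_argmax_2d(xy_logits)
    z, y_b = soft_argmax_2d(zy_logits)
    # y is observed in both planes, so average the two estimates.
    return np.array([x, 0.5 * (y_a + y_b), z])
```

Because the expectation is differentiable, a loss on the resulting coordinates can be backpropagated through the heatmaps, which is the usual reason for preferring it over a hard argmax.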

Speakers

  • Gianluca Scarpellini — Istituto Italiano di Tecnologia (IIT) - PAVIS
  • Pietro Morerio — Istituto Italiano di Tecnologia (IIT) - PAVIS
  • Alessio Del Bue — Istituto Italiano di Tecnologia (IIT) - VGM

Talks (1)

  • 00:00:00 — Gianluca Scarpellini: Lifting Monocular Events to 3D Human Poses
    • A novel approach for 3D human pose estimation using only monocular event camera data, including a new synthetic dataset and ablation studies on event representations and backbones.

Key Takeaways

  • Introduced the first events-only monocular approach for 3D Human Pose Estimation.
  • Developed a novel synthetic dataset, Event-Human3.6m, for event-based HPE.
  • Demonstrated that the constant-count event representation yields better results than spatio-temporal voxel-grid.
  • Showed that ImageNet pretraining significantly improves the performance of event-based HPE models.
  • Identified static or near-static poses, which trigger few events, and occluded body parts as key limitations for event-based 3D human pose estimation.
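The two event representations compared above can be sketched as follows. This is a minimal illustration under stated assumptions: events arrive as NumPy arrays of integer pixel coordinates `xs`, `ys`, timestamps `ts`, and polarities `ps` in {-1, +1}; the function names and the linear temporal weighting in the voxel grid are one common formulation, not necessarily the exact variant used in the paper.

```python
import numpy as np

def constant_count_frames(xs, ys, height, width, events_per_frame):
    """Split an event stream into frames that each hold the same number of events."""
    # Trailing events that do not fill a whole frame are dropped.
    n_frames = len(xs) // events_per_frame
    frames = np.zeros((n_frames, height, width))
    for i in range(n_frames):
        sl = slice(i * events_per_frame, (i + 1) * events_per_frame)
        # np.add.at accumulates correctly when a pixel fires more than once.
        np.add.at(frames[i], (ys[sl], xs[sl]), 1.0)
    return frames

def voxel_grid(xs, ys, ts, ps, height, width, n_bins):
    """Spread signed polarities over n_bins temporal slices with linear weights."""
    grid = np.zeros((n_bins, height, width))
    span = max(ts[-1] - ts[0], 1e-9)  # guard against a zero-length window
    t_norm = (n_bins - 1) * (ts - ts[0]) / span
    for b in range(n_bins):
        # Each event contributes to the (at most two) bins nearest its timestamp.
        w = np.maximum(0.0, 1.0 - np.abs(t_norm - b))
        np.add.at(grid[b], (ys, xs), ps * w)
    return grid
```

A constant-count frame covers a variable time span: fast motion fills it quickly, slow motion slowly. A voxel grid instead covers a fixed window and keeps coarse timing and polarity in separate channels.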

Methods / Models / Datasets Mentioned

  • DHP19
  • Human3.6m
  • Event-Human3.6m
  • ResNet-34
  • ResNet-50
  • ImageNet
  • Constant-count
  • Voxel-grid
  • Stacked Hourglass
  • MPJPE
  • Calabrese et al. [5]
  • Mehta et al. [38]
  • Kanazawa et al. [22]
  • Nibali et al. [43]
  • Pavlakos et al. [44]
  • Luvizon et al. [33]
  • Cheng et al. [9]
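For reference, MPJPE (mean per-joint position error), listed above, is the average Euclidean distance between predicted and ground-truth joint positions. The sketch below assumes both arrays share the same shape and units; any alignment applied before measuring (for example root-relative centering) depends on the evaluation protocol, which this summary does not specify.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: arrays of shape (..., n_joints, 3) in the same units.
    """
    # Per-joint Euclidean distance, averaged over joints and any leading axes.
    return np.linalg.norm(pred - target, axis=-1).mean()
```

For example, if every predicted joint is offset from its ground truth by (3, 4, 0), each per-joint distance is 5, so the MPJPE is 5 in the input units.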

Topics

3D Human Pose Estimation · Event Cameras · Monocular Vision · Deep Learning · Synthetic Datasets · Marginal Heatmaps · Event-based Vision · Human Motion Analysis · ResNet · ImageNet Pretraining

