DHP19: Dynamic Vision Sensor 3D Human Pose Dataset
Event: CVPR Workshop on Event-based Vision and Smart Cameras, 2019 · Duration: 3 min
Abstract
This presentation introduces DHP19, the first event-based dataset designed specifically for 3D human pose estimation. It is motivated by the high computational cost of conventional convolutional neural network (CNN) pipelines, which limits their use in real-time and power-constrained applications. The work proposes exploiting the properties of Dynamic Vision Sensors (DVS) for more efficient human pose estimation. A CNN is trained on event-based data from multiple camera views to predict 2D joint positions, which are then triangulated to obtain the 3D pose. The results are promising for real-time and power-constrained scenarios.
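A DVS emits a stream of per-pixel events, not images, so the events have to be gathered into an image-like array before a CNN can consume them. The sketch below shows one possible way to do that, assuming each frame is built from a fixed number of events and stores a per-pixel event count. The event tuple layout `(x, y, timestamp, polarity)`, the function name `accumulate_events`, and the `events_per_frame` parameter are illustrative assumptions, not details confirmed by this summary.

```python
def accumulate_events(events, width, height, events_per_frame):
    """Group a stream of DVS events into count-per-pixel frames.

    events: iterable of (x, y, timestamp, polarity) tuples (assumed layout).
    Returns a list of frames; each frame is a height x width list of counts.
    """
    frames = []
    frame = [[0] * width for _ in range(height)]
    count = 0
    for x, y, _timestamp, _polarity in events:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore events that fall outside the sensor array
        frame[y][x] += 1  # polarity is ignored in this simple sketch
        count += 1
        if count == events_per_frame:
            frames.append(frame)
            frame = [[0] * width for _ in range(height)]
            count = 0
    return frames  # a trailing partial frame is dropped


if __name__ == "__main__":
    demo = [(0, 0, 1, 1), (1, 0, 2, 0), (0, 0, 3, 1), (1, 1, 4, 1), (0, 1, 5, 0)]
    for f in accumulate_events(demo, width=2, height=2, events_per_frame=2):
        print(f)
```

With a fixed event count per frame, the frame rate follows the amount of motion in the scene: fast movements produce frames more often than slow ones. A fixed time window would be an equally valid alternative design.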
Speakers
- Enrico Calabrese — Institute of Neuroinformatics, University of Zurich and ETH Zurich
Talks (1)
- 00:00:00 — Enrico Calabrese: DHP19: Dynamic Vision Sensor 3D Human Pose Dataset
- Introduction of DHP19, the first event-based dataset for 3D human pose estimation, and a CNN-based method for efficient 2D/3D pose prediction using DVS cameras.
Key Takeaways
- DHP19 is the first event-based dataset for 3D human pose estimation, offering data from 17 subjects performing 33 movements recorded by 4 DVS cameras, along with 3D joint positions from a Vicon system.
- The proposed method uses a convolutional network trained on accumulated DVS events from two camera views to predict 2D human joint positions.
- 3D human pose is recovered by triangulating the 2D joint predictions from the two camera views.
- The approach aims for high predictive accuracy with low model complexity, making it suitable for real-time and power-constrained IoT applications.
- Results show an average 3D error of 8 cm, comparable to state-of-the-art multi-view methods using few cameras, with better performance observed for whole-body movements.
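The triangulation and error-measurement steps in the takeaways above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it uses midpoint triangulation (the point halfway between the closest points of two viewing rays), which is one common choice among several, and it assumes each 2D joint prediction has already been back-projected to a 3D ray using the camera calibration. The function names and the ray inputs are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation of one joint from two viewing rays.

    c1, c2: camera centers; d1, d2: ray directions through the predicted
    2D joint in each view (assumed to come from camera calibration).
    Returns the 3D point halfway between the closest points on the rays.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


def mean_joint_error(predicted, ground_truth):
    """Average Euclidean distance between predicted and reference joints."""
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    joint = triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5))
    print(joint)
    print(mean_joint_error([joint], [(1.0, 2.0, 5.0)]))
```

In this setting, `ground_truth` would correspond to the Vicon 3D joint positions, and the average returned by `mean_joint_error` is the kind of quantity behind the reported 8 cm figure, assuming that figure is a mean per-joint distance.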
Methods / Models / Datasets Mentioned
DHP19 · Convolutional Networks · PAF (VGG19 backbone) · Vicon motion capture system · Triangulation
Topics
Human pose estimation · Dynamic Vision Sensors (DVS) · Event-based cameras · 3D pose estimation · Dataset · Convolutional Neural Networks (CNN) · Real-time applications · Power-constrained applications · Multi-view systems · Triangulation
Notes
Open for commentary — connections to other work, critiques, follow-up reading.