DHP19: Dynamic Vision Sensor 3D Human Pose Dataset

Event: CVPR Workshop on Event-based Vision and Smart Cameras, 2019 · Duration: 3 min

Abstract

This presentation introduces DHP19, the first event-based dataset designed specifically for 3D human pose estimation. Conventional convolutional neural networks (CNNs) for pose estimation are computationally demanding, which limits their use in power-constrained and real-time applications. The work instead exploits the properties of Dynamic Vision Sensors (DVS) to make human pose estimation more efficient. A CNN trained on event-based data from multiple camera views predicts 2D joint positions, which are then triangulated to obtain the 3D pose. The results are promising for real-time and power-constrained scenarios.

Speakers

Enrico Calabrese — Institute of Neuroinformatics, University of Zurich and ETH Zurich

Talks (1)

00:00:00 — Enrico Calabrese: DHP19: Dynamic Vision Sensor 3D Human Pose Dataset
- Introduction of DHP19, the first event-based dataset for 3D human pose estimation, and a CNN-based method for efficient 2D/3D pose prediction using DVS cameras.

Key Takeaways

DHP19 is the first event-based dataset for 3D human pose estimation, offering data from 17 subjects performing 33 movements recorded by 4 DVS cameras, along with 3D joint positions from a Vicon system.
The proposed method uses a convolutional network trained on accumulated DVS events from two camera views to predict 2D human joint positions.
3D human pose is recovered by triangulating the 2D joint predictions from the two camera views.
The approach aims for high predictive accuracy with low model complexity, making it suitable for real-time and power-constrained IoT applications.
Results show an average 3D error of 8 cm, comparable to state-of-the-art multi-view methods using few cameras, with better performance observed for whole-body movements.
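
The accumulation of DVS events into CNN input frames mentioned in the takeaways above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes each frame is built from a fixed number of events, and the event tuple layout `(timestamp_us, x, y, polarity)`, the function name `accumulate_constant_count`, and the toy 4x3 sensor size are all hypothetical choices made for the example.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed event layout (hypothetical): (timestamp_us, x, y, polarity)
Event = Tuple[int, int, int, int]


def accumulate_constant_count(
    events: Iterable[Event],
    width: int,
    height: int,
    events_per_frame: int,
) -> Iterator[List[List[int]]]:
    """Yield 2D count images, each built from exactly `events_per_frame` events."""
    if events_per_frame <= 0:
        raise ValueError("events_per_frame must be positive")
    frame = [[0] * width for _ in range(height)]
    count = 0
    for _timestamp, x, y, _polarity in events:
        if not (0 <= x < width and 0 <= y < height):
            continue  # skip events that fall outside the sensor array
        frame[y][x] += 1  # polarity is ignored: every event adds one count
        count += 1
        if count == events_per_frame:
            yield frame
            frame = [[0] * width for _ in range(height)]
            count = 0
    # A trailing partial frame (fewer than events_per_frame events) is dropped.


# Toy usage on a hypothetical 4x3 sensor with 2 events per frame.
demo_events = [(0, 0, 0, 1), (5, 1, 0, 0), (9, 1, 0, 1), (12, 3, 2, 1), (20, 2, 1, 0)]
demo_frames = list(
    accumulate_constant_count(demo_events, width=4, height=3, events_per_frame=2)
)
```

With a fixed event count, the frame rate follows scene activity: fast motion produces frames quickly, while a static scene produces almost none. A fixed time window is the usual alternative and would only change the condition that closes a frame.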

Methods / Models / Datasets Mentioned

DHP19
Convolutional Networks
Part Affinity Fields (PAF, VGG19 backbone)
Vicon motion capture system
Triangulation
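
As an illustration of the triangulation step listed above, the sketch below recovers one 3D joint from two camera views. It is a sketch under stated assumptions rather than the method used in the talk: it assumes each 2D joint prediction has already been back-projected into a 3D ray (camera center plus direction) using calibrated cameras, and it uses the midpoint of the shortest segment between the two rays as the 3D estimate. The function name `triangulate_two_rays` and the example coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def triangulate_two_rays(
    c1: Sequence[float],
    d1: Sequence[float],
    c2: Sequence[float],
    d2: Sequence[float],
    eps: float = 1e-12,
) -> Vec3:
    """Return the midpoint of the shortest segment between two 3D rays.

    Ray i is c_i + s * d_i, where c_i is the camera center and d_i is the
    direction through the predicted 2D joint (assumed to be given).
    """
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b  # zero when the rays are parallel
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    # Ray parameters minimizing the squared distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]  # closest point on ray 1
    p2 = [ci + t * di for ci, di in zip(c2, d2)]  # closest point on ray 2
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Toy usage: two hypothetical cameras whose rays both pass through (1, 2, 5).
joint = triangulate_two_rays((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5))
```

Because 2D predictions are noisy, the two rays rarely intersect exactly, which is why the sketch returns the midpoint of the closest approach instead of solving for an exact intersection. Applying the function once per joint gives the full 3D skeleton.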

Topics

Human pose estimation · Dynamic Vision Sensors (DVS) · Event-based cameras · 3D pose estimation · Dataset · Convolutional Neural Networks (CNN) · Real-time applications · Power-constrained applications · Multi-view systems · Triangulation

Notes

Open for commentary — connections to other work, critiques, follow-up reading.