DHP19: Dynamic Vision Sensor 3D Human Pose Dataset

Event: CVPR Workshop on Event-based Vision and Smart Cameras, 2019 · Duration: 3 min

Abstract

This presentation introduces DHP19, the first event-based dataset specifically designed for 3D human pose estimation. Conventional convolutional neural networks (CNNs) for pose estimation are computationally demanding, which limits their use in real-time and power-constrained applications. The work proposes exploiting the properties of Dynamic Vision Sensors (DVS) to make human pose estimation more efficient. A CNN is trained on event-based data from multiple camera views to predict 2D joint positions, which are then triangulated to obtain the 3D pose. The results are promising for real-time and power-constrained scenarios.
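To make the idea of feeding event data to a CNN concrete, the sketch below accumulates a stream of DVS events into frames of per-pixel event counts. It is a minimal illustration, not the authors' preprocessing: the `(x, y, timestamp, polarity)` event layout, the fixed events-per-frame rule, and the names `Event` and `accumulate_events` are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical event layout: (x, y, timestamp, polarity).
Event = Tuple[int, int, float, int]


def accumulate_events(events: Iterable[Event], width: int, height: int,
                      events_per_frame: int) -> List[List[List[int]]]:
    """Group events into frames of per-pixel event counts.

    A frame is emitted after every `events_per_frame` valid events.
    A trailing partial group is dropped, and polarity is ignored.
    """
    frames = []
    frame = [[0] * width for _ in range(height)]
    count = 0
    for x, y, _timestamp, _polarity in events:
        if not (0 <= x < width and 0 <= y < height):
            continue  # skip events that fall outside the sensor area
        frame[y][x] += 1
        count += 1
        if count == events_per_frame:
            frames.append(frame)
            frame = [[0] * width for _ in range(height)]
            count = 0
    return frames


# Toy stream on a 3x2 sensor: five events, two events per frame.
demo_events = [(0, 0, 0.1, 1), (1, 0, 0.2, 0), (1, 0, 0.3, 1),
               (2, 1, 0.4, 1), (0, 1, 0.5, 0)]
demo_frames = accumulate_events(demo_events, width=3, height=2,
                                events_per_frame=2)
```

Under this assumed rule, the frame rate follows scene activity: fast motion generates events quickly and so yields frames more often, while a static scene yields none.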

Speakers

  • Enrico Calabrese (Institute of Neuroinformatics, University of Zurich and ETH Zurich)

Talks (1)

  • 00:00:00 — Enrico Calabrese: DHP19: Dynamic Vision Sensor 3D Human Pose Dataset
    • Introduction of DHP19, the first event-based dataset for 3D human pose estimation, and a CNN-based method for efficient 2D/3D pose prediction using DVS cameras.

Key Takeaways

  • DHP19 is the first event-based dataset for 3D human pose estimation, offering data from 17 subjects performing 33 movements recorded by 4 DVS cameras, along with 3D joint positions from a Vicon system.
  • The proposed method uses a convolutional network trained on accumulated DVS events from two camera views to predict 2D human joint positions.
  • 3D human pose is recovered by triangulating the 2D joint predictions from the two camera views.
  • The approach aims for high predictive accuracy with low model complexity, making it suitable for real-time and power-constrained IoT applications.
  • Results show an average 3D error of 8 cm, comparable to state-of-the-art multi-view methods using few cameras, with better performance observed for whole-body movements.
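The triangulation and error-measurement steps in the takeaways above can be sketched as follows. This is a generic linear least-squares triangulation under stated assumptions, not necessarily the exact procedure used in the talk: the 3x4 projection matrices, the toy camera setup, and the names `project`, `solve3`, `triangulate`, and `mean_joint_error` are hypothetical and introduced only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # row-major matrix


def project(P: Matrix, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point to 2D image coordinates with a 3x4 camera matrix."""
    h = [sum(P[r][c] * v for c, v in enumerate((*X, 1.0))) for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


def solve3(M: Matrix, b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][c] * x[c] for c in range(i + 1, 3))) / A[i][i]
    return x


def triangulate(Ps: Sequence[Matrix],
                uvs: Sequence[Tuple[float, float]]) -> List[float]:
    """Least-squares 3D point from one 2D observation per camera view."""
    rows, rhs = [], []
    for P, (u, v) in zip(Ps, uvs):
        for coord, r in ((u, 0), (v, 1)):
            # coord * (P[2] . X_h) - (P[r] . X_h) = 0, rearranged to a . X = b
            rows.append([coord * P[2][c] - P[r][c] for c in range(3)])
            rhs.append(P[r][3] - coord * P[2][3])
    # Normal equations: (A^T A) X = A^T b
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    Atb = [sum(row[i] * b for row, b in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)


def mean_joint_error(pred: Sequence[Sequence[float]],
                     truth: Sequence[Sequence[float]]) -> float:
    """Average Euclidean distance between predicted and reference joints."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


# Toy setup: two identical cameras, the second shifted one unit along x.
P_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P_b = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
true_joint = [0.2, -0.1, 2.0]
observations = [project(P_a, true_joint), project(P_b, true_joint)]
estimated_joint = triangulate([P_a, P_b], observations)
error = mean_joint_error([estimated_joint], [true_joint])
```

With noise-free observations the estimate matches the true joint up to floating-point error. With real 2D predictions, the same error measure averaged over joints and frames gives a figure in the units of the reference coordinates, such as the centimetre value quoted above.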

Methods / Models / Datasets Mentioned

  • DHP19
  • Convolutional Networks
  • Part Affinity Fields (PAF, VGG19 backbone)
  • Vicon motion capture system
  • Triangulation

Topics

Human pose estimation · Dynamic Vision Sensors (DVS) · Event-based cameras · 3D pose estimation · Dataset · Convolutional Neural Networks (CNN) · Real-time applications · Power-constrained applications · Multi-view systems · Triangulation

