Unsupervised Learning of Optical Flow and Camera Motion from Event Data

Event: CVPR 2019 · Duration: 22 min

Abstract

This presentation introduces unsupervised learning techniques for event-based cameras, focusing on estimating optical flow, depth, and camera motion (egomotion) directly from event data. It highlights the advantages of event cameras over traditional cameras, such as low latency and high dynamic range. The speaker discusses the challenges of developing algorithms for event data, particularly the absence of a directly applicable photometric loss and the need for robust noise models. To address data scarcity for deep learning, the talk introduces the Multi Vehicle Stereo Event Camera Dataset (MVSEC), which provides high-quality, synchronized multi-sensor data with ground truth. It then details novel input representations of event data for Convolutional Neural Networks (CNNs) and explores self-supervised loss functions, including a grayscale-based loss and a ‘focus loss’ that leverages motion compensation to deblur event images, demonstrating improved performance in challenging scenarios.

Speakers

Alex Zihao Zhu — University of Pennsylvania
Kostas Daniilidis — University of Pennsylvania

Talks (1)

00:00:00 — Alex Zihao Zhu: Unsupervised Learning of Optical Flow and Camera Motion from Event Data
- This talk presents self-supervised and unsupervised learning frameworks for event-only optical flow, depth, and egomotion estimation, addressing challenges in event data representation and loss functions, and introduces the Multi Vehicle Stereo Event Camera Dataset (MVSEC).

Key Takeaways

Event cameras offer lower latency, higher dynamic range, and lower power consumption than traditional cameras, which matters most under fast motion and challenging lighting conditions.
Developing algorithms for event data is challenging because the data are asynchronous and carry no direct intensity information, which calls for new formulations of the photometric loss and robust noise handling.
Self-supervised and unsupervised deep learning frameworks are crucial for event-based vision, as they reduce the need for expensive labeled datasets by leveraging geometric constraints.
Novel input representations for CNNs, such as 3D event volumes with trilinear interpolation, and specialized loss functions like ‘focus loss’, enable effective processing of event data for optical flow, depth, and egomotion estimation.
The MVSEC dataset provides a valuable resource for event-based vision research, offering synchronized multi-sensor data with ground truth for diverse environments and conditions.
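
The 3D event volume mentioned above can be illustrated with a minimal sketch. It assumes events arrive as `(x, y, t, polarity)` tuples and that each event's polarity is spread over neighbouring cells with a triangular kernel `max(0, 1 - |d|)` along x, y, and time. The function name `event_volume`, the tuple layout, and the nested-list output are illustrative choices, not details taken from the talk.

```python
import math


def event_volume(events, height, width, num_bins):
    """Accumulate (x, y, t, polarity) events into a num_bins x height x width volume."""
    volume = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return volume

    t_first = min(e[2] for e in events)
    t_last = max(e[2] for e in events)
    span = (t_last - t_first) or 1.0  # guard against all events sharing one timestamp

    for x, y, t, polarity in events:
        # Scale the timestamp into the continuous bin range [0, num_bins - 1].
        t_scaled = (num_bins - 1) * (t - t_first) / span
        x0, y0, b0 = math.floor(x), math.floor(y), math.floor(t_scaled)
        # Visit the (up to) 8 cells surrounding the event in x, y, and time.
        for b in (b0, b0 + 1):
            for row in (y0, y0 + 1):
                for col in (x0, x0 + 1):
                    inside = 0 <= b < num_bins and 0 <= row < height and 0 <= col < width
                    if not inside:
                        continue
                    # Trilinear weight: product of three triangular kernels.
                    weight = (max(0.0, 1 - abs(x - col))
                              * max(0.0, 1 - abs(y - row))
                              * max(0.0, 1 - abs(t_scaled - b)))
                    volume[b][row][col] += polarity * weight
    return volume
```

With integer pixel coordinates the spatial kernels reduce to a single pixel, so the interpolation acts only along time: an event halfway between two bins contributes half of its polarity to each. The resulting volume can be fed to a CNN as a multi-channel image with one channel per time bin.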

Methods / Models / Datasets Mentioned

Event-based feature tracking with probabilistic data association (ICRA 2017)
Event-based visual inertial odometry (CVPR 2017)
Realtime Time Synchronized Event-based Stereo (ECCV 2018)
Multi Vehicle Stereo Event Camera Dataset (MVSEC) (RA-L/ICRA 2018)
Deep Learning
Neural Networks
Convolutional Neural Networks (CNNs)
Encoder-decoder network
Photometric loss
Focus Loss
Motion Blur Loss
Charbonnier loss
UnFlow
Trilinear interpolation
Lidar odometry
Motion capture (Vicon/Qualisys)
Google Cartographer with loop closure
Census transform
Zhou, Tinghui, et al. "Unsupervised learning of depth and ego-motion from video." (CVPR 2017)
Mitrokhin, Anton, et al. "Event-based moving object detection and tracking." (IROS 2018)
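
The motion compensation behind the focus loss listed above can be sketched in a few lines. The sketch assumes a single constant flow `(u, v)` in pixels per second for all events and scores sharpness with the variance of the per-pixel event counts. That variance score is a stand-in chosen for illustration and is not necessarily the loss used in the talk; the function names `warp_events`, `count_image`, and `sharpness` are hypothetical.

```python
def warp_events(events, flow, t_ref):
    """Move each (x, y, t, polarity) event along a constant flow (u, v) to time t_ref."""
    u, v = flow
    return [(x + (t_ref - t) * u, y + (t_ref - t) * v, t_ref, p)
            for x, y, t, p in events]


def count_image(events, height, width):
    """Count events per pixel, rounding coordinates to the nearest pixel."""
    image = [[0] * width for _ in range(height)]
    for x, y, _, _ in events:
        col, row = round(x), round(y)
        if 0 <= row < height and 0 <= col < width:
            image[row][col] += 1
    return image


def sharpness(image):
    """Variance of the pixel counts: higher when events pile onto fewer pixels."""
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    return sum((value - mean) ** 2 for value in pixels) / len(pixels)


# A point moving right at 2 px/s leaves a smeared trail of events.
trail = [(0, 1, 0.0, 1), (1, 1, 0.5, 1), (2, 1, 1.0, 1), (3, 1, 1.5, 1)]
blurred = count_image(trail, 3, 4)
compensated = count_image(warp_events(trail, (2.0, 0.0), 0.0), 3, 4)
```

Warping with the correct flow collapses the trail onto one pixel, so `sharpness(compensated)` exceeds `sharpness(blurred)`. A learned flow that raises such a score is, in effect, deblurring the event image, which is the intuition the focus loss builds on.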

Topics

Event-based cameras · Unsupervised learning · Self-supervised learning · Optical flow estimation · Depth estimation · Egomotion estimation · Deep learning for events · Data representation · Loss functions · Sensor fusion

Notes

Open for commentary — connections to other work, critiques, follow-up reading.