Unsupervised Learning of Optical Flow and Camera Motion from Event Data
Event: CVPR 2019 · Duration: 22 min
Abstract
This presentation introduces unsupervised learning techniques for event-based cameras, focusing on estimating optical flow, depth, and camera motion (egomotion) directly from event data. It highlights the advantages of event cameras over traditional cameras, such as low latency and high dynamic range. The speaker discusses the challenges of developing algorithms for event data, particularly the lack of direct intensity information on which to build a photometric loss and the need for robust noise models. To address the scarcity of data for deep learning, the talk introduces the Multi Vehicle Stereo Event Camera Dataset (MVSEC), which provides high-quality, synchronized multi-sensor data with ground truth. The talk then details novel event-based input representations for Convolutional Neural Networks (CNNs) and explores self-supervised loss functions, including a grayscale-based loss and a ‘focus loss’ that uses motion compensation to deblur event images, and it reports improved performance in challenging scenarios.
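The motion-compensation idea behind the ‘focus loss’ can be illustrated with a small sketch. It assumes a single constant flow shared by all events and uses image variance as a stand-in sharpness score; the function names (`warp_events`, `event_image`, `sharpness`) and the variance measure are illustrative assumptions, not the loss as defined in the talk.

```python
import numpy as np

def warp_events(x, y, t, flow, t_ref):
    """Shift each event along a constant flow (pixels per second) to time t_ref."""
    dt = t_ref - np.asarray(t, dtype=float)
    return np.asarray(x, dtype=float) + flow[0] * dt, np.asarray(y, dtype=float) + flow[1] * dt

def event_image(x, y, height, width):
    """Count events per pixel after rounding coordinates to the nearest pixel."""
    xi = np.rint(x).astype(int)
    yi = np.rint(y).astype(int)
    inside = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
    img = np.zeros((height, width))
    # np.add.at accumulates correctly when several events hit the same pixel.
    np.add.at(img, (yi[inside], xi[inside]), 1.0)
    return img

def sharpness(img):
    """Assumed sharpness score: variance of the event-count image."""
    return img.var()

# Synthetic example: one point moving at 10 px/s along x.
t = np.arange(10) * 0.1
x = 5.0 + 10.0 * t
y = np.full(10, 3.0)

blurred = event_image(x, y, height=8, width=20)
xw, yw = warp_events(x, y, t, flow=(10.0, 0.0), t_ref=0.0)
compensated = event_image(xw, yw, height=8, width=20)
print(sharpness(blurred), sharpness(compensated))
```

With the correct flow, the events from the moving point collapse onto a single pixel and the score rises; with a wrong flow they stay smeared along the trajectory. A learned model would predict the flow per pixel and be trained to maximize a sharpness measure of this kind.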
Speakers
- Alex Zihao Zhu — University of Pennsylvania
- Kostas Daniilidis — University of Pennsylvania
Talks (1)
- 00:00:00 — Alex Zihao Zhu: Unsupervised Learning of Optical Flow and Camera Motion from Event Data
- This talk presents self-supervised and unsupervised learning frameworks for event-only optical flow, depth, and egomotion estimation, addressing challenges in event data representation and loss functions, and introduces the Multi Vehicle Stereo Event Camera Dataset (MVSEC).
Key Takeaways
- Event cameras offer lower latency, higher dynamic range, and lower power consumption than traditional cameras, which matters most for fast motion and challenging lighting conditions.
- Developing algorithms for event data is challenging because events arrive asynchronously and carry no direct intensity information, so the photometric loss has to be reformulated and noise has to be modeled robustly.
- Self-supervised and unsupervised deep learning frameworks are crucial for event-based vision, as they reduce the need for expensive labeled datasets by leveraging geometric constraints.
- Novel input representations for CNNs, such as 3D event volumes with trilinear interpolation, and specialized loss functions like ‘focus loss’, enable effective processing of event data for optical flow, depth, and egomotion estimation.
- The MVSEC dataset provides a valuable resource for event-based vision research, offering synchronized multi-sensor data with ground truth for diverse environments and conditions.
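The 3D event volume mentioned above can be sketched as follows. This is a minimal illustration, assuming integer pixel coordinates, in which case the trilinear kernel reduces to linear interpolation along the time axis only; the function name `event_volume` and its parameters are hypothetical and not taken from the talk.

```python
import numpy as np

def event_volume(x, y, t, p, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) volume.

    x, y: integer pixel coordinates; t: timestamps; p: polarities (+1 or -1).
    Each event's polarity is split between its two nearest time bins.
    """
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)

    vol = np.zeros((num_bins, height, width))
    span = t.max() - t.min()
    # Scale timestamps to [0, num_bins - 1]; guard against a zero-length window.
    t_scaled = (t - t.min()) / span * (num_bins - 1) if span > 0 else np.zeros_like(t)

    lower = np.floor(t_scaled).astype(int)
    upper = np.minimum(lower + 1, num_bins - 1)
    w_upper = t_scaled - lower   # weight for the later bin
    w_lower = 1.0 - w_upper      # weight for the earlier bin

    # np.add.at handles repeated (bin, y, x) indices correctly.
    np.add.at(vol, (lower, y, x), p * w_lower)
    np.add.at(vol, (upper, y, x), p * w_upper)
    return vol

# Three events at the same pixel, spread over the time window.
vol = event_volume(x=[1, 1, 1], y=[0, 0, 0], t=[0.0, 0.25, 1.0],
                   p=[1, 1, -1], num_bins=3, height=2, width=4)
print(vol[:, 0, 1])
```

Splitting each event between neighboring bins keeps timing information that a plain per-pixel event count would discard, while still producing a fixed-size tensor that a CNN can consume.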
Methods / Models / Datasets Mentioned
Event-based feature tracking with probabilistic data association (ICRA 2017) · Event-based visual inertial odometry (CVPR 2017) · Realtime Time Synchronized Event-based Stereo (ECCV 2018) · Multi Vehicle Stereo Event Camera Dataset (MVSEC) (RA-L/ICRA 2018) · Deep Learning · Neural Networks · Convolutional Neural Networks (CNNs) · Encoder-decoder network · Photometric loss · Focus Loss · Motion Blur Loss · Charbonnier loss · UnFlow · Trilinear interpolation · Lidar odometry · Motion capture (Vicon/Qualisys) · Google Cartographer with loop closure · Census transform · Zhou, Tinghui, et al. "Unsupervised learning of depth and ego-motion from video." (CVPR 2017) · Mitrokhin, Anton, et al. "Event-based moving object detection and tracking." (IROS 2018)
Topics
Event-based cameras · Unsupervised learning · Self-supervised learning · Optical flow estimation · Depth estimation · Egomotion estimation · Deep learning for events · Data representation · Loss functions · Sensor fusion
Notes
Open for commentary — connections to other work, critiques, follow-up reading.