Events-to-Video: Bringing Modern Computer Vision to Event Cameras

Event: CVPR 2019 · Duration: 9 min

Abstract

This talk presents “Events-to-Video,” a method that leverages recurrent neural networks to reconstruct high-quality, high-framerate, and high dynamic range videos solely from event camera data. By training the network on synthetic data generated from a custom event camera simulator (ESIM) and real images, the approach demonstrates excellent generalization to real-world event streams. The reconstructed videos are shown to be on par with conventional camera footage, while retaining the unique advantages of event cameras. Furthermore, the work explores the applicability of these reconstructions to various downstream computer vision tasks, including object detection, monocular depth estimation, and visual odometry, using off-the-shelf algorithms without specific retraining on event data.
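
As a rough illustration of what an event camera simulator has to produce, the sketch below applies the idealized event generation model to a pair of frames: a pixel fires an event when its log intensity changes by at least a contrast threshold, with the sign of the change as the polarity. This is not ESIM's implementation. The function name `simulate_events`, the threshold value, and the simplification of at most one event per pixel are assumptions made for this example.

```python
import numpy as np

def simulate_events(prev_frame, next_frame, threshold=0.2, eps=1e-3):
    """Idealized event generation between two intensity frames.

    Returns (ys, xs, polarities) for every pixel whose log-intensity
    change reaches the contrast threshold. Simplification: at most one
    event per pixel, whereas a real sensor may fire several events for
    a large brightness change.
    """
    log_prev = np.log(prev_frame.astype(np.float64) + eps)
    log_next = np.log(next_frame.astype(np.float64) + eps)
    diff = log_next - log_prev
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    polarities = np.sign(diff[ys, xs]).astype(np.int8)
    return ys, xs, polarities

# Hypothetical 4x4 frames: one pixel gets brighter, one gets darker.
prev_frame = np.full((4, 4), 0.5)
next_frame = prev_frame.copy()
next_frame[1, 2] = 1.0   # brighter, so a positive event is expected
next_frame[3, 0] = 0.2   # darker, so a negative event is expected
ys, xs, pol = simulate_events(prev_frame, next_frame)
```

Pixels whose brightness does not change produce no events at all, which is why event streams are sparse for static scenes.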

Speakers

  • Henri Rebecq — University of Zurich & ETH Zurich

Talks (9)

  • 00:00:00 — Henri Rebecq: Events-to-Video: Bringing Modern Computer Vision to Event Cameras
    • Introduction to the paper’s goal of converting event camera data into high-quality video and the team behind the work.
  • 00:00:13 — Henri Rebecq: Can we turn event data into a high quality video?
    • Posing the central question of the research and immediately answering it affirmatively, setting the stage for the method.
  • 00:00:44 — Henri Rebecq: Our method
    • Explanation of the recurrent network, built on a U-Net architecture, which processes a sequence of event tensors and, at each step, outputs an image reconstruction and updates its internal state.
  • 00:01:31 — Henri Rebecq: Our method: Training Data
    • Details on how synthetic training data (aligned event-video pairs) is generated using a 3D virtual world, MS-COCO images, and a custom event camera simulator (ESIM), and how a network trained on it generalizes to real events.
  • 00:02:38 — Henri Rebecq: High Framerate Video
    • Demonstration of the system’s ability to generate high-framerate video from event data, showcasing a bullet hitting a gnome and a water balloon popping, highlighting details invisible to conventional cameras.
  • 00:04:18 — Henri Rebecq: HDR Video
    • Examples illustrating the high dynamic range of the reconstructions: a selfie and a driving scene, each compared against phone camera footage, with the reconstructions showing no saturation or motion blur.
  • 00:05:29 — Henri Rebecq: Downstream applications
    • Discussion of how off-the-shelf computer vision algorithms can be applied to the reconstructed videos, exploring object detection, monocular depth estimation, and visual odometry.
  • 00:07:12 — Henri Rebecq: Conclusions
    • Summary of key findings: reconstructed videos match conventional camera quality with added benefits of high framerate and HDR, Sim2Real transfer works, and off-the-shelf algorithms are applicable.
  • 00:08:47 — Henri Rebecq: Live demo at the workshop!
    • Announcement of a live demo and availability of reconstruction code, pretrained models, event datasets, and the ESIM simulator.
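
The "Our method" segment describes a recurrent network that consumes a sequence of event tensors while carrying an internal state. The sketch below shows one way that data flow could look. It assumes a voxel-grid event tensor in which each event's polarity is split linearly between its two nearest temporal bins, and it assumes timestamps are sorted. `events_to_voxel_grid` and `toy_recurrent_step` are hypothetical names, and the toy step (a leaky integrator followed by a sigmoid) only stands in for the learned network to show the interface; it is not the trained model.

```python
import numpy as np

def events_to_voxel_grid(t, x, y, p, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) tensor.

    Each event's polarity is split between the two temporal bins nearest
    to its normalized timestamp (linear interpolation in time).
    Assumes t is sorted in ascending order.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if len(t) == 0:
        return grid
    span = max(float(t[-1] - t[0]), 1e-9)
    t_norm = (num_bins - 1) * (t - t[0]) / span
    lower = np.floor(t_norm).astype(int)
    frac = t_norm - lower
    # Share of each event assigned to its lower temporal bin.
    np.add.at(grid, (lower, y, x), p * (1.0 - frac))
    # Remaining share goes to the next bin, when that bin exists.
    has_upper = lower + 1 < num_bins
    np.add.at(grid, (lower[has_upper] + 1, y[has_upper], x[has_upper]),
              p[has_upper] * frac[has_upper])
    return grid

def toy_recurrent_step(voxel_grid, state, decay=0.9):
    """Hypothetical stand-in for the learned network (not the real model)."""
    new_state = decay * state + voxel_grid.sum(axis=0)
    image = 1.0 / (1.0 + np.exp(-new_state))  # squash to the range (0, 1)
    return image, new_state

# Three hypothetical events on a 2x2 sensor: (timestamp, x, y, polarity).
t = np.array([0.0, 0.25, 1.0])
x = np.array([0, 1, 1])
y = np.array([0, 0, 1])
p = np.array([1.0, -1.0, 1.0])
grid = events_to_voxel_grid(t, x, y, p, num_bins=3, height=2, width=2)

# Recurrent loop: each window yields one image and an updated state.
state = np.zeros((2, 2))
frames = []
for window_grid in [grid, grid]:  # in practice, one grid per event window
    image, state = toy_recurrent_step(window_grid, state)
    frames.append(image)
```

The point of the loop is that each reconstruction depends on the state carried over from earlier windows, not only on the current event tensor.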

Key Takeaways

  • High-quality videos can be reconstructed from event camera data, matching or exceeding conventional camera performance in terms of framerate and dynamic range.
  • The Sim2Real transfer paradigm is effective for event-based vision, allowing models trained on synthetic data to generalize well to real event streams.
  • Off-the-shelf computer vision algorithms (e.g., YOLOv3, MegaDepth, VINS-Mono) can be directly applied to these event-based video reconstructions, opening new avenues for leveraging existing CV research.
  • The increasing maturity and quality of event sensors (e.g., Samsung DVS, CelePixel) promise exciting future applications for event-based vision.
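
On the framerate point in the first takeaway: because events arrive asynchronously, the output frame rate depends on how the stream is cut into windows rather than on a fixed sensor clock. The sketch below assumes fixed-size windows of N events, each producing one reconstruction. That windowing scheme, the function names, and the synthetic event stream are assumptions for illustration and are not confirmed by this summary.

```python
import numpy as np

def split_by_event_count(num_events, events_per_window):
    """Return (start, end) index pairs for consecutive full windows."""
    num_windows = num_events // events_per_window
    return [(i * events_per_window, (i + 1) * events_per_window)
            for i in range(num_windows)]

def average_framerate(timestamps, events_per_window):
    """Average reconstructions per second if each window yields one frame."""
    windows = split_by_event_count(len(timestamps), events_per_window)
    if not windows:
        return 0.0
    duration = timestamps[windows[-1][1] - 1] - timestamps[0]
    return len(windows) / duration if duration > 0 else 0.0

# Hypothetical stream: 100,000 events spread evenly over one second.
timestamps = np.linspace(0.0, 1.0, 100_000)
windows = split_by_event_count(len(timestamps), events_per_window=50)
fps = average_framerate(timestamps, events_per_window=50)
```

With this synthetic stream, 50 events per window gives 2,000 windows over one second. Under this scheme, smaller windows or faster motion (more events per second) raise the effective frame rate.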

Methods / Models / Datasets Mentioned

  • U-Net
  • ESIM (Event Camera Simulator)
  • MS-COCO
  • YOLOv3
  • MegaDepth
  • VINS-Mono

Topics

Event cameras · Video reconstruction · High framerate · High dynamic range (HDR) · Recurrent neural networks · U-Net · Sim2Real transfer · Synthetic data generation · Computer vision applications · Object detection · Monocular depth · Visual odometry

