Novel Hardware for Spatial AI

Event: CVPR 2019 Workshop · Duration: 28 min

Abstract

This talk traces the evolution of SLAM (Simultaneous Localization and Mapping) into Spatial AI, emphasizing the need for robust real-time systems that can understand and interact with their environment. The speaker describes sparse and semi-dense reconstruction as mature, and dense and semantic mapping as advancing rapidly. Achieving breakthrough Spatial AI products, such as general-purpose home robots or lightweight augmented reality glasses, will require significant advances in both hardware and efficient scene representations. The presentation examines event cameras and graph-based processing architectures as key enablers for these future systems.

Speakers

Andrew Davison — Dyson Robotics Laboratory, Department of Computing, Imperial College London

Talks (1)

00:00:00 — Andrew Davison: Novel Hardware for Spatial AI
- A discussion of how SLAM is evolving into Spatial AI and why future products such as home robots and AR glasses need novel hardware and representations, with a focus on event cameras and graph-based processing.

Key Takeaways

Spatial AI represents the next frontier beyond traditional SLAM, aiming for embodied devices that can intelligently interact with their environment by building persistent and understandable 3D scene representations.
Delivering future Spatial AI products such as mass-market home robots or augmented reality glasses requires closing a significant gap between current capabilities and desired performance, particularly in precision, latency, dense and semantic mapping, and long-term scene understanding on low-cost hardware.
Novel hardware, such as event cameras, offers advantages in efficiency and latency for certain tasks, providing rich data streams that can be processed event-by-event to reconstruct scene intensity and 3D structure.
Efficient data representations, like learned low-dimensional codes for depth and semantics (e.g., SceneCode), are crucial for managing the vast amounts of information in complex scenes and enabling coherent multi-view fusion.
Graph-based processing architectures (e.g., Graphcore’s IPU) are emerging as a promising paradigm for Spatial AI: they place processing and local memory close together, which reduces data movement and enables efficient distributed computation for tasks such as belief propagation and graph neural networks.
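
To make the last takeaway concrete, here is a minimal sketch of Gaussian belief propagation on a small chain of scalar variables. It is an illustration only, not code from the talk and not Graphcore's API: the function name `gbp_chain`, its parameters, and the toy measurements are all assumptions, and it runs in plain NumPy on a CPU. The point it shows is that each message update reads only the data stored at one node and its immediate neighbour, which is the access pattern that suits hardware with processing next to local memory.

```python
import numpy as np


def gbp_chain(z, prior_prec, d, link_prec, n_iters=None):
    """Gaussian belief propagation on a chain x_0 - x_1 - ... - x_{n-1}.

    Hypothetical toy model: each x_i has a unary factor N(z[i], 1/prior_prec),
    and link i is a pairwise factor on x_{i+1} - x_i with mean d[i] and
    precision link_prec. Messages use information form (eta, lam).
    """
    z = np.asarray(z, dtype=float)
    d = np.asarray(d, dtype=float)
    n, w = len(z), float(link_prec)
    n_iters = n if n_iters is None else n_iters

    eta_u = prior_prec * z                  # unary information vectors
    lam_u = np.full(n, float(prior_prec))   # unary precisions
    fwd = np.zeros((n - 1, 2))  # fwd[i]: message arriving at x_{i+1} over link i
    bwd = np.zeros((n - 1, 2))  # bwd[i]: message arriving at x_i over link i

    for _ in range(n_iters):
        new_fwd, new_bwd = np.zeros_like(fwd), np.zeros_like(bwd)
        for i in range(n - 1):
            j = i + 1
            # Belief at x_i excluding link i: unary factor plus left neighbour.
            eta_i = eta_u[i] + (fwd[i - 1, 0] if i > 0 else 0.0)
            lam_i = lam_u[i] + (fwd[i - 1, 1] if i > 0 else 0.0)
            # Marginalise x_i out of the link factor: message to x_j.
            new_fwd[i] = (w * d[i] + w * (eta_i - w * d[i]) / (w + lam_i),
                          w * lam_i / (w + lam_i))
            # Belief at x_j excluding link i: unary factor plus right neighbour.
            eta_j = eta_u[j] + (bwd[j, 0] if j < n - 1 else 0.0)
            lam_j = lam_u[j] + (bwd[j, 1] if j < n - 1 else 0.0)
            # Marginalise x_j out of the link factor: message to x_i.
            new_bwd[i] = (-w * d[i] + w * (eta_j + w * d[i]) / (w + lam_j),
                          w * lam_j / (w + lam_j))
        fwd, bwd = new_fwd, new_bwd  # synchronous update of all messages

    # Final belief at each node: unary factor plus all incoming messages.
    eta_b, lam_b = eta_u.copy(), lam_u.copy()
    eta_b[1:] += fwd[:, 0]
    lam_b[1:] += fwd[:, 1]
    eta_b[:-1] += bwd[:, 0]
    lam_b[:-1] += bwd[:, 1]
    return eta_b / lam_b, 1.0 / lam_b


# Toy data (assumed): noisy positions with roughly unit steps between them.
z = [0.0, 1.2, 1.9, 3.1]
d = [1.0, 1.0, 1.0]
means, variances = gbp_chain(z, prior_prec=1.0, d=d, link_prec=4.0)
```

Because a chain has no loops, the messages settle after at most n synchronous sweeps and the resulting means and variances match a direct solve of the joint linear system. On graphs with loops, Gaussian belief propagation is an iterative approximation and its marginal variances are generally not exact.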

Methods / Models / Datasets Mentioned

Dyson/iRobot
ARKit/ARCore
Oculus/HoloLens
DJI/Skydio
SemanticFusion
ElasticFusion
CNN
DVS128
Simultaneous Mosaicing and Tracking with an Event Camera
EKF
3D Motion, Structure and Intensity from Event Data
SceneCode
Fusion++: Volumetric Object-Level SLAM
Mask R-CNN
KinectFusion
SLAMBench
PAMELA Project
SpiNNaker
Graphcore
IPU
IBM TrueNorth
BrainChip
Gaussian Belief Propagation
AlexNet

Topics

Spatial AI · SLAM · Novel Hardware · Event Cameras · Graph Processing · Semantic Mapping · Augmented Reality · Robotics · Real-time Systems · Scene Representation

Notes

Open for commentary — connections to other work, critiques, follow-up reading.