Learning Spatiotemporal Filters to Track Visual Saliency
Event: CVPR 2025 · Duration: 19 min
Abstract
The presentation explores visual saliency, its applications, and the challenges of tracking it with event-based cameras. It proposes an unsupervised learning model that uses spatiotemporal filters, learned through clustering and decision trees, to identify and follow salient objects. The model incorporates lifelong-learning principles to manage information as it accumulates over time, aiming for robust, efficient real-time operation, particularly on resource-constrained on-chip hardware. Experimental results on the Streetcar and Motorway datasets show the model adapting to different environments and distinguishing between obvious and nuanced salient features, in line with the behavior of human observers.
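The abstract does not say how the spatiotemporal filters are computed. One common way to learn filters from event data is the HOTS approach, which appears in the talk's list of methods: build a time surface around each incoming event, then cluster those surfaces and treat the cluster centres as filters. The sketch below assumes that approach. It uses plain k-means as a stand-in for whatever clustering the authors used, leaves out the decision-tree stage entirely, and relies on hypothetical names (`time_surfaces`, `kmeans`) and parameters (`radius`, `tau`). It is an illustration under those assumptions, not the authors' model.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical event format: (x, y, timestamp in seconds, polarity in {0, 1}).
Event = Tuple[int, int, float, int]


def time_surfaces(events: Sequence[Event], radius: int = 1, tau: float = 0.05) -> List[List[float]]:
    """Return one flattened HOTS-style time surface per event, in time order.

    Each cell of the (2*radius+1)^2 patch around the event holds
    exp(-(t - t_last) / tau), where t_last is the latest event of the same
    polarity at that pixel (0.0 if the pixel has not fired yet).
    """
    last_seen: Dict[Tuple[int, int, int], float] = {}
    surfaces: List[List[float]] = []
    for x, y, t, p in sorted(events, key=lambda e: e[2]):
        last_seen[(x, y, p)] = t  # the current event sets the centre cell to 1.0
        patch = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                t_last = last_seen.get((x + dx, y + dy, p))
                patch.append(0.0 if t_last is None else math.exp(-(t - t_last) / tau))
        surfaces.append(patch)
    return surfaces


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means; the k centroids play the role of learned filters."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for pt in points:
            nearest = min(range(k), key=lambda j: sq_dist(pt, centroids[j]))
            clusters[nearest].append(pt)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


if __name__ == "__main__":
    # Toy stream (made-up data): a horizontal sweep along row 5,
    # followed by a vertical sweep down column 15.
    stream = [(x, 5, 0.01 * x, 1) for x in range(10)]
    stream += [(15, y, 0.2 + 0.01 * y, 1) for y in range(10)]
    surfaces = time_surfaces(stream)
    filters = kmeans(surfaces, k=2)
    for f in filters:
        print([round(v, 2) for v in f])
```

In this toy stream the horizontal sweep lights up the left-neighbour cell of each surface and the vertical sweep lights up the cell above, so the two motions produce distinguishable surfaces for the clustering step to separate.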
Speakers
- Khaled Aboumerhi — ECE Ph.D. Candidate, Johns Hopkins
- Ralph Etienne-Cummings — Professor of ECE, Johns Hopkins
Talks (1)
- 00:00:00 — Khaled Aboumerhi: Learning Spatiotemporal Filters to Track Visual Saliency
- This presentation introduces an unsupervised visual saliency model that leverages event-based camera data and lifelong learning principles to dynamically learn spatiotemporal filters for tracking salient features in complex environments.
Key Takeaways
- The proposed model effectively learns spatiotemporal filters from event-based camera data using unsupervised clustering and decision trees, enabling dynamic tracking of visual saliency.
- Lifelong learning principles are crucial for managing large event-based datasets, ensuring consistency, preventing catastrophic forgetting, and maintaining space efficiency for real-time and online applications.
- The model’s ability to differentiate between obvious and nuanced salient features aligns with human visual attention, suggesting its potential for more sophisticated robotic and computer vision systems.
- Future work involves acquiring ground-truth spike-based saliency datasets using closed-environment eye-tracking devices (such as the HTC Vive or Microsoft HoloLens) to validate and compare visual saliency algorithms more accurately.
- Optimizing data processing by breaking down event streams into time blocks and applying learned filters during intermittent latent phases can improve accuracy and processing speed.
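The time-block and lifelong-learning takeaways can be illustrated together with a small sketch. It assumes events are grouped into fixed-duration blocks, each block is first scored against the current filters, and the filters are then adapted to that block with online k-means. Online k-means is a swapped-in technique, chosen only because it keeps memory fixed at k prototypes however long the stream runs. It is not the authors' lifelong-learning mechanism, and it does not by itself guarantee protection against catastrophic forgetting. All names (`split_into_blocks`, `OnlinePrototypes`, `process_stream`, `block_s`) and the toy featurizer are hypothetical. A real system would use richer per-event features, such as time surfaces.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical event format: (x, y, timestamp in seconds, polarity in {0, 1}).
Event = Tuple[int, int, float, int]
Vector = List[float]


def split_into_blocks(events: Sequence[Event], block_s: float) -> List[List[Event]]:
    """Group events into consecutive block_s-second windows starting at the first event.

    Windows that contain no events are skipped.
    """
    ordered = sorted(events, key=lambda e: e[2])
    if not ordered:
        return []
    t0 = ordered[0][2]
    blocks: Dict[int, List[Event]] = {}
    for ev in ordered:
        blocks.setdefault(int((ev[2] - t0) // block_s), []).append(ev)
    return [blocks[i] for i in sorted(blocks)]


class OnlinePrototypes:
    """k prototype vectors updated by online k-means; memory stays O(k * dim)."""

    def __init__(self, initial: Sequence[Sequence[float]]):
        self.protos: List[Vector] = [list(p) for p in initial]
        self.counts: List[int] = [1] * len(self.protos)

    def nearest(self, v: Sequence[float]) -> int:
        return min(range(len(self.protos)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(v, self.protos[j])))

    def activations(self, feats: Sequence[Sequence[float]]) -> List[int]:
        """Count how many feature vectors each prototype wins (a per-block response)."""
        hist = [0] * len(self.protos)
        for v in feats:
            hist[self.nearest(v)] += 1
        return hist

    def update(self, feats: Sequence[Sequence[float]]) -> None:
        for v in feats:
            j = self.nearest(v)
            self.counts[j] += 1
            lr = 1.0 / self.counts[j]  # step shrinks as a prototype absorbs more data
            self.protos[j] = [p + lr * (a - p) for p, a in zip(self.protos[j], v)]


def process_stream(events: Sequence[Event], model: OnlinePrototypes,
                   featurize: Callable[[Event], Vector], block_s: float) -> List[List[int]]:
    responses = []
    for block in split_into_blocks(events, block_s):
        feats = [featurize(ev) for ev in block]
        responses.append(model.activations(feats))  # apply the current filters first
        model.update(feats)                          # then adapt them to this block
    return responses


def toy_featurize(ev: Event) -> Vector:
    # Hypothetical stand-in feature: polarity and a scaled x coordinate.
    return [float(ev[3]), ev[0] / 10.0]


if __name__ == "__main__":
    events = [(0, 0, 0.0, 1), (1, 0, 0.05, 1), (5, 5, 0.12, 0), (6, 5, 0.31, 0)]
    model = OnlinePrototypes([[0.0, 0.0], [1.0, 0.0]])
    print(process_stream(events, model, toy_featurize, block_s=0.1))  # [[0, 2], [1, 0], [1, 0]]
```

Each inner list records how many events in a block matched each prototype. Scoring before updating keeps a block's response tied to the filters learned from earlier blocks, while the shrinking step size makes well-established prototypes drift slowly rather than being overwritten by the latest block.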
Methods / Models / Datasets Mentioned
Prophesee.ai Dataset · ATIS Camera · APS events · HOTS (Hierarchy of Event-Based Time Surfaces) · Decision Trees · Random Forest · HTC Vive · Microsoft HoloLens
Topics
Visual Saliency · Spatiotemporal Filters · Event-Based Cameras · Unsupervised Learning · Lifelong Learning · Data Management · Eye-Tracking · Computer Vision · Robotics · Real-time Processing