LatinX in Computer Vision (LXCV) at CVPR 2024 Workshop

Event: CVPR 2024 Workshop · Duration: 185 min

Abstract

The LatinX in Computer Vision (LXCV) workshop at CVPR 2024 presented research by Latin American scientists across eight talks. Key themes included frugal and interpretable AI architectures, robustness against adversarial attacks, and spatio-temporal models for gait-based emotion recognition. A significant portion of the workshop addressed autonomous driving in emerging economies, emphasizing the need for diverse real-world data and robust models. Other talks covered end-to-end handwriting recognition, fault detection in seismic data, material segmentation using spectral information, and a critical examination of the pyramidal design of convolutional networks. The event highlighted the importance of interdisciplinary collaboration and the potential for Latin America to contribute significantly to the global AI landscape.

Speakers

Octavia Camps — Professor of Electrical and Computer Engineering, Northeastern University
Cleber Zanchettin — Universidade Federal de Pernambuco, Brazil
Ivan Reyes — CINVESTAV, Mexico
Maria Luisa Lima — Universidade Federal de Pernambuco, Brazil
Arturo Deza — Co-Founder & CEO @ Artificio, Assistant Professor in Computer Science @ UTEC
Luiz Schirmer — Unisinos, Brazil
Ramon Izquierdo Cordova — University of Bristol, United Kingdom
Fabian Perez — Universidad Industrial de Santander, Colombia

Talks (8)

07:03:00 — Octavia Camps: Frugal, Interpretable, Dynamics-Inspired Architectures for Sequence Analysis
- This talk discusses why sequences matter in a dynamic world and introduces dynamics-based representations for deep learning. The goal is models that are physically interpretable, rich, simple, and frugal, exemplified by DYAN for video prediction and cross-view action recognition.
08:25:00 — Cleber Zanchettin: An End-to-End Approach for Handwriting Recognition: From Handwritten Text Lines to Complete Pages
- This presentation details an end-to-end approach to offline handwritten text recognition (HTR) that extends from text lines to entire pages. It uses a fully convolutional network with a self-attention module and octave convolutions, achieving competitive results on historical documents while improving efficiency and reducing memory consumption.
08:50:00 — Ivan Reyes: Enhancing Image Classification Robustness through Adversarial Sampling with Delta Data Augmentation (DDA)
- This talk addresses the robustness of deep neural networks to adversarial and natural image alterations. It introduces Delta Data Augmentation (DDA), which uses pre-trained robust models to generate adversarial perturbations and incorporates them into new training datasets, narrowing the gap between standard and robust accuracy without high computational cost.
09:20:00 — Maria Luisa Lima: ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos
- This presentation investigates emotion recognition from gait in videos, treating gait as a non-verbal cue. It introduces ST-Gait++, an architecture built on spatio-temporal graph convolutional networks (ST-GCN++) that classifies skeletal trajectories and achieves state-of-the-art performance on the E-Gait dataset with improved accuracy and faster convergence.
09:50:00 — Arturo Deza: On the Quest of Building the World’s Hardest Benchmark for Autonomous Driving: An Opportunity for Latin America
- This talk surveys the current state and challenges of self-driving cars, focusing on AI’s limitations with out-of-distribution data and adversarial examples. It proposes leveraging diverse real-world data from emerging economies such as Peru to build robust, explainable models for autonomous driving, and introduces the ‘Robusto’ platform for benchmarking and auditing computer vision models.
10:05:00 — Luiz Schirmer: High-Resolution Detection of Earth Structural Heterogeneities from Seismic Amplitudes using Convolutional Neural Networks with Attention layers
- This presentation tackles the detection of earth structural heterogeneities (faults and fractures) from seismic data, a task crucial for petroleum exploration. It proposes a fully convolutional neural network (FCNN) with a self-attention module that leverages synthetic data and transfer learning to real-world seismic data, and that outperforms previous U-Net-based and graph convolutional network models while using fewer parameters.
10:30:00 — Ramon Izquierdo Cordova: The Myth of the Pyramid: challenging the convolutional network pyramidal design
- This talk challenges the conventional pyramidal design of CNNs, which assumes that increasing the number of filters while decreasing feature-map size with depth is optimal. It proposes redistributing filters according to basic templates to explore novel filter distributions, and demonstrates that simpler models with optimized filter distributions can achieve higher accuracy with fewer parameters and reduced computational resources.
11:00:00 — Fabian Perez: Beyond Appearances: Material Segmentation with Embedded Spectral Information from RGB-D imagery
- This presentation addresses a limitation of classical material segmentation methods, which rely solely on RGB images, by using spectral information as a key cue for material discrimination. It proposes a deep learning framework that transforms RGB-D images into reconstructed spectral features for precise material segmentation, and validates it with consumer devices such as the iPad Pro.
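
As a rough illustration of the adversarial perturbations discussed in the DDA talk, the Python sketch below computes an L-infinity bounded perturbation with projected gradient descent (PGD) on a toy logistic-regression model, then adds that perturbation to a different sample as a form of augmentation. This is not the authors' implementation: the toy model, the function names (`pgd_delta`, `augment_with_delta`), the [0, 1] input range, and all parameter values are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(x, y, w, b):
    """Binary cross-entropy loss for a single sample."""
    p = predict(x, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the loss with respect to the input x (not the weights)."""
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_delta(x, y, w, b, eps=0.1, step=0.02, n_steps=10):
    """Return a perturbation with |delta_i| <= eps that increases the loss."""
    delta = [0.0] * len(x)
    for _ in range(n_steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        grad = loss_grad_wrt_input(x_adv, y, w, b)
        # Signed gradient ascent step, then projection onto the eps-ball.
        delta = [max(-eps, min(eps, di + step * sign(gi)))
                 for di, gi in zip(delta, grad)]
    return delta


def augment_with_delta(x_new, delta, lo=0.0, hi=1.0):
    """Add a stored perturbation to another sample (assumed range [lo, hi])."""
    return [max(lo, min(hi, xi + di)) for xi, di in zip(x_new, delta)]


# Hypothetical usage: extract a perturbation from one sample, reuse it on another.
w, b = [2.0, -1.0], 0.0
delta = pgd_delta([0.5, 0.2], 1, w, b)
augmented = augment_with_delta([0.9, 0.4], delta)
```

In this sketch the perturbation is computed once and then reused, which is one plausible reading of how precomputed perturbations could keep augmentation cheap; the talk's actual procedure may differ.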

Key Takeaways

Developing frugal, interpretable, and dynamics-inspired AI architectures is crucial for efficient and robust computer vision systems, especially in data-scarce environments.
Addressing the challenges of autonomous driving in emerging economies requires leveraging diverse, real-world out-of-distribution data and developing robust models that can handle complex and informal scenarios.
Novel data augmentation techniques, such as Delta Data Augmentation (DDA), can significantly enhance model robustness against adversarial attacks and natural image alterations without incurring high computational costs.
Rethinking traditional neural network designs, like the pyramidal filter distribution in CNNs, can lead to more efficient and accurate models with fewer parameters, providing benefits across various domains including image and audio classification.
Integrating spectral information into material segmentation and developing models that align with human visual cortex representations can improve perception capabilities and lead to more explainable and robust AI systems.
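
To make the takeaway about filter distributions concrete, the following Python sketch compares three hypothetical filter templates that spend the same total number of filters (480) in a four-layer convolutional stack. It only counts parameters and multiply-accumulate operations (MACs) and says nothing about accuracy. The templates, the 32x32 input, and the halving of resolution after every layer are illustrative assumptions, not the configurations studied in the talk.

```python
def stack_cost(filters, in_channels=3, kernel_size=3, input_size=32):
    """Return (parameters, MACs) for a plain stack of square conv layers.

    Assumes stride-1 "same" convolutions, each followed by a 2x downsampling.
    """
    params, macs = 0, 0
    size = input_size
    for out_channels in filters:
        weights = in_channels * out_channels * kernel_size * kernel_size
        params += weights + out_channels      # weights plus biases
        macs += weights * size * size         # one kernel application per output pixel
        in_channels = out_channels
        size //= 2                            # assumed downsampling between layers
    return params, macs


# Hypothetical templates, each using 480 filters in total.
templates = {
    "pyramid": [32, 64, 128, 256],     # widths grow with depth
    "uniform": [120, 120, 120, 120],   # same width in every layer
    "reverse": [256, 128, 64, 32],     # widths shrink with depth
}

costs = {name: stack_cost(widths) for name, widths in templates.items()}

for name, (params, macs) in costs.items():
    print(f"{name:8s} params={params:,} macs={macs:,}")
```

Under these assumptions the three templates have similar parameter counts but very different operation counts, because the pyramid places its widest layers where the feature maps are smallest. Whether a different distribution yields better accuracy for the same budget is the empirical question the talk examines.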

Methods / Models / Datasets Mentioned

DYAN
LSTMs
RNNs
Transformers
Johansson Experiments
Kalman filtering
OpenPose
FISTA
Gumbel Softmax
FCN
Octave Convolution
Self-Attention
DAN (Document Attention Network)
Faster DAN
DANCER
DANCER-Max
PGD (Projected Gradient Descent)
WideResNet-50
ResNet18
RandAugment
AutoAugment
AugMix
DDA
ST-GCN++
E-Gait dataset
U-Net
GCN
Transformer dual U-Net
YOLOv8
Detectron2
EVA-Retina Net
ResNet50
VGG19
MobileNet
MNASNet
NASNet
NASBench-101
CrossViT
RGB-D
LiDAR
iPad Pro
LIB-HSI dataset

Topics

Frugal and Interpretable AI · Adversarial Robustness · Data Augmentation · Gait-based Emotion Recognition · Autonomous Driving Challenges in Emerging Economies · Out-of-Distribution Data · Seismic Data Analysis · Material Segmentation · Neural Network Architecture Design · Spatio-temporal Convolutions · Human Visual Cortex Modeling · Data Commercialization and Regulations
