Scalable Real-Time Abnormal Event Detection

Event: CVPR Workshop 2024 · Duration: 398 min

Abstract

The first segment covers the introduction to the VAND 2.0 workshop, including its organizers, program committee, and submission statistics. It then features two talks: ‘SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection’ by Mathis Kruse, which introduces a method for detecting anomalies in multi-pose 3D objects using Gaussian Splatting, and ‘Advancing Visual Anomaly Detection: A Comprehensive Open-Source Approach’ by Samet Akcay, which presents Anomalib, an open-source library for anomaly detection. The segment concludes with the beginning of Shai Avidan’s talk, ‘Everything but Anomaly Detection,’ which examines what counts as an anomaly and how the choice of representation affects detection.

The second segment covers two presentations on anomaly detection. The first speaker introduces a method for coarse-grain anomaly detection using normalizing flows on pose graphs, emphasizing its efficiency and robustness, and proposes a ‘Novel Class Discovery’ approach for aerial imagery. The second speaker presents ‘DMR: Disentangling Marginal Representations for Out-of-Distribution Detection’, which tackles overconfidence in OOD detection through a pipeline of encoding, marginal feature manipulation, and synthetic data generation, and reports improved performance on benchmark datasets.

The third segment features two presentations. The first introduces the Text-Align Anomaly Backbone (TAB) model, a pre-training framework for industrial inspection tasks that uses text-image alignment to improve anomaly detection and defect classification. The second introduces BMAD, a new benchmark for medical anomaly detection, which includes six datasets from five common medical domains and supports 15 state-of-the-art algorithms for standardized evaluation.

The fourth segment features two research talks on scalable real-time abnormal event detection in video, followed by an introduction to the VAND 2.0 Challenge at CVPR 2024. The first talk presents a knowledge distillation approach that uses object-centric models to train a fast frame-level anomaly detector. The second talk introduces a self-distilled masked auto-encoder architecture, enhanced with motion gradient weighting and synthetic anomalies to improve efficiency and accuracy. The segment concludes with an overview of the VAND 2.0 Challenge, detailing its two categories (robust anomaly detection in real-world applications, and few-shot learning for logical/structural detection) and emphasizing the need for models that adapt to domain shifts.

The fifth segment features presentations from the winning teams of the VAND 2.0 Challenge at CVPR 2024. The first talk introduces ARNet, the 1st place solution for Category 1, which focuses on robust anomaly detection under real-world variations using a reconstruction-based network with a foreground predictor and synthetic data augmentation. The subsequent talks detail the top solutions for Track 2, the VLM Anomaly Challenge, with Ziyu Bao presenting the 2nd place approach based on segment-aligned features and Zhaopeng Gu presenting the 1st place AnomalyMoE, a Mixture of Experts framework for few-shot anomaly detection. The segment concludes with a wrap-up and information about related CVPR events.

Speakers

Paul Bergmann — MedUni Wien
Mathis Kruse — Leibniz University Hannover, Institute for Information Processing
Samet Akcay — AI Research Engineer & Scientist, Intel
Shai Avidan — School of Electrical Engineering, Tel-Aviv University
Dasol Choi — Yonsei University, MODULABS
Dongbin Na — Pohang University of Science and Technology
Ho-Weng Lee — National Tsing Hua University
Shang-Hong Lai — National Tsing Hua University
Jinan Bao — University of Alberta
Hanshi Sun — University of Alberta
Hanqiu Deng — University of Alberta
Yinsheng He — University of Alberta
Zhaoxiang Zhang — University of Alberta
Xingyu Li — University of Alberta
Radu Tudor Ionescu — University of Bucharest, Romania; SecurifAI, Romania
Florinel-Alin Croitoru — University of Bucharest, Romania
Nicolae-Cătălin Ristea — University of Bucharest, Romania
Fahad Shahbaz Khan — MBZ University of Artificial Intelligence, UAE
Mubarak Shah — University of Central Florida, US
Paula Ramos, PhD — AI Evangelist/CV Scientist, Intel
Dick Ameln, MSc — AI Research Engineer/Scientist, Intel (Utrecht, NL)
Ashwin Vaidya, MSc — AI Research Engineer/Scientist, Intel
Babar Hussain — TCL CORPORATE RESEARCH, HK
Ziyu Bao — Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Objecteye Inc.
Zhaopeng Gu — Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Objecteye Inc.

Talks (15)

00:00:00 — Paul Bergmann: Introduction to VAND 2.0 Workshop
- Introduces the VAND 2.0 workshop, its organizers, program committee, submission statistics, schedule, and feedback mechanisms.
00:03:06 — Mathis Kruse: SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection
- Presents SplatPose, a method for pose-agnostic 3D anomaly detection using 3D Gaussian Splatting to learn pose-invariant normality and detect anomalies in multi-pose settings.
00:15:59 — Samet Akcay: Advancing Visual Anomaly Detection: A Comprehensive Open-Source Approach
- Introduces Anomalib, an open-source library designed to address challenges in visual anomaly detection by providing a comprehensive toolkit for designing, developing, and deploying deep learning anomaly detection algorithms, emphasizing reproducibility and ease of use.
01:08:38 — Shai Avidan: Everything but Anomaly Detection
- Discusses the fundamental aspects of anomaly detection, emphasizing the importance of representation and defining ‘anomaly with respect to what,’ and introduces a graph embedding approach for video anomaly detection.
01:19:34 — Dasol Choi: Coarse-Grain Anomaly Detection
- This talk introduces a method for coarse-grain anomaly detection using normalizing flows on pose graphs, highlighting its compact and real-time performance, and addresses the ‘Novel Class Discovery’ problem in aerial imagery by focusing on ‘anomaly existence’.
02:13:34 — Dasol Choi: DMR: Disentangling Marginal Representations for Out-of-Distribution Detection
- This presentation introduces DMR, a method for Out-of-Distribution (OOD) detection that addresses overconfidence by disentangling marginal representations using latent operations and multiple latent mixup, demonstrating superior performance on various datasets.
02:39:08 — Ho-Weng Lee: TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks
- This talk introduces the Text-Align Anomaly Backbone (TAB) model, a pre-training framework that leverages text-image alignment for improved industrial anomaly detection and defect classification.
03:21:15 — Jinan Bao: BMAD: Benchmarks for Medical Anomaly Detection
- This talk introduces BMAD, a comprehensive and standardized benchmark for medical anomaly detection, including six datasets from five medical domains and supporting 15 state-of-the-art algorithms.
03:58:42 — Radu Tudor Ionescu: Scalable Real-Time Abnormal Event Detection
- This talk introduces a method for scalable real-time abnormal event detection in video using knowledge distillation from object-centric models to a fast frame-level model, incorporating adversarial training.
04:07:00 — Radu Tudor Ionescu: Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
- This talk presents a self-distilled masked auto-encoder architecture for efficient video anomaly detection, leveraging motion gradient weighting and synthetic anomalies for improved performance and scalability.
04:14:17 — Paula Ramos, PhD: VAND 2.0 Challenge at CVPR
- Introduction to the VAND 2.0 Challenge at CVPR, sponsored by Intel, outlining its categories and evaluation process for visual anomaly detection.
04:15:31 — Dick Ameln, MSc: The Challenge (Category 1 & 2 details)
- Detailed explanation of the VAND 2.0 Challenge categories, focusing on adapt & detect (robust anomaly detection) and VLM anomaly challenge (few-shot learning for logical and structural detection), including dataset creation and evaluation metrics.
05:18:31 — Babar Hussain: VAND 2.0: Challenge Category 1 - Adapt & Detect ARNet for Robust Anomaly Detection
- Presents ARNet, the winning solution for VAND 2.0 Challenge Category 1, focusing on robust anomaly detection under real-world variations using a reconstruction-based network with a foreground predictor and synthetic data augmentation.
05:33:46 — Ziyu Bao: Segment-aligned Features Impose Logical Constraints
- Presents the 2nd place solution for VAND 2.0 Challenge Track 2, focusing on few-shot anomaly detection using segment-aligned features, employing few-shot learning with pre-trained visual-language models to differentiate anomaly types.
05:38:46 — Zhaopeng Gu: AnomalyMoE: Few-shot Anomaly Detection Using Mixture of Experts
- Presents the 1st place solution for VAND 2.0 Challenge Track 2, AnomalyMoE, which uses a Mixture of Experts framework for few-shot anomaly detection, combining various strategies to detect logical and structural anomalies.
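The knowledge distillation idea from the 03:58:42 talk (a fast frame-level student trained to reproduce the output of a slow object-centric teacher) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `teacher_score` is a hypothetical stand-in for the slow model, the two-number frame features are invented, the student is a plain linear model fitted with mean squared error, and the adversarial training mentioned in the talk is left out.

```python
import random


def teacher_score(features):
    """Hypothetical stand-in for a slow object-centric anomaly detector."""
    motion, objects = features
    return 0.7 * motion + 0.3 * objects


def train_student(frames, epochs=1000, lr=0.3):
    """Fit a linear frame-level student to the teacher's scores with MSE."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(frames)
    # The teacher is queried once, offline; the student never needs it at test time.
    targets = [teacher_score(f) for f in frames]
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for f, t in zip(frames, targets):
            err = (w[0] * f[0] + w[1] * f[1] + b) - t
            grad_w[0] += 2 * err * f[0] / n
            grad_w[1] += 2 * err * f[1] / n
            grad_b += 2 * err / n
        # Plain full-batch gradient descent step.
        w[0] -= lr * grad_w[0]
        w[1] -= lr * grad_w[1]
        b -= lr * grad_b
    return w, b


def student_score(frame, w, b):
    """Cheap frame-level score produced by the distilled student."""
    return w[0] * frame[0] + w[1] * frame[1] + b


rng = random.Random(0)
frames = [(rng.random(), rng.random()) for _ in range(200)]  # toy frame features
w, b = train_student(frames)
print(student_score((0.5, 0.5), w, b), teacher_score((0.5, 0.5)))
```

In this toy setting the student can match the teacher exactly because both are linear. With a real detector the student would only approximate the teacher, trading some accuracy for speed.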

Key Takeaways

The VAND 2.0 workshop emphasizes advancements in visual anomaly and novelty detection, featuring diverse research and an open-source initiative.
SplatPose offers a novel approach to pose-agnostic 3D anomaly detection by leveraging 3D Gaussian Splatting for robust normality learning and efficient anomaly localization.
Anomalib addresses the need for standardized, reproducible, and easily deployable anomaly detection solutions, providing a comprehensive open-source framework for researchers and practitioners.
Effective anomaly detection hinges on choosing the right data representation and understanding ‘anomaly with respect to what,’ moving beyond simple outlier detection in pixel space.
Normalizing Flows (STG-NF) applied to pose graphs offer a compact and real-time solution for video anomaly detection, demonstrating robustness across various scenarios.
The ‘Novel Class Discovery’ problem can be reframed as an ‘anomaly existence’ question, where the goal is to efficiently identify if any novel classes exist within a large dataset, rather than exhaustively classifying all anomalies.
The DMR (Disentangling Marginal Representations) method effectively addresses overconfidence in Out-of-Distribution (OOD) detection by synthesizing artificial OOD training data through latent operations and multiple latent mixup, leading to superior performance.
Large Language Models (LLMs) like ChatGPT show potential in anomaly detection by providing plausible anomaly scores and explanations for complex scenarios, suggesting a new avenue for research in integrating LLMs with traditional anomaly detection techniques.
The TAB model significantly improves anomaly detection and defect classification performance in industrial inspection tasks by using text-image alignment during pre-training.
The TAB model addresses domain gaps and global content biases inherent in ImageNet pre-trained models, making it more suitable for detecting subtle local anomalies.
BMAD provides a much-needed standardized benchmark for medical anomaly detection, offering diverse datasets and a robust evaluation framework for 15 state-of-the-art algorithms.
The BMAD benchmark highlights the current performance gaps in medical anomaly detection, especially for localization, and provides a platform for future research and development in this critical area.
Knowledge distillation from high-performing but slow object-centric models can create fast, frame-level anomaly detectors suitable for real-time applications.
Self-distillation and motion gradient weighting within masked auto-encoder architectures can significantly improve the efficiency and accuracy of video anomaly detection.
Synthetic anomalies and data augmentation techniques are crucial for training robust anomaly detection models, especially when labeled abnormal data is scarce.
Real-world anomaly detection challenges require models that are robust to domain shifts (e.g., lighting, camera position, motion blur) and can handle logical/structural defects with few-shot learning.
Robust anomaly detection requires models capable of adapting to unknown real-world variations, which can be achieved through synthetic data generation and foreground-aware training.
Few-shot anomaly detection benefits from leveraging pre-trained visual-language models and integrating semantic segmentation for segment-aligned feature extraction.
A Mixture of Experts approach, combining different anomaly detection strategies (VLM-based, part-segmentation-based, patch-level), can effectively address both logical and structural anomalies across various granularity levels.
The VAND 2.0 Challenge highlights the importance of developing models that are robust to domain shifts and capable of precise pixel-level anomaly localization.
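The takeaway on motion gradient weighting can be made concrete with a small sketch. The summary does not give the exact weighting used in the talk, so the code assumes the simplest variant: per-pixel weights proportional to the absolute difference between consecutive frames, applied to a squared reconstruction error. The function names and the flat-list frame representation are hypothetical.

```python
def motion_weights(prev_frame, frame, eps=1e-8):
    """Per-pixel weights from the absolute temporal difference, normalized to sum to 1."""
    diffs = [abs(a - b) for a, b in zip(frame, prev_frame)]
    total = sum(diffs)
    if total < eps:
        # Static scene: fall back to uniform weights.
        return [1.0 / len(diffs)] * len(diffs)
    return [d / total for d in diffs]


def weighted_reconstruction_error(frame, reconstruction, weights):
    """Squared reconstruction error, emphasizing pixels where motion occurred."""
    return sum(w * (x - r) ** 2 for w, x, r in zip(weights, frame, reconstruction))


# Toy 4-pixel frames: only the last two pixels change between time steps.
prev_frame = [0, 0, 0, 0]
frame = [0, 0, 1, 1]
reconstruction = [0.5, 0, 1, 0]  # one error on a static pixel, one on a moving pixel

weights = motion_weights(prev_frame, frame)
error = weighted_reconstruction_error(frame, reconstruction, weights)
print(weights, error)
```

Under this weighting the error on the static first pixel is ignored, while the error on the moving last pixel dominates the score, which is the intended effect of focusing on regions with motion.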

Methods / Models / Datasets Mentioned

3D Gaussian Splatting
ACET
ARNet
AST
AUC
AUPR
AUROC
Anomalib
Anomaly-Text-Aware pre-training strategy
AnomalyDINO
AnomalyMoE
Autoencoder (AE)
Avenue dataset
BMAD
BTAD
CADSD
CAVGA-R
CFA
CFLOW-AD
CIFAR-10
CIFAR-100
CLIP
CS-Flow
ChatGPT
ComAD
Convolutional Transformer block
Coupled-Hypersphere-based Feature Adaptation
Cross-entropy loss
Cube-level models
CutPaste
DINO-ViT
DINOv2
DMR
DN2
DOTA-v2.0
DRAEM
DTD
DeAOT
Deep SVDD
DenseNet-121
Discriminator
EfficientAD
Entropy
F1Max score
Frame-level models
GAN
GANomaly
GEOM
GOAD
Gaussian Noise
HR-STC
IDPA (Industrial Domain Prompt Association)
Image-level AUROC
ImageNet
KIRBY
KSDD2
Kinetics-250
LSUN-crop
MAD dataset
MHRot
MIM
MKD
MSP
MVTec AD
Mahalanobis
Masked Auto-Encoder (MAE)
Max pooling
MaxLogit
Mean Squared Error (MSE) reconstruction loss
MemSeg
MixedWM38
Motion gradient weighting
Multi-head attention
Multiple Latent Mixup (MLM)
NTU-RGB+D
OC-SVM
ODIN
Object-centric models
OmniAD
PNI Ensemble
PRO
PSAD
PaDiM
PatchCore
Perlin Noise Generator
Pixel-level AUROC
Places-365
RBDC
RD4AD
RealNet
RegAD
ResNet-18
ResNet-50
SAM
SDAS
SPADE
ST-GCAE
ST-GCN
STC
STG-NF
STPM
SVHN
Self-Supervised Predictive Convolutional Attentive Block
Self-distillation
ShanghaiTech dataset
SimpleNet
SplatPose
Student-Teacher Networks
Synthetic anomalies
TAB (Text-Align Anomaly Backbone)
TBDC
Textures
Tiny-ImageNet
UBnormal dataset
UCSD Ped2 dataset
UTRAD
VIM
VisA
WideResNet-40-2
WinCLIP
f-AnoGAN
iNeRF
t-SNE

Topics

3D Gaussian Splatting · Aerial Imagery · Anomalib · Anomaly Detection · Anomaly Existence · Benchmark · Challenge Design · ChatGPT · Data Augmentation · Deep Learning · Defect Classification · Disentangling Marginal Representations · Domain Shift · Few-shot Learning · Graph Embedding · Industrial Inspection · Knowledge Distillation · Logical Anomalies · Masked Autoencoders · Medical Anomaly Detection · Mixture of Experts · Multiple Latent Mixup · Normalizing Flows · Novel Class Discovery · Open-Source Tools · Out-of-Distribution Detection · Pose Graphs · Pose-Agnostic 3D Anomaly Detection · Pre-training Framework · Real-Time Processing · Real-world Variations · Reproducibility · Robustness · Self-Supervised Learning · Semantic Segmentation · Structural Anomalies · Text-Image Alignment · Video Anomaly Detection · Workshop Introduction

Notes

Open for commentary — connections to other work, critiques, follow-up reading.