THE 8TH AI CITY CHALLENGE @ CVPR 2024

Event: CVPR 2024 · Duration: 508 min

Abstract

This segment introduces the 8th AI City Challenge at CVPR 2024 and details its tracks, participation statistics, and evaluation methodologies. It then moves to a keynote on Large Scale Dynamic Scene Understanding delivered by Professor Daniel Cremers. The keynote covers the historical development of 3D computer vision for autonomous systems, the challenges of real-world dynamic scenes, and novel numerical methods for bundle adjustment, and it concludes with applications in traffic acquisition and simulation.

A series of paper presentations from the AI City Challenge 2024 follows, focusing on Track 1 and Track 2. The presentations cover approaches to multi-camera people tracking, including methods for handling occlusion, improving re-identification, and increasing tracking accuracy in complex urban environments. Key topics include online tracking, spatial-temporal constraints, and the use of Vision Language Models for traffic safety analysis. This part concludes with a brief Q&A and a call for the next set of presentations.

The next segment features presentations on analyzing driving behavior and traffic safety with computer vision and language models. Topics include efficient fine-tuning of Vision Language Models in urban settings, segment-based data processing for traffic safety analysis, parallel dense video captioning, multi-perspective traffic video description, multi-view action recognition for distracted driving, and spatial-temporal learning for unusual driving behaviors. The presentations highlight approaches to data annotation, model architecture, and post-processing that improve accuracy and efficiency in identifying and classifying driving activities.

Teams participating in AI City Challenge Track 4 then present their work on road object detection in fisheye camera images. Each team describes its methodology for challenges such as image distortion, low-light conditions, small objects, and data inconsistency. The solutions draw on data augmentation, pseudo-labeling, image enhancement, super-resolution, and ensemble methods to improve detection accuracy and robustness in traffic surveillance scenarios.

Further presentations cover computer vision for traffic analysis and safety: a coarse-to-fine two-stage helmet detection method for motorcyclists, an effective method for detecting helmet rule violations, a knowledge-informed generative adversarial network (KI-GAN) for multi-vehicle trajectory forecasting at signalized intersections, and a simple in-place data augmentation technique for surveillance object detection. Together they address data imbalance, complex traffic conditions, and occlusion in real-world scenarios.

The event closes with a panel discussion on the mission of the AI City Challenge (leveraging AI for smart cities), the role of data augmentation, and the ethics of privacy and fairness in AI applications. Panelists discuss strategies for anonymizing data, mitigating bias, and using synthetic data and multimodal analysis to advance the field. The segment concludes with the announcement of award winners for all five tracks: multi-camera people tracking, traffic safety description and analysis, naturalistic driving action recognition, road object detection in fish-eye cameras, and helmet violation detection.

Speakers

Zheng Tang — NVIDIA
David C. Anastasiu — Santa Clara University
Quan Kong — Woven by Toyota
Munkhjargal Gochoo — The United Arab Emirates University
Pranamesh Chakraborty — Indian Institute of Technology Kanpur
Daniel Cremers — Chair of Computer Vision and AI, TU Munich; Munich Center for Machine Learning
Thomas Yang
Ryuto Yoshida — Yachiyo Engineering Co., Ltd.
Zhenyu Xie — Shanghai Jiao Tong University
Jeongho Kim — Nota Inc., Republic of Korea
Andreas Specker — Fraunhofer IOSB
Riu Cherdchusakitchai — AI and Robotics Ventures (ARV), Thailand
Zhihao Duan — Alibaba OpenTrek
Thomas Tang — Alibaba Cloud Intelligence Group
Viet Hung Duong — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
Duc Quyen Nguyen — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
Tien Cuong Nguyen — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
Thien Van Luong — Faculty of Computer Science, Phenikaa University, Hanoi, Vietnam
Huan Vu — University of Transport and Communications, Hanoi, Vietnam
Wooksu Shin — Nota Inc., Republic of Korea
Donghyuk Choi — Nota Inc., Republic of Korea
Hancheol Park — Nota Inc., Republic of Korea
Long Hoang Pham — SKKU-AutoLab, Sungkyunkwan University
Quoc Pham-Nam Ho — SKKU-AutoLab, Sungkyunkwan University
Duong Nguyen-Ngoc Tran — SKKU-AutoLab, Sungkyunkwan University
Tai Huu-Phuong Tran — SKKU-AutoLab, Sungkyunkwan University
Huy-Hung Nguyen — SKKU-AutoLab, Sungkyunkwan University
Duong Khac Vu — SKKU-AutoLab, Sungkyunkwan University
Ngoc Doan-Minh Huynh — SKKU-AutoLab, Sungkyunkwan University
Hyung-Min Jeon — SKKU-AutoLab, Sungkyunkwan University
Hyung-Joon Jeon — SKKU-AutoLab, Sungkyunkwan University
Jae Wook Jeon — SKKU-AutoLab, Sungkyunkwan University
Bao Tran Gia — University of Information Technology, VNU-HCM, Vietnam
Tuong Bui Cong Khanh — University of Information Technology, VNU-HCM, Vietnam
Hien Ho Trong — University of Information Technology, VNU-HCM, Vietnam
Thuyen Tran Doan — University of Information Technology, VNU-HCM, Vietnam
Tien Do — University of Information Technology, VNU-HCM, Vietnam
Duy-Dinh Le — University of Information Technology, VNU-HCM, Vietnam
Thanh Duc Ngo — University of Information Technology, VNU-HCM, Vietnam
Dai Quoc Tran — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
Armstrong Aboah — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
Yuntae Jeon — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
Maged Shoman — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
Minsoo Park — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
Seunghee Park — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
Xingshuang Luo — Beijing University of Posts and Telecommunications
Zhe Cui — Beijing University of Posts and Telecommunications
Fei Su — Beijing University of Posts and Telecommunications
Hongpu Zhang — Beijing University of Posts and Telecommunications
Yunliang Chen — China Mobile Shanghai ICT Co., Ltd
Chen Wang — China Mobile Shanghai ICT Co., Ltd
Yingda Shang — China Mobile Shanghai ICT Co., Ltd
Chuheng Wei — University of California, Riverside
Guoyuan Wu — University of California, Riverside
Matthew Barth — University of California, Riverside
Amr Abdelraouf — Infotech Labs, Toyota
Rohit Gupta — Infotech Labs, Toyota
Kyungtae Han — Infotech Labs, Toyota
Munkh-Erdene Otgonbold — United Arab Emirates University
Ganzorig Batnasan — United Arab Emirates University
Norimasa Kobori — Yachiyo Engineering Co., Ltd.
Xunlei Wu — Department of Computer Science and Software Engineering, United Arab Emirates University, UAE

Talks (40)

00:00:00 — Zheng Tang: The 8th AI City Challenge @ CVPR 2024 - Introduction
- An introduction to the 8th AI City Challenge at CVPR 2024, covering the organizing committees, historical overview of the challenge, participation statistics, and timeline.
01:24:42 — Thomas Yang: Q&A Session with Daniel Cremers
- This segment begins with a Q&A session for Daniel Cremers, covering topics such as 3D bounding box detection, behavior modeling, and handling dynamic scenes in tracking.
01:35:54 — Ryuto Yoshida: Overlap Suppression Clustering for Offline Multi-Camera People Tracking
- Ryuto Yoshida presents a multi-camera people tracking method that achieved the highest HOTA score in the challenge, focusing on overlap suppression clustering and hierarchical clustering with average linkage.
01:46:41 — Zhenyu Xie: A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction
- Zhenyu Xie presents a robust online multi-camera people tracking system that addresses challenges like matching individuals across cameras, ID switching due to occlusion, and re-identification with varying poses and angles, using geometric consistency and state-aware re-ID correction.
01:56:11 — Jeongho Kim: Cluster Self-Refinement for Enhanced Online Multi-Camera People Tracking
- Jeongho Kim introduces a cluster self-refinement module for online multi-camera people tracking, addressing issues of inaccurate mapping, low-quality features, and multiple IDs assigned to a single person, leading to improved HOTA scores.
02:06:21 — Andreas Specker: OCMCTrack: Online Multi-Target Multi-Camera Tracking with Corrective Matching Cascade
- Andreas Specker presents OCMCTrack, an online multi-target multi-camera tracking system that focuses on correcting erroneous associations from previous time steps and improving bounding box to world projection, achieving competitive results in the challenge.
02:16:18 — Riu Cherdchusakitchai: Online Multi-camera People Tracking with Spatial-temporal Mechanism and Anchor-feature Hierarchical Clustering
- Riu Cherdchusakitchai presents an online multi-camera people tracking method that utilizes spatial-temporal constraints and anchor-feature hierarchical clustering to address challenges like occlusion, ID switching, and varying lighting conditions, achieving a HOTA score of 69.10%.
02:27:42 — Zhihao Duan: CityLLaVA: Efficient Fine-tuning for VLMs in City Scenario
- Zhihao Duan presents CityLLaVA, an efficient fine-tuning pipeline for Vision Language Models (VLMs) specifically designed for urban scenarios, addressing challenges like small targets, multi-views, background noise, and templated captions to improve traffic safety description and analysis.
02:49:24 — Thomas Tang: CityLLaVA: Efficient Fine-tuning for VLMs in City Scenario
- This talk introduces CityLLaVA, an efficient fine-tuning method for Vision Language Models in urban settings, highlighting visual prompt engineering, text QA construction, and block expansion techniques.
03:43:17 — Thomas Tang: Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model
- This talk presents a method for traffic safety analysis using a large vision language model, focusing on segment-based data processing and a two-stage training pipeline to generate detailed descriptions of traffic scenarios.
04:14:18 — Viet Hung Duong: Robust Data Augmentation and Ensemble Method for Object Detection in Fisheye Images
- Presents a solution for fisheye image object detection using robust data augmentation, K-fold splitting, synthetic data generation, model selection, pseudo-labeling, and ensembling, achieving 1st place in the AI City Challenge Track 4.
04:14:47 — Wooksu Shin: Road Object Detection Robust to Distorted Objects at the Edge Regions of Images
- Addresses challenges in fisheye image object detection, particularly with distorted objects at image edges, using task-specific methods like slicing aided hyper inference and semi-supervised learning, general methods, and ensemble techniques.
04:15:05 — Long Hoang Pham: Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach
- Proposes an open-vocabulary pseudo-label approach for improving object detection in fisheye cameras, addressing data inconsistency and distortion challenges using YOLO-World and CycleGAN.
04:15:20 — Bao Tran Gia: Enhancing Road Object Detection in Fisheye Cameras: An Effective Framework Integrating SAHI and Hybrid Inference
- Presents an effective framework for enhancing road object detection in fisheye cameras by integrating SAHI and hybrid inference, utilizing data preparation, augmentation, various models, and post-processing strategies.
04:15:31 — Dai Quoc Tran: Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets
- Introduces a low-light image enhancement framework for improved object detection in fisheye lens datasets, focusing on addressing challenges posed by low-light conditions and small object detection.
04:15:43 — Xingshuang Luo: FE-Det: An Effective Traffic Object Detection Framework for Fish-Eye Cameras
- Proposes FE-Det, an effective traffic object detection framework for fisheye cameras, addressing challenges like distortion, tiny objects, and similar classes through a comprehensive methodology including data augmentation, model architecture, and post-processing.
05:18:14 — Thomas Tang: Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis
- This talk introduces a parallel dense video captioning method integrated with CLIP visual features for enhanced accuracy and efficiency in traffic video analysis, focusing on fine-grained captions and end-to-end event analysis.
05:38:48 — Thomas Tang: Introduction of the Next Session
- The speaker introduces the next session of presentations.
05:38:57 — Hongpu Zhang: A Coarse-to-fine Two-stage Helmet Detection Method for Motorcyclists
- This presentation introduces a coarse-to-fine two-stage helmet detection method for motorcyclists, addressing challenges like varying object sizes and complex traffic conditions in real-world scenarios.
05:41:18 — Yunliang Chen: An Effective Method for Detecting Violation of Helmet Rule for Motorcyclists
- This presentation introduces an effective method for detecting helmet rule violations in motorcyclists, utilizing a two-stage object detection framework with data augmentation and ensemble techniques to address challenges like data imbalance and occlusion.
05:42:49 — Chuheng Wei: KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections
- This presentation introduces KI-GAN, a knowledge-informed generative adversarial network for multi-vehicle trajectory forecasting at signalized intersections, addressing research gaps in traffic light influence, data integration, and interaction pooling.
05:44:55 — Munkh-Erdene Otgonbold: Simple In-place Data Augmentation for Surveillance Object Detection
- This presentation introduces a simple in-place data augmentation technique for surveillance object detection, aiming to improve road object detection performance by balancing class samples and preserving realistic image appearance.
05:48:14 — Thomas Tang: Multi-perspective Traffic Video Description Model with Fine-grained Refinement Approach
- This talk presents a multi-perspective traffic video description model that utilizes fine-grained refinement for generating detailed descriptions of traffic scenarios, focusing on textual and visual attribute extraction, video captioning, and a refinement module.
06:18:52 — Thomas Tang: Multi-View Action Recognition for Distracted Driver Behavior Localization
- This talk introduces a multi-view action recognition framework for localizing distracted driver behaviors, utilizing a two-stage approach with action recognition networks and a temporal localization module to identify and classify driving activities.
06:33:14 — Thomas Tang: Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos
- This talk presents a multi-view spatial-temporal learning approach for understanding unusual behaviors in untrimmed naturalistic driving videos, focusing on a two-stage framework with action recognition models and a multi-step post-processing algorithm to detect and classify driving activities.
06:41:14 — Thomas Tang: DeepLocalization: Using change point detection for Temporal Action Localization
- This talk introduces DeepLocalization, a real-time localization framework for driver actions using deep learning techniques and graph-based change-point detection to accurately determine driver actions’ start and end times.
07:08:30 — Thomas Tang: Panel Discussion: AI City Challenge Mission, Data Augmentation, Privacy, Fairness, and Future Directions
- Panelists discuss the AI City Challenge’s mission to improve smart cities, the role of data augmentation, ethical concerns like privacy and bias, and future directions including multimodal analysis and control systems.
08:00:00 — Zheng Tang: Track 1: Multi-Camera People Tracking
- Presentation of Track 1, focusing on multi-camera people tracking, detailing dataset statistics, evaluation metrics, and common/leading approaches.
10:19:00 — Quan Kong: Track 2: Traffic Safety Description and Analysis
- Overview of Track 2, which involves traffic safety description and analysis using a large-scale pedestrian-centric video dataset with multi-view and fine-grained captions.
11:18:30 — Thomas Tang: Track 1 Award Announcement: Multi-Camera People Tracking
- Team 79 (Shanghai Jiao Tong University & Lenovo) wins for ‘A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction’, and Team 221 (Yachiyo Engineering Co., Ltd., Research Institute for Infra. Paradigm Shift & Chubu University) is runner-up for ‘Overlap Suppression Clustering for Offline Multi-Camera People Tracking’.
11:43:30 — Thomas Tang: Track 2 Award Announcement: Traffic Safety Description and Analysis
- Team 208 (Alibaba) wins for ‘CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario’, and Team 28 (VNU HCM, FPT Telecom & VGU) is runner-up for ‘Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model’.
12:12:00 — Zheng Tang: Track 3: Naturalistic Driving Action Recognition
- Details on Track 3, focusing on naturalistic driving action recognition using a dataset of distracted driver activities captured from multiple synchronized camera views.
12:12:30 — Thomas Tang: Track 3 Award Announcement: Naturalistic Driving Action Recognition
- Team 155 (TeleAI) wins, Team 5 (SKKU-AutoLab) is runner-up, and Team 165 (MCPRL) receives an honorable mention. Final ranking is based on Dataset B due to ongoing code estimation.
12:41:30 — Thomas Tang: Track 4 Award Announcement: Road Object Detection in Fish-Eye Cameras
- Team 9 (Vietnam Posts and Telecommunications Group (VNPT) & Phenikaa University) wins for ‘Robust Data Augmentation and Ensemble Method for Object Detection in Fisheye Camera Images’, Team 40 (Nota.ai) is runner-up for ‘Road Object Detection Robust to Distorted Objects at the Edge Regions of Images’, and Team 5 (Sungkyunkwan University) receives an honorable mention for ‘Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach’.
13:16:30 — Thomas Tang: Track 5 Award Announcement: Detecting Violation of Helmet Rule for Motorcyclists
- Team 99 (University of Information Technology, VNU-HCM) co-wins for ‘Robust Motorcycle Helmet Detection in Real-World Scenarios: Using Co-DETR and Minority Class Enhancement’, and Team 76 (China Mobile Shanghai ICT Co., Ltd) co-wins for ‘An Effective Method for Detecting Violation of Helmet Rule for Motorcyclists’.
13:38:30 — Thomas Tang: AICITY CHALLENGE 2024 WINNERS Summary
- A summary table of all winners, runners-up, and honorable mentions across all tracks, along with their respective NVIDIA RTX 4080 SUPER or NVIDIA Jetson Orin Nano dev kit awards.
13:54:00 — Munkhjargal Gochoo: Track 4: Road Object Detection in Fish-Eye Cameras
- Presentation of Track 4, covering road object detection in fish-eye cameras, detailing the dataset, object classes, and common techniques used by participants.
15:39:00 — Pranamesh Chakraborty: Track 5: Detecting Violation of Helmet Rule for Motorcyclists
- Overview of Track 5, which addresses detecting helmet rule violations for motorcyclists using a dataset from an Indian city under various conditions, and discusses common and leading approaches.
17:54:00 — David C. Anastasiu: AI City Challenge Evaluation Methodology and Statistics
- Explanation of the evaluation methodology for each track, followed by detailed statistics on team participation, submission frequency, score improvements, and leaderboard dynamics.
34:05:00 — Daniel Cremers: Large Scale Dynamic Scene Understanding
- Keynote speech introducing the challenges of large-scale dynamic scene understanding for autonomous systems, covering historical context, current progress in automated driving, and novel methods for 3D reconstruction and traffic acquisition.
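Several Track 1 talks (the overlap suppression clustering and anchor-feature hierarchical clustering entries, for example) group tracklets across cameras by clustering re-identification features with average linkage. The following is a minimal sketch of average-linkage agglomerative clustering over appearance embeddings, assuming cosine distance and a single stopping threshold. The function names, the toy 2-D features, and the 0.3 threshold are hypothetical; this is not any team's pipeline, which would also apply spatial-temporal and camera-overlap constraints.

```python
import math
from typing import List, Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def average_linkage_clusters(features: List[Sequence[float]], threshold: float) -> List[int]:
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance and stops once that distance exceeds `threshold`.
    Returns one cluster label per input feature.
    """
    n = len(features)
    # Pairwise distances are computed once and reused by every linkage query.
    dist = [[cosine_distance(features[i], features[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # the closest pair is too far apart: stop merging
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Two people, each observed twice (toy embeddings).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
labels = average_linkage_clusters(features, threshold=0.3)
print(labels)  # → [0, 0, 1, 1]
```

The naive search is cubic in the number of tracklets. For real workloads, SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` or scikit-learn's `AgglomerativeClustering(n_clusters=None, linkage="average", distance_threshold=...)` perform the same merging more efficiently.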

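The DeepLocalization talk determines the start and end times of driver actions with graph-based change-point detection. That graph-based method is not reproduced here. As a simpler stand-in that shows what a change point means in this setting, the sketch below runs least-squares binary segmentation on a hypothetical per-frame action score, splitting recursively while the drop in squared error exceeds a penalty. The score sequence, `penalty`, and `min_size` values are illustrative assumptions.

```python
from typing import List, Tuple

def sse(xs: List[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_split(xs: List[float], min_size: int) -> Tuple[float, int]:
    """Return (gain, t) for the split xs[:t] | xs[t:] that most reduces SSE."""
    base = sse(xs)
    best = (0.0, -1)
    for t in range(min_size, len(xs) - min_size + 1):
        gain = base - sse(xs[:t]) - sse(xs[t:])
        if gain > best[0]:
            best = (gain, t)
    return best

def binary_segmentation(xs: List[float], penalty: float,
                        min_size: int = 2, offset: int = 0) -> List[int]:
    """Recursively split while the SSE reduction exceeds `penalty`."""
    gain, t = best_split(xs, min_size)
    if t < 0 or gain <= penalty:
        return []
    left = binary_segmentation(xs[:t], penalty, min_size, offset)
    right = binary_segmentation(xs[t:], penalty, min_size, offset + t)
    return left + [offset + t] + right

# Hypothetical per-frame probability that the driver performs one action.
scores = [0.1, 0.05, 0.1, 0.08, 0.9, 0.95, 0.85, 0.9, 0.92, 0.1, 0.12, 0.05]
change_points = binary_segmentation(scores, penalty=0.1)
print(change_points)  # → [4, 9]
```

Frames 4 to 8 form the high-score run, so the two change points mark the action's start (frame 4) and its exclusive end (frame 9).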
Key Takeaways

The AI City Challenge has seen significant growth in participation, with a focus on real-world computer vision problems for smart cities and autonomous driving.
Understanding and modeling complex, dynamic human behavior in diverse traffic scenarios is crucial for the safe and effective deployment of autonomous vehicles.
Novel numerical methods like Power Bundle Adjustment offer significant improvements in speed, accuracy, and memory efficiency for large-scale 3D reconstruction problems, which are foundational for dynamic scene understanding.
Real-world data acquisition from surveillance and aerial platforms, combined with advanced 3D reconstruction and tracking, enables the creation of high-fidelity simulations and the development of robust autonomous driving systems.
Online multi-camera people tracking is a complex task with significant challenges including occlusion, ID switching, and re-identification across varying poses and camera angles.
Effective solutions often involve multi-stage pipelines combining object detection, single-camera tracking, inter-camera association, and advanced re-identification techniques.
Leveraging spatial-temporal constraints, hierarchical clustering, and geometric consistency can significantly improve tracking accuracy and robustness in dense and dynamic environments.
The AI City Challenge encourages the development of online tracking methods, with incentives for real-time applicability, and highlights the need for robust evaluation metrics for extremely long video sequences.
Visual prompt engineering and block expansion techniques are crucial for efficient fine-tuning of Vision Language Models in urban settings, leading to superior performance compared to traditional methods.
Segment-based data processing and a two-stage training pipeline, combined with dynamic combiners, can effectively generate precise and detailed descriptions of traffic safety scenarios.
Parallel dense video captioning, integrated with CLIP visual features, significantly enhances accuracy and efficiency in generating fine-grained captions for end-to-end event analysis in traffic videos.
Multi-view action recognition frameworks, utilizing advanced action recognition networks and multi-step post-processing algorithms, are effective in localizing and classifying distracted driver behaviors in untrimmed naturalistic driving videos.
Various advanced techniques like robust data augmentation, pseudo-labeling, and ensemble methods are crucial for achieving high accuracy in fisheye image object detection.
Addressing specific challenges such as distorted objects at image edges, low-light conditions, and small object sizes requires specialized approaches like SAHI, image enhancement frameworks, and super-resolution.
Combining multiple models and diverse data processing strategies through ensemble techniques consistently leads to improved overall performance and robustness in complex real-world scenarios.
A coarse-to-fine two-stage helmet detection method effectively addresses varying object sizes and complex traffic conditions for improved motorcyclist safety.
Data augmentation techniques, including manual and automatic copy-paste augmentation, are crucial for addressing data imbalance and improving model performance in helmet violation detection.
KI-GAN, a knowledge-informed GAN, integrates diverse data sources and specialized pooling methods to achieve superior accuracy in multi-vehicle trajectory forecasting at signalized intersections.
Simple in-place data augmentation can significantly enhance object detection performance in surveillance datasets by balancing class samples while maintaining realistic image appearance.
The AI City Challenge @ CVPR 2024 emphasizes the development of AI solutions for smart cities, addressing real-world problems in transportation, safety, and urban management.
Ethical considerations, particularly data privacy and algorithmic fairness, are paramount, with discussions highlighting the use of anonymization techniques, synthetic data, and careful data collection to mitigate risks.
The challenge encourages diverse approaches, including multimodal sensor fusion, human-in-the-loop systems, and the application of advanced models like Large Vision Language Models, to push the boundaries of AI capabilities.
Winners across five tracks were awarded NVIDIA RTX 4080 SUPER GPUs and NVIDIA Jetson Orin Nano dev kits, recognizing innovative solutions in multi-camera tracking, traffic safety, driving action recognition, fish-eye object detection, and helmet violation detection.
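The Power Bundle Adjustment takeaway can be made concrete with the standard bundle adjustment normal equations. In the sketch below (notation ours: U and V are the block-diagonal camera and landmark blocks of the damped Gauss-Newton or Levenberg-Marquardt system, W is the camera-landmark coupling block, and b is the gradient), landmarks are eliminated with the Schur complement S, and the inverse of S is expanded as a power (Neumann) series truncated after m terms instead of being factorized. This reflects our reading of the published method, not the authors' exact formulation.

```latex
\begin{aligned}
\begin{pmatrix} U & W \\ W^\top & V \end{pmatrix}
\begin{pmatrix} \Delta x_c \\ \Delta x_p \end{pmatrix}
&= -\begin{pmatrix} b_c \\ b_p \end{pmatrix}, \\[4pt]
S = U - W V^{-1} W^\top, \qquad
S\,\Delta x_c &= -b_c + W V^{-1} b_p, \\[4pt]
S^{-1} = \bigl(I - U^{-1} W V^{-1} W^\top\bigr)^{-1} U^{-1}
&= \sum_{i=0}^{\infty} \bigl(U^{-1} W V^{-1} W^\top\bigr)^{i}\, U^{-1}
\approx \sum_{i=0}^{m} \bigl(U^{-1} W V^{-1} W^\top\bigr)^{i}\, U^{-1}.
\end{aligned}
```

The expansion is valid when the spectral radius of U^{-1} W V^{-1} W^T is below 1. Because U and V are block-diagonal, each term needs only small block inversions and sparse matrix-vector products, which is the main source of the speed and memory gains claimed for the method. The landmark update then follows by back-substitution: Δx_p = -V^{-1}(b_p + W^T Δx_c).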

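Slicing Aided Hyper Inference (SAHI), cited for small and edge-distorted objects in the fisheye track, runs the detector on overlapping crops and maps crop-level boxes back to full-image coordinates before merging them. A minimal sketch of those two geometric steps follows, assuming square tiles and a single overlap ratio. `slice_windows` and `to_full_image` are hypothetical helpers, the tile size and overlap are illustrative, and the detector call and cross-tile merging (for example NMS) are omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Window = Tuple[int, int, int, int]        # tile extent in full-image pixels

def slice_windows(width: int, height: int, tile: int, overlap: float) -> List[Window]:
    """Overlapping square tiles that together cover a width x height image."""
    stride = max(1, int(tile * (1.0 - overlap)))

    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]  # the image fits in a single tile along this axis
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile ends flush with the border
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]

def to_full_image(box: Box, window: Window) -> Box:
    """Shift a box predicted inside `window` back to full-image coordinates."""
    ox, oy = window[0], window[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)

# Illustrative numbers: a 1024x1024 frame, 512-pixel tiles, 20% overlap.
windows = slice_windows(1024, 1024, tile=512, overlap=0.2)
print(len(windows))                                  # → 9
print(to_full_image((10, 20, 30, 40), windows[4]))   # → (419, 429, 439, 449)
```

The open-source `sahi` package implements the full pipeline, including merging duplicate boxes across tiles and optionally combining sliced predictions with a standard full-image prediction.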
Methods / Models / Datasets Mentioned

4D object tracking
Action probability calibration
Activity Overlap Score
Adaptive thresholding
Agglomerative Clustering
Anchor-feature Hierarchical Clustering
Average Linkage
BLEU-4
Block Expansion
Bot-SORT
ByteTrack
CIDEr
CLIP
CLIP Image Encoder
Co-DINO
Ceres-explicit
Ceres-implicit
CityLLaVA
Clip-level video recognition
Clustering
Co-DETR
Code Estimation
Corrective Matching Cascade
CycleGAN
DETA
DINO
DINO-4scale
DINO-5scale
Data augmentation
Decoder Block
DeepScenario
DeepLocalization
DeepSORT
Diffusion model
Distance-Aware Loss (DAL)
Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis
Dual Aggregation Transformer (DAT)
Efficient Fine-Tuning for VLMs
EfficientDet
Embedding Model
Ensemble
Ensemble Models
F1 Score
FJMP
Few-Shot Prompting
Fish-Eye Cameras
Flipping
GSAD
Gauss-Newton
Geometric Consistency
Graph Neural Networks (GNNs)
Graph-Based Change-Point Detection
H-DINO
HOTA
Hierarchical Clustering
Histogram equalization
IDF1
Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach
Intel SceneScape
InternImage
K-fold split
KI-GAN
LLM Segment Extractor
LLaMA Pro
LLaMA2
LLaVA-34B
Large Language Model
Large Vision Language Model
Lee et al.'s Method
Levenberg-Marquardt
LoRA
METEOR
MHSA
MLP Projector
MOTA
Mask2Former
Mean Average Precision
Merge and Remove
Minority Optimizer Algorithm
Mosaic
Multi-scale training
Multi-view fusion
NAFNet
NVIDIA Jetson Orin Nano dev kit
NVIDIA RTX 4080 SUPER
Naturalistic Driving Action Recognition
Nearest Neighbor Mapping
OCMCTrack
Object detection
Offline Tracking
Online Tracking
Online multi-camera tracking
Open-Vocabulary Pseudo-Label Approach
OpenVINO
Overlap Suppression Clustering
PDVC (Parallel Dense Video Captioning)
Photogrammetry
Pose estimation
Post Connection
Power Bundle Adjustment (PoBA)
Procrustes Analysis
Pseudo-labeling
QR decomposition
Qwen-VL
RMSNorm
ROUGE-L
RTMPose
ReID
ResNet50-IBN
Road Object Detection Robust to Distorted Objects at the Edge Regions of Images
Robust Data Augmentation
Robust Motorcycle Helmet Detection
Rule-based Sentence Extractor
S-GAN
S-LSTM
Schur complement
Segment Extraction Pipeline
Semi-supervised learning
Single-camera tracking
Slicing Aided Hyper Inference (SAHI)
Spatial-Temporal Mechanism
Square Root Bundle Adjustment (√BA)
StableSR
State-aware Re-ID Correction
Super-resolution
SwiGLU
Swin-B
Swin-L
Synthetic data generation
TensorRT
Test-time augmentation
Text QA Construction
Threshold Filter
Trajectron++
UniformerV2
VAP-Net
Video Swin Transformer
Video-LLM
VideoLLaVA
VideoMAE
Vision Language Model (VLM)
Visual Prompt Engineering
Visual-text prompt engineering
Weighted Box Fusion (WBF)
X3D
YOLO
YOLOR
YOLO-W6
YOLO-World
YOLO-based Detector
YOLOv5
YOLOv6-L6
YOLOv7
YOLOv8
YOLOv8x
YOLOv9
YOLOv9-e
mAP
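Weighted Box Fusion appears among the ensembling methods listed here and underlies the ensemble takeaways. Unlike NMS, it averages overlapping boxes from different models instead of keeping only one. A simplified single-class sketch follows, written from the original WBF description: boxes from all models are sorted by score, each box joins the fused box it overlaps best with above `iou_thr` (assumed to be 0.55 here), fused coordinates are score-weighted averages, and the final score is scaled down when fewer boxes than the number of models support it. The function names and values are assumptions, not any team's configuration.

```python
from typing import List, Tuple

Det = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)

def iou(a, b) -> float:
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def fuse(cluster: List[Det]) -> Det:
    """Score-weighted mean of the coordinates; the score is the mean score."""
    total = sum(d[4] for d in cluster)
    coords = [sum(d[k] * d[4] for d in cluster) / total for k in range(4)]
    return (*coords, total / len(cluster))

def weighted_box_fusion(model_outputs: List[List[Det]], iou_thr: float = 0.55) -> List[Det]:
    """Simplified single-class WBF over the detections of several models."""
    n_models = len(model_outputs)
    dets = sorted((d for out in model_outputs for d in out), key=lambda d: d[4], reverse=True)
    clusters: List[List[Det]] = []
    fused: List[Det] = []
    for d in dets:
        # Compare against the current fused boxes and keep the best overlap above iou_thr.
        best, best_iou = -1, iou_thr
        for i, f in enumerate(fused):
            overlap = iou(d, f)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best >= 0:
            clusters[best].append(d)
            fused[best] = fuse(clusters[best])
        else:
            clusters.append([d])
            fused.append(d)
    # Scale scores down when fewer than n_models boxes support a fused box.
    return [(*f[:4], f[4] * min(len(c), n_models) / n_models) for f, c in zip(fused, clusters)]

model_a = [(10, 10, 50, 50, 0.9)]
model_b = [(12, 10, 52, 50, 0.6), (200, 200, 240, 240, 0.5)]
result = weighted_box_fusion([model_a, model_b])
print([tuple(round(v, 2) for v in d) for d in result])
# → [(10.8, 10.0, 50.8, 50.0, 0.75), (200, 200, 240, 240, 0.25)]
```

For real ensembles, the `ensemble_boxes` package provides a per-class implementation with per-model weights.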

Topics

3D bounding box detection · 3D reconstruction · AI City Challenge · AI City Challenge Track 4 · Action Recognition · Autonomous driving · Behavior modeling · Bias · Bundle adjustment · Computer Vision · Data Annotation · Data Augmentation · Deep Learning · Distorted objects · Driving Behavior Analysis · Driving action recognition · Dynamic scenes · Ensemble methods · Ethical AI · Fairness · Fisheye image object detection · Geometric consistency · Helmet Detection · Helmet violation detection · Hierarchical clustering · Human-in-the-Loop · Hybrid inference · Large-scale dynamic scene understanding · Low-light image enhancement · Multi-camera people tracking · Multimodal Analysis · Object Detection · Online tracking · Overlap suppression clustering · Parking violation detection · Privacy · Pseudo-labeling · Road Safety · Small object detection · Smart Cities · State-aware re-ID correction · Super-resolution · Synthetic Data · Temporal Localization · Traffic Management · Traffic Safety · Traffic safety analysis · Traffic simulation · Traffic surveillance · Urban Scenarios · Video Captioning · Vision Language Models · class imbalance · generative adversarial networks · motorcyclist safety · traffic analysis · trajectory forecasting
