THE 8TH AI CITY CHALLENGE @ CVPR 2024

Event: CVPR 2024 · Duration: 508 min

Abstract

The opening segment introduces the 8th AI City Challenge at CVPR 2024, detailing its tracks, participation statistics, and evaluation methodologies. It then transitions into a keynote on Large Scale Dynamic Scene Understanding, delivered by Professor Daniel Cremers. The keynote covers the historical development of 3D computer vision for autonomous systems, the challenges of real-world dynamic scenes, and novel numerical methods for bundle adjustment, concluding with applications in traffic acquisition and simulation.

The next segment features a series of paper presentations from the AI City Challenge 2024, focusing on Track 1 and Track 2. The presentations cover various approaches to multi-camera people tracking, including methods for handling occlusion, improving re-identification, and enhancing tracking accuracy in complex urban environments. Key topics include online tracking, spatial-temporal constraints, and the use of Vision Language Models for traffic safety analysis. The segment concludes with a brief Q&A and a call for the next set of presentations.

A further segment presents methods for analyzing driving behavior and traffic safety using advanced computer vision and language models. Topics include efficient fine-tuning of Vision Language Models in urban settings, segment-based data processing for traffic safety analysis, parallel dense video captioning, multi-perspective traffic video description, multi-view action recognition for distracted driving, and spatial-temporal learning for unusual driving behaviors. The presentations highlight approaches to data annotation, model architecture, and post-processing that improve accuracy and efficiency in identifying and classifying driving activities.

Another segment features presentations from teams participating in AI City Challenge Track 4, which focuses on road object detection in fisheye camera images. Each team presents its methodology, addressing challenges such as image distortion, low-light conditions, small object detection, and data inconsistency. The solutions use techniques including advanced data augmentation, pseudo-labeling, image enhancement, super-resolution, and ensemble methods to improve detection accuracy and robustness in complex traffic surveillance scenarios.

A subsequent segment covers computer vision techniques for traffic analysis and safety. Topics include a coarse-to-fine two-stage helmet detection method for motorcyclists, an effective method for detecting helmet rule violations, and a knowledge-informed generative adversarial network for multi-vehicle trajectory forecasting at signalized intersections. A simple in-place data augmentation technique for surveillance object detection is also presented. Together these talks address challenges such as data imbalance, complex traffic conditions, and occlusions in real-world scenarios.

The final segment is a panel discussion on the 8th AI City Challenge @ CVPR 2024, covering the mission of leveraging AI for smart cities, the role of data augmentation, and the ethical questions surrounding privacy and fairness in AI applications. Panelists explore strategies for anonymizing data, mitigating biases, and using synthetic data and multimodal analysis to advance the field. The segment concludes with the announcement of award winners for all five tracks, recognizing top teams in multi-camera people tracking, traffic safety analysis, naturalistic driving action recognition, road object detection in fish-eye cameras, and helmet violation detection.

Speakers

  • Zheng Tang — NVIDIA
  • David C. Anastasiu — Santa Clara University
  • Quan Kong — Woven by Toyota
  • Munkhjargal Gochoo — United Arab Emirates University
  • Pranamesh Chakraborty — Indian Institute of Technology Kanpur
  • Daniel Cremers — Chair of Computer Vision and AI, TU Munich; Munich Center for Machine Learning
  • Thomas Yang
  • Ryuto Yoshida — Yachiyo Engineering Co., Ltd.
  • Zhenyu Xie — Shanghai Jiao Tong University
  • Jeongho Kim — Nota Inc., Republic of Korea
  • Andreas Specker — Fraunhofer IOSB
  • Riu Cherdchusakitchai — AI and Robotics Ventures (ARV), Thailand
  • Zhihao Duan — Alibaba OpenTrek
  • Thomas Tang — Alibaba Cloud Intelligence Group
  • Viet Hung Duong — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
  • Duc Quyen Nguyen — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
  • Tien Cuong Nguyen — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
  • Thien Van Luong — Faculty of Computer Science, Phenikaa University, Hanoi, Vietnam
  • Huan Vu — University of Transport and Communications, Hanoi, Vietnam
  • Wooksu Shin — Nota Inc., Republic of Korea
  • Donghyuk Choi — Nota Inc., Republic of Korea
  • Hancheol Park — Nota Inc., Republic of Korea
  • Long Hoang Pham — SKKU-AutoLab, Sungkyunkwan University
  • Quoc Pham-Nam Ho — SKKU-AutoLab, Sungkyunkwan University
  • Duong Nguyen-Ngoc Tran — SKKU-AutoLab, Sungkyunkwan University
  • Tai Huu-Phuong Tran — SKKU-AutoLab, Sungkyunkwan University
  • Huy-Hung Nguyen — SKKU-AutoLab, Sungkyunkwan University
  • Duong Khac Vu — SKKU-AutoLab, Sungkyunkwan University
  • Ngoc Doan-Minh Huynh — SKKU-AutoLab, Sungkyunkwan University
  • Hyung-Min Jeon — SKKU-AutoLab, Sungkyunkwan University
  • Hyung-Joon Jeon — SKKU-AutoLab, Sungkyunkwan University
  • Jae Wook Jeon — SKKU-AutoLab, Sungkyunkwan University
  • Bao Tran Gia — University of Information Technology, VNU-HCM, Vietnam
  • Tuong Bui Cong Khanh — University of Information Technology, VNU-HCM, Vietnam
  • Hien Ho Trong — University of Information Technology, VNU-HCM, Vietnam
  • Thuyen Tran Doan — University of Information Technology, VNU-HCM, Vietnam
  • Tien Do — University of Information Technology, VNU-HCM, Vietnam
  • Duy-Dinh Le — University of Information Technology, VNU-HCM, Vietnam
  • Thanh Duc Ngo — University of Information Technology, VNU-HCM, Vietnam
  • Dai Quoc Tran — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
  • Armstrong Aboah — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
  • Yuntae Jeon — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
  • Maged Shoman — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
  • Minsoo Park — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
  • Seunghee Park — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
  • Xingshuang Luo — Beijing University of Posts and Telecommunications
  • Zhe Cui — Beijing University of Posts and Telecommunications
  • Fei Su — Beijing University of Posts and Telecommunications
  • Hongpu Zhang — Beijing University of Posts and Telecommunications
  • Yunliang Chen — China Mobile Shanghai ICT Co., Ltd
  • Chen Wang — China Mobile Shanghai ICT Co., Ltd
  • Yingda Shang — China Mobile Shanghai ICT Co., Ltd
  • Chuheng Wei — University of California, Riverside
  • Guoyuan Wu — University of California, Riverside
  • Matthew Barth — University of California, Riverside
  • Amr Abdelraouf — Infotech Labs, Toyota
  • Rohit Gupta — Infotech Labs, Toyota
  • Kyungtae Han — Infotech Labs, Toyota
  • Munkh-Erdene Otgonbold — United Arab Emirates University
  • Ganzorig Batnasan — United Arab Emirates University
  • Norimasa Kobori — Yachiyo Engineering Co., Ltd.
  • Xunlei Wu — Department of Computer Science and Software Engineering, United Arab Emirates University, UAE

Talks (40)

  • 00:00:00 — Zheng Tang: The 8th AI City Challenge @ CVPR 2024 - Introduction
    • An introduction to the 8th AI City Challenge at CVPR 2024, covering the organizing committees, historical overview of the challenge, participation statistics, and timeline.
  • 01:24:42 — Thomas Yang: Q&A Session with Daniel Cremers
    • This segment begins with a Q&A session for Daniel Cremers, covering topics such as 3D bounding box detection, behavior modeling, and handling dynamic scenes in tracking.
  • 01:35:54 — Ryuto Yoshida: Overlap Suppression Clustering for Offline Multi-Camera People Tracking
    • Ryuto Yoshida presents a multi-camera people tracking method that achieved the highest HOTA score in the challenge, focusing on overlap suppression clustering and hierarchical clustering with average linkage.
  • 01:46:41 — Zhenyu Xie: A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction
    • Zhenyu Xie presents a robust online multi-camera people tracking system that addresses challenges like matching individuals across cameras, ID switching due to occlusion, and re-identification with varying poses and angles, using geometric consistency and state-aware re-ID correction.
  • 01:56:11 — Jeongho Kim: Cluster Self-Refinement for Enhanced Online Multi-Camera People Tracking
    • Jeongho Kim introduces a cluster self-refinement module for online multi-camera people tracking, addressing issues of inaccurate mapping, low-quality features, and multiple IDs assigned to a single person, leading to improved HOTA scores.
  • 02:06:21 — Andreas Specker: OCMCTrack: Online Multi-Target Multi-Camera Tracking with Corrective Matching Cascade
    • Andreas Specker presents OCMCTrack, an online multi-target multi-camera tracking system that focuses on correcting erroneous associations from previous time steps and improving bounding box to world projection, achieving competitive results in the challenge.
  • 02:16:18 — Riu Cherdchusakitchai: Online Multi-camera People Tracking with Spatial-temporal Mechanism and Anchor-feature Hierarchical Clustering
    • Riu Cherdchusakitchai presents an online multi-camera people tracking method that utilizes spatial-temporal constraints and anchor-feature hierarchical clustering to address challenges like occlusion, ID switching, and varying lighting conditions, achieving a HOTA score of 69.10%.
  • 02:27:42 — Zhihao Duan: CityLLaVA: Efficient Fine-tuning for VLMs in City Scenario
    • Zhihao Duan presents CityLLaVA, an efficient fine-tuning pipeline for Vision Language Models (VLMs) specifically designed for urban scenarios, addressing challenges like small targets, multi-views, background noise, and templated captions to improve traffic safety description and analysis.
  • 02:49:24 — Thomas Tang: CityLLaVA: Efficient Fine-tuning for VLMs in City Scenario
    • This talk introduces CityLLaVA, an efficient fine-tuning method for Vision Language Models in urban settings, highlighting visual prompt engineering, text QA construction, and block expansion techniques.
  • 03:43:17 — Thomas Tang: Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model
    • This talk presents a method for traffic safety analysis using a large vision language model, focusing on segment-based data processing and a two-stage training pipeline to generate detailed descriptions of traffic scenarios.
  • 04:14:18 — Viet Hung Duong: Robust Data Augmentation and Ensemble Method for Object Detection in Fisheye Images
    • Presents a solution for fisheye image object detection using robust data augmentation, K-fold splitting, synthetic data generation, model selection, pseudo-labeling, and ensembling, achieving 1st place in the AI City Challenge Track 4.
  • 04:14:47 — Wooksu Shin: Road Object Detection Robust to Distorted Objects at the Edge Regions of Images
    • Addresses challenges in fisheye image object detection, particularly with distorted objects at image edges, using task-specific methods like slicing aided hyper inference and semi-supervised learning, general methods, and ensemble techniques.
  • 04:15:05 — Long Hoang Pham: Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach
    • Proposes an open-vocabulary pseudo-label approach for improving object detection in fisheye cameras, addressing data inconsistency and distortion challenges using YOLO-World and CycleGAN.
  • 04:15:20 — Bao Tran Gia: Enhancing Road Object Detection in Fisheye Cameras: An Effective Framework Integrating SAHI and Hybrid Inference
    • Presents an effective framework for enhancing road object detection in fisheye cameras by integrating SAHI and hybrid inference, utilizing data preparation, augmentation, various models, and post-processing strategies.
  • 04:15:31 — Dai Quoc Tran: Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets
    • Introduces a low-light image enhancement framework for improved object detection in fisheye lens datasets, focusing on addressing challenges posed by low-light conditions and small object detection.
  • 04:15:43 — Xingshuang Luo: FE-Det: An Effective Traffic Object Detection Framework for Fish-Eye Cameras
    • Proposes FE-Det, an effective traffic object detection framework for fisheye cameras, addressing challenges like distortion, tiny objects, and similar classes through a comprehensive methodology including data augmentation, model architecture, and post-processing.
  • 05:18:14 — Thomas Tang: Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis
    • This talk introduces a parallel dense video captioning method integrated with CLIP visual features for enhanced accuracy and efficiency in traffic video analysis, focusing on fine-grained captions and end-to-end event analysis.
  • 05:38:48 — Thomas Tang: Introduction to the next session
    • The speaker introduces the next session of presentations.
  • 05:38:57 — Hongpu Zhang: A Coarse-to-fine Two-stage Helmet Detection Method for Motorcyclists
    • This presentation introduces a coarse-to-fine two-stage helmet detection method for motorcyclists, addressing challenges like varying object sizes and complex traffic conditions in real-world scenarios.
  • 05:41:18 — Yunliang Chen: An Effective Method for Detecting Violation of Helmet Rule for Motorcyclists
    • This presentation introduces an effective method for detecting helmet rule violations in motorcyclists, utilizing a two-stage object detection framework with data augmentation and ensemble techniques to address challenges like data imbalance and occlusion.
  • 05:42:49 — Chuheng Wei: KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections
    • This presentation introduces KI-GAN, a knowledge-informed generative adversarial network for multi-vehicle trajectory forecasting at signalized intersections, addressing research gaps in traffic light influence, data integration, and interaction pooling.
  • 05:44:55 — Munkh-Erdene Otgonbold: Simple In-place Data Augmentation for Surveillance Object Detection
    • This presentation introduces a simple in-place data augmentation technique for surveillance object detection, aiming to improve road object detection performance by balancing class samples and preserving realistic image appearance.
  • 05:48:14 — Thomas Tang: Multi-perspective Traffic Video Description Model with Fine-grained Refinement Approach
    • This talk presents a multi-perspective traffic video description model that utilizes fine-grained refinement for generating detailed descriptions of traffic scenarios, focusing on textual and visual attribute extraction, video captioning, and a refinement module.
  • 06:18:52 — Thomas Tang: Multi-View Action Recognition for Distracted Driver Behavior Localization
    • This talk introduces a multi-view action recognition framework for localizing distracted driver behaviors, utilizing a two-stage approach with action recognition networks and a temporal localization module to identify and classify driving activities.
  • 06:33:14 — Thomas Tang: Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos
    • This talk presents a multi-view spatial-temporal learning approach for understanding unusual behaviors in untrimmed naturalistic driving videos, focusing on a two-stage framework with action recognition models and a multi-step post-processing algorithm to detect and classify driving activities.
  • 06:41:14 — Thomas Tang: DeepLocalization: Using change point detection for Temporal Action Localization
    • This talk introduces DeepLocalization, a real-time localization framework for driver actions using deep learning techniques and graph-based change-point detection to accurately determine driver actions’ start and end times.
  • 07:08:30 — Thomas Tang: Panel Discussion: AI City Challenge Mission, Data Augmentation, Privacy, Fairness, and Future Directions
    • Panelists discuss the AI City Challenge’s mission to improve smart cities, the role of data augmentation, ethical concerns like privacy and bias, and future directions including multimodal analysis and control systems.
  • 08:00:00 — Zheng Tang: Track 1: Multi-Camera People Tracking
    • Presentation of Track 1, focusing on multi-camera people tracking, detailing dataset statistics, evaluation metrics, and common/leading approaches.
  • 10:19:00 — Quan Kong: Track 2: Traffic Safety Description and Analysis
    • Overview of Track 2, which involves traffic safety description and analysis using a large-scale pedestrian-centric video dataset with multi-view and fine-grained captions.
  • 11:18:30 — Thomas Tang: Track 1 Award Announcement: Multi-Camera People Tracking
    • Team 79 (Shanghai Jiao Tong University & Lenovo) wins for ‘A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction’, and Team 221 (Yachiyo Engineering Co., Ltd., Research Institute for Infra. Paradigm Shift & Chubu University) is runner-up for ‘Overlap Suppression Clustering for Offline Multi-Camera People Tracking’.
  • 11:43:30 — Thomas Tang: Track 2 Award Announcement: Traffic Safety Description and Analysis
    • Team 208 (Alibaba) wins for ‘CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario’, and Team 28 (VNU HCM, FPT Telecom & VGU) is runner-up for ‘Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model’.
  • 12:12:00 — Zheng Tang: Track 3: Naturalistic Driving Action Recognition
    • Details on Track 3, focusing on naturalistic driving action recognition using a dataset of distracted driver activities captured from multiple synchronized camera views.
  • 12:12:30 — Thomas Tang: Track 3 Award Announcement: Naturalistic Driving Action Recognition
    • Team 155 (TeleAI) wins, Team 5 (SKKU-AutoLab) is runner-up, and Team 165 (MCPRL) receives an honorable mention. Final ranking is based on Dataset B due to ongoing code estimation.
  • 12:41:30 — Thomas Tang: Track 4 Award Announcement: Road Object Detection in Fish-Eye Cameras
    • Team 9 (Vietnam Posts and Telecommunications Group (VNPT) & Phenikaa University) wins for ‘Robust Data Augmentation and Ensemble Method for Object Detection in Fisheye Camera Images’, Team 40 (Nota.ai) is runner-up for ‘Road Object Detection Robust to Distorted Objects at the Edge Regions of Images’, and Team 5 (Sungkyunkwan University) receives an honorable mention for ‘Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach’.
  • 13:16:30 — Thomas Tang: Track 5 Award Announcement: Detecting Violation of Helmet Rule for Motorcyclists
    • Team 99 (University of Information Technology, VNU-HCM) co-wins for ‘Robust Motorcycle Helmet Detection in Real-World Scenarios: Using Co-DETR and Minority Class Enhancement’, and Team 76 (China Mobile Shanghai ICT Co.,Ltd) co-wins for ‘An Effective Method for Detecting Violation of Helmet Rule for Motorcyclists’.
  • 13:38:30 — Thomas Tang: AI City Challenge 2024 Winners Summary
    • A summary table of all winners, runners-up, and honorable mentions across all tracks, along with their respective NVIDIA RTX 4080 SUPER or NVIDIA Jetson Orin Nano dev kit awards.
  • 13:54:00 — Munkhjargal Gochoo: Track 4: Road Object Detection in Fish-Eye Cameras
    • Presentation of Track 4, covering road object detection in fish-eye cameras, detailing the dataset, object classes, and common techniques used by participants.
  • 15:39:00 — Pranamesh Chakraborty: Track 5: Detecting Violation of Helmet Rule for Motorcyclists
    • Overview of Track 5, which addresses detecting helmet rule violations for motorcyclists using a dataset from an Indian city under various conditions, and discusses common and leading approaches.
  • 17:54:00 — David C. Anastasiu: AI City Challenge Evaluation Methodology and Statistics
    • Explanation of the evaluation methodology for each track, followed by detailed statistics on team participation, submission frequency, score improvements, and leaderboard dynamics.
  • 34:05:00 — Daniel Cremers: Large Scale Dynamic Scene Understanding
    • Keynote speech introducing the challenges of large-scale dynamic scene understanding for autonomous systems, covering historical context, current progress in automated driving, and novel methods for 3D reconstruction and traffic acquisition.
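
Several of the tracking talks above mention hierarchical clustering, and the Overlap Suppression Clustering talk names average linkage specifically. As a rough illustration only, the sketch below groups tracklet appearance features with greedy average-linkage agglomerative clustering. The function names, the cosine distance, the 0.3 threshold, and the toy 2-D features are all assumptions made for this example. The overlap suppression step from the talk is not implemented, and this is not any presenting team's code.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage_clusters(features, threshold):
    """Greedily merge the closest pair of clusters until the smallest
    average pairwise distance exceeds `threshold` (hypothetical value)."""
    clusters = [[i] for i in range(len(features))]
    dist = [[cosine_distance(a, b) for b in features] for a in features]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Toy appearance features for four tracklets (made-up values).
tracklet_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = average_linkage_clusters(tracklet_features, threshold=0.3)
print(groups)
```

With these toy features, tracklets 0 and 1 fall into one identity and tracklets 2 and 3 into another. A real system would use learned re-identification embeddings and, typically, extra constraints such as camera geometry or timing.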

Key Takeaways

  • The AI City Challenge has seen significant growth in participation, with a focus on real-world computer vision problems for smart cities and autonomous driving.
  • Understanding and modeling complex, dynamic human behavior in diverse traffic scenarios is crucial for the safe and effective deployment of autonomous vehicles.
  • Novel numerical methods like Power Bundle Adjustment offer significant improvements in speed, accuracy, and memory efficiency for large-scale 3D reconstruction problems, which are foundational for dynamic scene understanding.
  • Real-world data acquisition from surveillance and aerial platforms, combined with advanced 3D reconstruction and tracking, enables the creation of high-fidelity simulations and the development of robust autonomous driving systems.
  • Online multi-camera people tracking is a complex task with significant challenges including occlusion, ID switching, and re-identification across varying poses and camera angles.
  • Effective solutions often involve multi-stage pipelines combining object detection, single-camera tracking, inter-camera association, and advanced re-identification techniques.
  • Leveraging spatial-temporal constraints, hierarchical clustering, and geometric consistency can significantly improve tracking accuracy and robustness in dense and dynamic environments.
  • The AI City Challenge encourages the development of online tracking methods, with incentives for real-time applicability, and highlights the need for robust evaluation metrics for extremely long video sequences.
  • Visual prompt engineering and block expansion techniques are crucial for efficient fine-tuning of Vision Language Models in urban settings, leading to superior performance compared to traditional methods.
  • Segment-based data processing and a two-stage training pipeline, combined with dynamic combiners, can effectively generate precise and detailed descriptions of traffic safety scenarios.
  • Parallel dense video captioning, integrated with CLIP visual features, significantly enhances accuracy and efficiency in generating fine-grained captions for end-to-end event analysis in traffic videos.
  • Multi-view action recognition frameworks, utilizing advanced action recognition networks and multi-step post-processing algorithms, are effective in localizing and classifying distracted driver behaviors in untrimmed naturalistic driving videos.
  • Various advanced techniques like robust data augmentation, pseudo-labeling, and ensemble methods are crucial for achieving high accuracy in fisheye image object detection.
  • Addressing specific challenges such as distorted objects at image edges, low-light conditions, and small object sizes requires specialized approaches like SAHI, image enhancement frameworks, and super-resolution.
  • Combining multiple models and diverse data processing strategies through ensemble techniques consistently leads to improved overall performance and robustness in complex real-world scenarios.
  • A coarse-to-fine two-stage helmet detection method effectively addresses varying object sizes and complex traffic conditions for improved motorcyclist safety.
  • Data augmentation techniques, including manual and automatic copy-paste augmentation, are crucial for addressing data imbalance and improving model performance in helmet violation detection.
  • KI-GAN, a knowledge-informed GAN, integrates diverse data sources and specialized pooling methods to achieve superior accuracy in multi-vehicle trajectory forecasting at signalized intersections.
  • Simple in-place data augmentation can significantly enhance object detection performance in surveillance datasets by balancing class samples while maintaining realistic image appearance.
  • The AI City Challenge @ CVPR 2024 emphasizes the development of AI solutions for smart cities, addressing real-world problems in transportation, safety, and urban management.
  • Ethical considerations, particularly data privacy and algorithmic fairness, are paramount, with discussions highlighting the use of anonymization techniques, synthetic data, and careful data collection to mitigate risks.
  • The challenge encourages diverse approaches, including multimodal sensor fusion, human-in-the-loop systems, and the application of advanced models like Large Vision Language Models, to push the boundaries of AI capabilities.
  • Winners across five tracks were awarded NVIDIA RTX 4080 SUPER GPUs and NVIDIA Jetson Orin Nano dev kits, recognizing innovative solutions in multi-camera tracking, traffic safety, driving action recognition, fish-eye object detection, and helmet violation detection.
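
The takeaways above cite SAHI (Slicing Aided Hyper Inference) as one way to handle small objects in fisheye images. The general idea of sliced inference can be sketched as follows: cut the image into overlapping tiles, run a detector on each tile, and shift the tile-local boxes back into full-image coordinates before merging. The helper names, tile size, and overlap ratio below are hypothetical choices for illustration. This is not the API of the SAHI library, and the detector and the merging step are left out.

```python
def slice_windows(width, height, tile, overlap):
    """Return (x0, y0, x1, y1) tiles that cover the image.

    `overlap` is the fraction of a tile shared with its neighbour
    (an assumed parameterisation for this sketch)."""
    step = max(1, int(tile * (1.0 - overlap)))

    def starts(size):
        if size <= tile:
            return [0]
        points = list(range(0, size - tile, step))
        points.append(size - tile)  # last tile sits flush with the border
        return points

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def to_full_image(box, window):
    """Shift a tile-local box (x0, y0, x1, y1) into full-image coordinates."""
    x_off, y_off, _, _ = window
    bx0, by0, bx1, by1 = box
    return (bx0 + x_off, by0 + y_off, bx1 + x_off, by1 + y_off)


windows = slice_windows(width=1000, height=600, tile=400, overlap=0.25)
# A made-up detection inside the last tile, mapped back to the full frame.
mapped = to_full_image((10, 20, 50, 60), windows[-1])
print(len(windows), windows[0], windows[-1], mapped)
```

Boxes that straddle tile borders show up in more than one tile, so a merging step such as non-maximum suppression or box fusion is normally applied after the mapping.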

Methods / Models / Datasets Mentioned

  • 4D object tracking
  • Action probability calibration
  • Activity Overlap Score
  • Adaptive thresholding
  • Agglomerative Clustering
  • Anchor-feature Hierarchical Clustering
  • Average Linkage
  • BLEU-4
  • Block Expansion
  • Bot-SORT
  • ByteTrack
  • CIDEr
  • CLIP
  • CLIP Image Encoder
  • CO-DINO model
  • Ceres-explicit
  • Ceres-implicit
  • CityLLaVA
  • Clip-level video recognition
  • Clustering
  • Co-DETR
  • Code Estimation
  • Corrective Matching Cascade
  • CycleGAN
  • DETA
  • DINO
  • DINO-4scale
  • DINO-5scale
  • Data augmentation
  • Decoder Block
  • Deep Scenario
  • DeepLocalization
  • DeepSORT
  • Diffusion model
  • Distance-Aware Loss (DAL)
  • Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis
  • Dual Aggregation Transformer (DAT)
  • Efficient Fine-Tuning for VLMs
  • EfficientDet
  • Embedding Model
  • Ensemble
  • Ensemble Models
  • F1 Score
  • FJMP
  • Few-Shot Prompting
  • Fish-Eye Cameras
  • Flipping
  • GSAD
  • Gauss-Newton
  • Geometric Consistency
  • Graph Neural Networks (GNNs)
  • Graph-Based Change-Point Detection
  • H-DINO
  • HOTA
  • Hierarchical Clustering
  • Histogram equalization
  • IDF1
  • Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach
  • Intel SceneScape
  • InternImage
  • K-fold split
  • KI-GAN
  • LLM Segment Extractor
  • LLaMA Pro
  • LLaMA2
  • LLaVA-34B
  • Large Language Model
  • Large Vision Language Model
  • Lee et al.'s Method
  • Levenberg-Marquardt
  • LoRA
  • METEOR
  • MHSA
  • MLP Projector
  • MOTA
  • Mask2Former
  • Merge and Remove
  • Minority Optimizer Algorithm
  • Mosaic
  • Multi-scale training
  • Multi-view fusion
  • NAFNet
  • NVIDIA Jetson Orin Nano dev kit
  • NVIDIA RTX 4080 SUPER
  • Naturalistic Driving Action Recognition
  • Nearest Neighbor Mapping
  • OCMCTrack
  • Object detection
  • Offline Tracking
  • Online Tracking
  • Online multi-camera tracking
  • Open-Vocabulary Pseudo-Label Approach
  • OpenVINO
  • Overlap Suppression Clustering
  • PDVC (Parallel Dense Video Captioning)
  • Photogrammetry
  • Pose estimation
  • Post Connection
  • Power Bundle Adjustment (PoBA)
  • Procrustes Analysis
  • Pseudo-labeling
  • QR decomposition
  • Qwen-VL
  • RMSNorm
  • ROUGE-L
  • RTMPose
  • ReID
  • ResNet50-IBN
  • Road Object Detection Robust to Distorted Objects at the Edge Regions of Images
  • Robust Data Augmentation
  • Robust Motorcycle Helmet Detection
  • Rule-based Sentence Extractor
  • S-GAN
  • S-LSTM
  • Schur complement
  • Segment Extraction Pipeline
  • Semi-supervised learning
  • Single-camera tracking
  • Slicing Aided Hyper Inference (SAHI)
  • Spatial-Temporal Mechanism
  • Square Root Bundle Adjustment
  • StableSR
  • State-aware Re-ID Correction
  • Super-resolution
  • SwiGLU
  • Swin-B
  • Swin-L
  • Synthetic data generation
  • TensorRT
  • Test-time augmentation
  • Text QA Construction
  • Threshold Filter
  • Trajectron++
  • UniformerV2
  • VAP-Net
  • Video Swin Transformer
  • Video-LLM
  • VideoLLaVA
  • VideoMAE
  • Vision Language Model (VLM)
  • Visual Prompt Engineering
  • Visual-text prompt engineering
  • Weighted Box Fusion (WBF)
  • X3D
  • YOLO
  • YOLO-R
  • YOLO-W6
  • YOLO-World
  • YOLO-based Detector
  • YOLOv5
  • YOLOv6-L6
  • YOLOv7
  • YOLOv8
  • YOLOv8x
  • YOLOv9
  • YOLOv9-e
  • mAP (Mean Average Precision)
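
Weighted Box Fusion (WBF) appears in the list above as an ensembling step for detectors. The sketch below is a deliberately simplified, single-class version of the idea: boxes from several models are grouped by IoU, and each group is replaced by a confidence-weighted average box. The function names, the 0.55 IoU threshold, and the sample detections are assumptions for illustration. The full algorithm also handles multiple classes and adjusts the fused confidence by how many models agreed, which is omitted here.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(detections, iou_thr=0.55):
    """Simplified single-class box fusion.

    `detections` is a list of (box, score) pairs pooled from all models.
    Returns a list of (fused_box, mean_score) pairs."""
    clusters = []  # members of each cluster as (box, score)
    fused = []     # current fused box of each cluster
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        match = None
        for idx, fused_box in enumerate(fused):
            if iou(box, fused_box) > iou_thr:
                match = idx
                break
        if match is None:
            clusters.append([(box, score)])
            fused.append(tuple(box))
        else:
            clusters[match].append((box, score))
            total = sum(s for _, s in clusters[match])
            # Confidence-weighted average of each coordinate.
            fused[match] = tuple(
                sum(b[k] * s for b, s in clusters[match]) / total
                for k in range(4)
            )
    return [
        (fused[i], sum(s for _, s in members) / len(members))
        for i, members in enumerate(clusters)
    ]


# Made-up detections from two hypothetical models.
detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.6),
    ((200, 200, 240, 240), 0.8),
]
result = fuse_boxes(detections)
print(result)
```

Unlike non-maximum suppression, which keeps one box and discards the rest, this style of fusion lets lower-confidence boxes still shift the final coordinates.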

Topics

3D bounding box detection · 3D reconstruction · AI City Challenge · AI City Challenge Track 4 · Action Recognition · Autonomous driving · Behavior modeling · Bias · Bundle adjustment · Computer Vision · Data Annotation · Data Augmentation · Deep Learning · Distorted objects · Driving Behavior Analysis · Driving action recognition · Dynamic scenes · Ensemble methods · Ethical AI · Fairness · Fisheye image object detection · Geometric consistency · Helmet Detection · Helmet violation detection · Hierarchical clustering · Human-in-the-Loop · Hybrid inference · Large-scale dynamic scene understanding · Low-light image enhancement · Multi-camera people tracking · Multimodal Analysis · Object Detection · Online tracking · Overlap suppression clustering · Parking violation detection · Privacy · Pseudo-labeling · Road Safety · Small object detection · Smart Cities · State-aware re-ID correction · Super-resolution · Synthetic Data · Temporal Localization · Traffic Management · Traffic Safety · Traffic safety analysis · Traffic simulation · Traffic surveillance · Urban Scenarios · Video Captioning · Vision Language Models · class imbalance · generative adversarial networks · motorcyclist safety · traffic analysis · trajectory forecasting


Notes

Open for commentary — connections to other work, critiques, follow-up reading.