THE 8TH AI CITY CHALLENGE @ CVPR 2024
Event: CVPR 2024 · Duration: 508 min
Abstract
This segment introduces the 8th AI City Challenge at CVPR 2024, detailing its tracks, participation statistics, and evaluation methodologies. It then moves to a keynote on Large Scale Dynamic Scene Understanding by Professor Daniel Cremers. The keynote covers the historical development of 3D computer vision for autonomous systems, the challenges of real-world dynamic scenes, and novel numerical methods for bundle adjustment, and concludes with applications in traffic acquisition and simulation.
Another segment features paper presentations from the AI City Challenge 2024 on Track 1 and Track 2. The presentations cover approaches to multi-camera people tracking, including methods for handling occlusion, improving re-identification, and increasing tracking accuracy in complex urban environments. Key topics include online tracking, spatial-temporal constraints, and the use of Vision Language Models for traffic safety analysis. The segment concludes with a brief Q&A and a call for the next set of presentations.
Another segment features presentations on analyzing driving behavior and traffic safety with computer vision and language models. Topics include efficient fine-tuning of Vision Language Models in urban settings, segment-based data processing for traffic safety analysis, parallel dense video captioning, multi-perspective traffic video description, multi-view action recognition for distracted driving, and spatial-temporal learning for unusual driving behaviors. The presentations highlight approaches to data annotation, model architecture, and post-processing that improve accuracy and efficiency in identifying and classifying driving activities.
Another segment features presentations from teams in AI City Challenge Track 4, road object detection in fisheye camera images. Each team presents its methodology for challenges such as image distortion, low-light conditions, small object detection, and data inconsistency. The solutions use techniques including data augmentation, pseudo-labeling, image enhancement, super-resolution, and ensemble methods to improve detection accuracy and robustness in traffic surveillance scenarios.
A further segment covers computer vision for traffic analysis and safety: a coarse-to-fine two-stage helmet detection method for motorcyclists, an effective method for detecting helmet rule violations, a knowledge-informed generative adversarial network for multi-vehicle trajectory forecasting at signalized intersections, and a simple in-place data augmentation technique for surveillance object detection. These approaches target data imbalance, complex traffic conditions, and occlusion in real-world scenarios.
The panel discussion segment on the 8th AI City Challenge @ CVPR 2024 covers the mission of applying AI to smart cities, the role of data augmentation, and the ethics of privacy and fairness in AI applications. Panelists discuss strategies for anonymizing data, mitigating bias, and using synthetic data and multimodal analysis to advance the field. The segment concludes with the announcement of award winners for all five tracks: multi-camera people tracking, traffic safety analysis, naturalistic driving action recognition, road object detection in fish-eye cameras, and helmet violation detection.
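The keynote's "novel numerical methods for bundle adjustment" include Power Bundle Adjustment (PoBA). Its core identity is standard linear algebra: after landmarks are eliminated with the Schur complement, the reduced camera system S = U - W V^-1 W^T factors as U(I - M) with M = U^-1 W V^-1 W^T, so S^-1 can be approximated by the truncated power series sum of M^i U^-1. The sketch below illustrates only that identity on a toy problem. It assumes diagonal U and V (real bundle adjustment uses small dense blocks) and plain Python lists, and the function names are hypothetical; it is not the PoBA implementation.

```python
# Toy illustration of the power-series idea behind Power Bundle Adjustment.
# Assumptions (not from the talk): U and V are diagonal, and matrices are
# plain nested lists. Real bundle adjustment uses small dense blocks.


def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    """Return the transpose of a dense matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def power_series_solve(U_diag, V_diag, W, b, num_terms=20):
    """Approximately solve the reduced camera system (U - W V^-1 W^T) x = b.

    With M = U^-1 W V^-1 W^T the Schur complement factors as S = U (I - M),
    so S^-1 b = sum_{i>=0} M^i U^-1 b whenever the spectral radius of M is
    below 1. The series is truncated after num_terms terms.
    """
    W_t = transpose(W)

    def apply_M(v):
        y = matvec(W_t, v)                           # W^T v
        y = [yi / d for yi, d in zip(y, V_diag)]     # V^-1 W^T v
        y = matvec(W, y)                             # W V^-1 W^T v
        return [yi / d for yi, d in zip(y, U_diag)]  # U^-1 W V^-1 W^T v

    term = [bi / d for bi, d in zip(b, U_diag)]  # i = 0 term: U^-1 b
    x = list(term)
    for _ in range(1, num_terms):
        term = apply_M(term)                     # next term: M^i U^-1 b
        x = [xi + ti for xi, ti in zip(x, term)]
    return x


# Two camera parameters and three landmark parameters (toy sizes).
U_diag = [4.0, 5.0]
V_diag = [2.0, 3.0, 2.0]
W = [[1.0, 0.5, 0.0], [0.0, 1.0, 1.0]]
b = [1.0, 2.0]
print(power_series_solve(U_diag, V_diag, W, b, num_terms=1))  # U^-1 b only
print(power_series_solve(U_diag, V_diag, W, b, num_terms=30))
```

Each extra term needs only products with W, its transpose, and the diagonal blocks, so S is never formed or factorized; this is the source of the speed and memory argument, shown here at toy scale where a direct solve is still easy to compare against.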
Speakers
- Zheng Tang — NVIDIA
- David C. Anastasiu — Santa Clara University
- Quan Kong — Woven by Toyota
- Munkhjargal Gochoo — The United Arab Emirates University
- Pranamesh Chakraborty — Indian Institute of Technology Kanpur
- Daniel Cremers — Chair of Computer Vision and AI, TU Munich; Munich Center for Machine Learning
- Thomas Yang
- Ryuto Yoshida — Yachiyo Engineering Co., Ltd.
- Zhenyu Xie — Shanghai Jiao Tong University
- Jeongho Kim — Nota Inc., Republic of Korea
- Andreas Specker — Fraunhofer IOSB
- Riu Cherdchusakitchai — AI and Robotics Ventures (ARV), Thailand
- Zhihao Duan — Alibaba OpenTrek
- Thomas Tang — Alibaba Cloud Intelligence Group
- Viet Hung Duong — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
- Duc Quyen Nguyen — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
- Tien Cuong Nguyen — VNPT AI, Vietnam Posts and Telecommunications Group, Hanoi, Vietnam
- Thien Van Luong — Faculty of Computer Science, Phenikaa University, Hanoi, Vietnam
- Huan Vu — University of Transport and Communications, Hanoi, Vietnam
- Wooksu Shin — Nota Inc., Republic of Korea
- Donghyuk Choi — Nota Inc., Republic of Korea
- Hancheol Park — Nota Inc., Republic of Korea
- Long Hoang Pham — SKKU-AutoLab, Sungkyunkwan University
- Quoc Pham-Nam Ho — SKKU-AutoLab, Sungkyunkwan University
- Duong Nguyen-Ngoc Tran — SKKU-AutoLab, Sungkyunkwan University
- Tai Huu-Phuong Tran — SKKU-AutoLab, Sungkyunkwan University
- Huy-Hung Nguyen — SKKU-AutoLab, Sungkyunkwan University
- Duong Khac Vu — SKKU-AutoLab, Sungkyunkwan University
- Ngoc Doan-Minh Huynh — SKKU-AutoLab, Sungkyunkwan University
- Hyung-Min Jeon — SKKU-AutoLab, Sungkyunkwan University
- Hyung-Joon Jeon — SKKU-AutoLab, Sungkyunkwan University
- Jae Wook Jeon — SKKU-AutoLab, Sungkyunkwan University
- Bao Tran Gia — University of Information Technology, VNU-HCM, Vietnam
- Tuong Bui Cong Khanh — University of Information Technology, VNU-HCM, Vietnam
- Hien Ho Trong — University of Information Technology, VNU-HCM, Vietnam
- Thuyen Tran Doan — University of Information Technology, VNU-HCM, Vietnam
- Tien Do — University of Information Technology, VNU-HCM, Vietnam
- Duy-Dinh Le — University of Information Technology, VNU-HCM, Vietnam
- Thanh Duc Ngo — University of Information Technology, VNU-HCM, Vietnam
- Dai Quoc Tran — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
- Armstrong Aboah — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
- Yuntae Jeon — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
- Maged Shoman — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
- Minsoo Park — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
- Seunghee Park — Smart Construction IT Lab (SCIT), Sungkyunkwan University, Korea
- Xingshuang Luo — Beijing University of Posts and Telecommunications
- Zhe Cui — Beijing University of Posts and Telecommunications
- Fei Su — Beijing University of Posts and Telecommunications
- Hongpu Zhang — Beijing University of Posts and Telecommunications
- Yunliang Chen — China Mobile Shanghai ICT Co., Ltd
- Chen Wang — China Mobile Shanghai ICT Co., Ltd
- Yingda Shang — China Mobile Shanghai ICT Co., Ltd
- Chuheng Wei — University of California, Riverside
- Guoyuan Wu — University of California, Riverside
- Matthew Barth — University of California, Riverside
- Amr Abdelraouf — Infotech Labs, Toyota
- Rohit Gupta — Infotech Labs, Toyota
- Kyungtae Han — Infotech Labs, Toyota
- Munkh-Erdene Otgonbold — United Arab Emirates University
- Ganzorig Batnasan — United Arab Emirates University
- Norimasa Kobori — Yachiyo Engineering Co., Ltd.
- Xunlei Wu — Department of Computer Science and Software Engineering, United Arab Emirates University, UAE
Talks (40)
- 00:00:00 — Zheng Tang: The 8th AI City Challenge @ CVPR 2024 - Introduction
- An introduction to the 8th AI City Challenge at CVPR 2024, covering the organizing committees, historical overview of the challenge, participation statistics, and timeline.
- 01:24:42 — Thomas Yang: Q&A Session with Daniel Cremers
- This segment begins with a Q&A session for Daniel Cremers, covering topics such as 3D bounding box detection, behavior modeling, and handling dynamic scenes in tracking.
- 01:35:54 — Ryuto Yoshida: Overlap Suppression Clustering for Offline Multi-Camera People Tracking
- Ryuto Yoshida presents a multi-camera people tracking method that achieved the highest HOTA score in the challenge, focusing on overlap suppression clustering and hierarchical clustering with average linkage.
- 01:46:41 — Zhenyu Xie: A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction
- Zhenyu Xie presents a robust online multi-camera people tracking system that addresses challenges like matching individuals across cameras, ID switching due to occlusion, and re-identification with varying poses and angles, using geometric consistency and state-aware re-ID correction.
- 01:56:11 — Jeongho Kim: Cluster Self-Refinement for Enhanced Online Multi-Camera People Tracking
- Jeongho Kim introduces a cluster self-refinement module for online multi-camera people tracking, addressing issues of inaccurate mapping, low-quality features, and multiple IDs assigned to a single person, leading to improved HOTA scores.
- 02:06:21 — Andreas Specker: OCMCTrack: Online Multi-Target Multi-Camera Tracking with Corrective Matching Cascade
- Andreas Specker presents OCMCTrack, an online multi-target multi-camera tracking system that focuses on correcting erroneous associations from previous time steps and improving bounding box to world projection, achieving competitive results in the challenge.
- 02:16:18 — Riu Cherdchusakitchai: Online Multi-camera People Tracking with Spatial-temporal Mechanism and Anchor-feature Hierarchical Clustering
- Riu Cherdchusakitchai presents an online multi-camera people tracking method that utilizes spatial-temporal constraints and anchor-feature hierarchical clustering to address challenges like occlusion, ID switching, and varying lighting conditions, achieving a HOTA score of 69.10%.
- 02:27:42 — Zhihao Duan: CityLLaVA: Efficient Fine-tuning for VLMs in City Scenario
- Zhihao Duan presents CityLLaVA, an efficient fine-tuning pipeline for Vision Language Models (VLMs) specifically designed for urban scenarios, addressing challenges like small targets, multi-views, background noise, and templated captions to improve traffic safety description and analysis.
- 02:49:24 — Thomas Tang: CityLLaVA: Efficient Fine-tuning for VLMs in City Scenario
- This talk introduces CityLLaVA, an efficient fine-tuning method for Vision Language Models in urban settings, highlighting visual prompt engineering, text QA construction, and block expansion techniques.
- 03:43:17 — Thomas Tang: Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model
- This talk presents a method for traffic safety analysis using a large vision language model, focusing on segment-based data processing and a two-stage training pipeline to generate detailed descriptions of traffic scenarios.
- 04:14:18 — Viet Hung Duong: Robust Data Augmentation and Ensemble Method for Object Detection in Fisheye Images
- Presents a solution for fisheye image object detection using robust data augmentation, K-fold splitting, synthetic data generation, model selection, pseudo-labeling, and ensembling, achieving 1st place in the AI City Challenge Track 4.
- 04:14:47 — Wooksu Shin: Road Object Detection Robust to Distorted Objects at the Edge Regions of Images
- Addresses challenges in fisheye image object detection, particularly distorted objects at image edges, using task-specific methods such as slicing aided hyper inference (SAHI) and semi-supervised learning, along with general-purpose methods and ensemble techniques.
- 04:15:05 — Long Hoang Pham: Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach
- Proposes an open-vocabulary pseudo-label approach for improving object detection in fisheye cameras, addressing data inconsistency and distortion challenges using YOLO-World and CycleGAN.
- 04:15:20 — Bao Tran Gia: Enhancing Road Object Detection in Fisheye Cameras: An Effective Framework Integrating SAHI and Hybrid Inference
- Presents an effective framework for enhancing road object detection in fisheye cameras by integrating SAHI and hybrid inference, utilizing data preparation, augmentation, various models, and post-processing strategies.
- 04:15:31 — Dai Quoc Tran: Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets
- Introduces a low-light image enhancement framework for improved object detection in fisheye lens datasets, focusing on addressing challenges posed by low-light conditions and small object detection.
- 04:15:43 — Xingshuang Luo: FE-Det: An Effective Traffic Object Detection Framework for Fish-Eye Cameras
- Proposes FE-Det, an effective traffic object detection framework for fisheye cameras, addressing challenges like distortion, tiny objects, and similar classes through a comprehensive methodology including data augmentation, model architecture, and post-processing.
- 05:18:14 — Thomas Tang: Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis
- This talk introduces a parallel dense video captioning method integrated with CLIP visual features for enhanced accuracy and efficiency in traffic video analysis, focusing on fine-grained captions and end-to-end event analysis.
- 05:38:48 — Thomas Tang: Introduction to the Next Session
- The speaker introduces the next session of presentations.
- 05:38:57 — Hongpu Zhang: A Coarse-to-fine Two-stage Helmet Detection Method for Motorcyclists
- This presentation introduces a coarse-to-fine two-stage helmet detection method for motorcyclists, addressing challenges like varying object sizes and complex traffic conditions in real-world scenarios.
- 05:41:18 — Yunliang Chen: An Effective Method for Detecting Violation of Helmet Rule for Motorcyclists
- This presentation introduces an effective method for detecting helmet rule violations in motorcyclists, utilizing a two-stage object detection framework with data augmentation and ensemble techniques to address challenges like data imbalance and occlusion.
- 05:42:49 — Chuheng Wei: KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections
- This presentation introduces KI-GAN, a knowledge-informed generative adversarial network for multi-vehicle trajectory forecasting at signalized intersections, addressing research gaps in traffic light influence, data integration, and interaction pooling.
- 05:44:55 — Munkh-Erdene Otgonbold: Simple In-place Data Augmentation for Surveillance Object Detection
- This presentation introduces a simple in-place data augmentation technique for surveillance object detection, aiming to improve road object detection performance by balancing class samples and preserving realistic image appearance.
- 05:48:14 — Thomas Tang: Multi-perspective Traffic Video Description Model with Fine-grained Refinement Approach
- This talk presents a multi-perspective traffic video description model that utilizes fine-grained refinement for generating detailed descriptions of traffic scenarios, focusing on textual and visual attribute extraction, video captioning, and a refinement module.
- 06:18:52 — Thomas Tang: Multi-View Action Recognition for Distracted Driver Behavior Localization
- This talk introduces a multi-view action recognition framework for localizing distracted driver behaviors, utilizing a two-stage approach with action recognition networks and a temporal localization module to identify and classify driving activities.
- 06:33:14 — Thomas Tang: Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos
- This talk presents a multi-view spatial-temporal learning approach for understanding unusual behaviors in untrimmed naturalistic driving videos, focusing on a two-stage framework with action recognition models and a multi-step post-processing algorithm to detect and classify driving activities.
- 06:41:14 — Thomas Tang: DeepLocalization: Using change point detection for Temporal Action Localization
- This talk introduces DeepLocalization, a real-time localization framework for driver actions using deep learning techniques and graph-based change-point detection to accurately determine driver actions’ start and end times.
- 07:08:30 — Thomas Tang: Panel Discussion: AI City Challenge Mission, Data Augmentation, Privacy, Fairness, and Future Directions
- Panelists discuss the AI City Challenge’s mission to improve smart cities, the role of data augmentation, ethical concerns like privacy and bias, and future directions including multimodal analysis and control systems.
- 08:00:00 — Zheng Tang: Track 1: Multi-Camera People Tracking
- Presentation of Track 1, focusing on multi-camera people tracking, detailing dataset statistics, evaluation metrics, and common/leading approaches.
- 10:19:00 — Quan Kong: Track 2: Traffic Safety Description and Analysis
- Overview of Track 2, which involves traffic safety description and analysis using a large-scale pedestrian-centric video dataset with multi-view and fine-grained captions.
- 11:18:30 — Thomas Tang: Track 1 Award Announcement: Multi-Camera People Tracking
- Team 79 (Shanghai Jiao Tong University & Lenovo) wins for ‘A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction’, and Team 221 (Yachiyo Engineering Co., Ltd., Research Institute for Infra. Paradigm Shift & Chubu University) is runner-up for ‘Overlap Suppression Clustering for Offline Multi-Camera People Tracking’.
- 11:43:30 — Thomas Tang: Track 2 Award Announcement: Traffic Safety Description and Analysis
- Team 208 (Alibaba) wins for ‘CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario’, and Team 28 (VNU HCM, FPT Telecom & VGU) is runner-up for ‘Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model’.
- 12:12:00 — Zheng Tang: Track 3: Naturalistic Driving Action Recognition
- Details on Track 3, focusing on naturalistic driving action recognition using a dataset of distracted driver activities captured from multiple synchronized camera views.
- 12:12:30 — Thomas Tang: Track 3 Award Announcement: Naturalistic Driving Action Recognition
- Team 155 (TeleAI) wins, Team 5 (SKKU-AutoLab) is runner-up, and Team 165 (MCPRL) receives an honorable mention. Final ranking is based on Dataset B due to ongoing code estimation.
- 12:41:30 — Thomas Tang: Track 4 Award Announcement: Road Object Detection in Fish-Eye Cameras
- Team 9 (Vietnam Posts and Telecommunications Group (VNPT) & Phenikaa University) wins for ‘Robust Data Augmentation and Ensemble Method for Object Detection in Fisheye Camera Images’, Team 40 (Nota.ai) is runner-up for ‘Road Object Detection Robust to Distorted Objects at the Edge Regions of Images’, and Team 5 (Sungkyunkwan University) receives an honorable mention for ‘Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach’.
- 13:16:30 — Thomas Tang: Track 5 Award Announcement: Detecting Violation of Helmet Rule for Motorcyclists
- Team 99 (University of Information Technology, VNU-HCM) co-wins for ‘Robust Motorcycle Helmet Detection in Real-World Scenarios: Using Co-DETR and Minority Class Enhancement’, and Team 76 (China Mobile Shanghai ICT Co., Ltd) co-wins for ‘An Effective Method for Detecting Violation of Helmet Rule for Motorcyclists’.
- 13:38:30 — Thomas Tang: AICITY CHALLENGE 2024 WINNERS Summary
- A summary table of all winners, runners-up, and honorable mentions across all tracks, along with their respective NVIDIA RTX 4080 SUPER or NVIDIA Jetson Orin Nano dev kit awards.
- 13:54:00 — Munkhjargal Gochoo: Track 4: Road Object Detection in Fish-Eye Cameras
- Presentation of Track 4, covering road object detection in fish-eye cameras, detailing the dataset, object classes, and common techniques used by participants.
- 15:39:00 — Pranamesh Chakraborty: Track 5: Detecting Violation of Helmet Rule for Motorcyclists
- Overview of Track 5, which addresses detecting helmet rule violations for motorcyclists using a dataset from an Indian city under various conditions, and discusses common and leading approaches.
- 17:54:00 — David C. Anastasiu: AI City Challenge Evaluation Methodology and Statistics
- Explanation of the evaluation methodology for each track, followed by detailed statistics on team participation, submission frequency, score improvements, and leaderboard dynamics.
- 34:05:00 — Daniel Cremers: Large Scale Dynamic Scene Understanding
- Keynote speech introducing the challenges of large-scale dynamic scene understanding for autonomous systems, covering historical context, current progress in automated driving, and novel methods for 3D reconstruction and traffic acquisition.
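Several Track 1 talks group per-camera tracklets into global identities with hierarchical clustering, and the overlap-suppression talk names average linkage specifically. The sketch below shows plain average-linkage agglomerative clustering over Re-ID-style embeddings with a stopping threshold. The cosine distance, the threshold value, and the optional `cannot_link` predicate are illustrative assumptions, not any presenter's implementation; the predicate is only a hypothetical stand-in for rules such as "two tracklets seen in the same camera at the same time cannot be one person".

```python
import math

# Average-linkage agglomerative clustering sketch (hypothetical helper names).
# Assumption: items are embedding vectors compared with cosine distance.


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def average_linkage_clustering(features, threshold, cannot_link=None):
    """Merge the closest pair of clusters until no pair is closer than threshold.

    cannot_link: optional predicate (i, j) -> bool on item indices; two
    clusters containing such a pair are never merged (hypothetical rule).
    Returns a list of clusters, each a sorted list of item indices.
    """
    n = len(features)
    dist = [[cosine_distance(features[i], features[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                if cannot_link and any(cannot_link(i, j) for i in ca for j in cb):
                    continue
                # Average linkage: mean pairwise distance between the two clusters.
                d = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best is None or best[0] > threshold:
            break  # no admissible pair is close enough
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because b > a
    return clusters


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(average_linkage_clustering(features, threshold=0.2))
# Forbid merging items 0 and 1, e.g. because they co-occur in one camera.
print(average_linkage_clustering(features, threshold=0.2,
                                 cannot_link=lambda i, j: {i, j} == {0, 1}))
```

Average linkage is less sensitive to a single outlier feature than single linkage, which is one plausible reason to prefer it when Re-ID features are noisy; the brute-force pair search here is only suitable for small inputs.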
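Slicing aided hyper inference (SAHI), used by several Track 4 teams against small and edge-distorted objects, runs the detector on overlapping crops and maps each crop's detections back to the full frame before merging them. A minimal sketch of the slicing and coordinate-shifting steps follows, assuming square tiles and one overlap ratio for both axes. The function names and default values are hypothetical and this is not the `sahi` package's API; the final merge across tiles (for example NMS or Weighted Box Fusion) is omitted.

```python
# Sketch of the slicing step behind SAHI-style inference (hypothetical helper
# names; not the API of the `sahi` package). Assumes square tiles and one
# overlap ratio shared by both axes.


def axis_starts(length, tile, overlap_ratio):
    """Tile start offsets along one axis; the last tile is flush with the edge."""
    if tile >= length:
        return [0]
    step = max(1, int(tile * (1.0 - overlap_ratio)))
    starts = list(range(0, length - tile + 1, step))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # make sure the far border is covered
    return starts


def slice_windows(width, height, tile=512, overlap_ratio=0.2):
    """Overlapping (x0, y0, x1, y1) windows covering a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in axis_starts(height, tile, overlap_ratio)
        for x in axis_starts(width, tile, overlap_ratio)
    ]


def to_full_image(box, window):
    """Shift an (x0, y0, x1, y1) box predicted inside a tile to frame coordinates."""
    dx, dy = window[0], window[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


windows = slice_windows(1000, 400, tile=512, overlap_ratio=0.2)
print(windows)
print(to_full_image((10, 20, 30, 40), windows[1]))
```

The overlap ensures that an object cut by one tile boundary appears whole in a neighboring tile, at the cost of duplicate detections that the merge step must remove.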
Key Takeaways
- The AI City Challenge has seen significant growth in participation, with a focus on real-world computer vision problems for smart cities and autonomous driving.
- Understanding and modeling complex, dynamic human behavior in diverse traffic scenarios is crucial for the safe and effective deployment of autonomous vehicles.
- Novel numerical methods like Power Bundle Adjustment offer significant improvements in speed, accuracy, and memory efficiency for large-scale 3D reconstruction problems, which are foundational for dynamic scene understanding.
- Real-world data acquisition from surveillance and aerial platforms, combined with advanced 3D reconstruction and tracking, enables the creation of high-fidelity simulations and the development of robust autonomous driving systems.
- Online multi-camera people tracking is a complex task with significant challenges including occlusion, ID switching, and re-identification across varying poses and camera angles.
- Effective solutions often involve multi-stage pipelines combining object detection, single-camera tracking, inter-camera association, and advanced re-identification techniques.
- Leveraging spatial-temporal constraints, hierarchical clustering, and geometric consistency can significantly improve tracking accuracy and robustness in dense and dynamic environments.
- The AI City Challenge encourages the development of online tracking methods, with incentives for real-time applicability, and highlights the need for robust evaluation metrics for extremely long video sequences.
- Visual prompt engineering and block expansion techniques are crucial for efficient fine-tuning of Vision Language Models in urban settings, leading to superior performance compared to traditional methods.
- Segment-based data processing and a two-stage training pipeline, combined with dynamic combiners, can effectively generate precise and detailed descriptions of traffic safety scenarios.
- Parallel dense video captioning, integrated with CLIP visual features, significantly enhances accuracy and efficiency in generating fine-grained captions for end-to-end event analysis in traffic videos.
- Multi-view action recognition frameworks, utilizing advanced action recognition networks and multi-step post-processing algorithms, are effective in localizing and classifying distracted driver behaviors in untrimmed naturalistic driving videos.
- Various advanced techniques like robust data augmentation, pseudo-labeling, and ensemble methods are crucial for achieving high accuracy in fisheye image object detection.
- Addressing specific challenges such as distorted objects at image edges, low-light conditions, and small object sizes requires specialized approaches like SAHI, image enhancement frameworks, and super-resolution.
- Combining multiple models and diverse data processing strategies through ensemble techniques consistently leads to improved overall performance and robustness in complex real-world scenarios.
- A coarse-to-fine two-stage helmet detection method effectively addresses varying object sizes and complex traffic conditions for improved motorcyclist safety.
- Data augmentation techniques, including manual and automatic copy-paste augmentation, are crucial for addressing data imbalance and improving model performance in helmet violation detection.
- KI-GAN, a knowledge-informed GAN, integrates diverse data sources and specialized pooling methods to achieve superior accuracy in multi-vehicle trajectory forecasting at signalized intersections.
- Simple in-place data augmentation can significantly enhance object detection performance in surveillance datasets by balancing class samples while maintaining realistic image appearance.
- The AI City Challenge @ CVPR 2024 emphasizes the development of AI solutions for smart cities, addressing real-world problems in transportation, safety, and urban management.
- Ethical considerations, particularly data privacy and algorithmic fairness, are paramount, with discussions highlighting the use of anonymization techniques, synthetic data, and careful data collection to mitigate risks.
- The challenge encourages diverse approaches, including multimodal sensor fusion, human-in-the-loop systems, and the application of advanced models like Large Vision Language Models, to push the boundaries of AI capabilities.
- Winners across five tracks were awarded NVIDIA RTX 4080 SUPER GPUs and NVIDIA Jetson Orin Nano dev kits, recognizing innovative solutions in multi-camera tracking, traffic safety, driving action recognition, fish-eye object detection, and helmet violation detection.
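Weighted Box Fusion (WBF) appears in several fisheye and helmet detection pipelines as the ensembling step. Unlike NMS, which keeps one box and discards the rest, WBF averages the coordinates of overlapping same-class boxes, weighted by confidence, and then lowers the confidence of clusters that only some models supported. The compact sketch below follows that idea under stated assumptions: equal model weights, an IoU threshold of 0.55, and a hypothetical function name. It omits options such as per-model weights and is not the `ensemble_boxes` package's API.

```python
# Minimal Weighted Box Fusion (WBF) sketch. Assumptions: equal model weights,
# boxes as (x0, y0, x1, y1), and a hypothetical function name; this is not
# the `ensemble_boxes` package API.


def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_box_fusion(predictions, iou_thr=0.55):
    """Fuse per-model detections given as lists of (box, score, label)."""
    num_models = len(predictions)
    # Process every box from every model in order of decreasing confidence.
    ordered = sorted((p for model in predictions for p in model),
                     key=lambda p: p[1], reverse=True)
    clusters = []  # each cluster: label, member (box, score) pairs, fused box
    for box, score, label in ordered:
        match, best = None, iou_thr
        for cluster in clusters:
            if cluster["label"] == label:
                overlap = iou(cluster["fused"], box)
                if overlap > best:  # must strictly exceed the threshold
                    match, best = cluster, overlap
        if match is None:
            clusters.append({"label": label, "members": [(box, score)], "fused": box})
            continue
        match["members"].append((box, score))
        # Fused coordinates: confidence-weighted average of member boxes.
        total = sum(s for _, s in match["members"])
        match["fused"] = tuple(
            sum(b[k] * s for b, s in match["members"]) / total for k in range(4))
    fused = []
    for cluster in clusters:
        scores = [s for _, s in cluster["members"]]
        mean_score = sum(scores) / len(scores)
        # Penalize clusters that fewer models contributed to.
        confidence = mean_score * min(len(scores), num_models) / num_models
        fused.append((cluster["fused"], confidence, cluster["label"]))
    return fused


model_a = [((0, 0, 10, 10), 0.9, 0), ((50, 50, 60, 60), 0.8, 1)]
model_b = [((1, 1, 11, 11), 0.6, 0)]
for box, conf, label in weighted_box_fusion([model_a, model_b]):
    print(label, [round(c, 2) for c in box], round(conf, 2))
```

In this example the two class-0 boxes overlap (IoU of about 0.68) and fuse into one box, while the class-1 box found by only one of the two models keeps its position but has its confidence halved.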
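The Track 3 takeaways describe two-stage pipelines in which an action recognition network scores short clips and a post-processing stage turns those scores into start and end times. The sketch below shows one simple, hypothetical version of that second stage: per-clip argmax, a confidence threshold, merging consecutive clips with the same label, and a minimum-length filter. The clip length, thresholds, background class id, and function name are assumptions, not any team's settings or algorithm.

```python
# Hypothetical post-processing sketch: per-clip class probabilities -> action
# segments. Thresholds, clip length, and background id are illustrative only.


def localize_actions(clip_probs, clip_seconds=1.0, min_prob=0.5, min_len=2, background=0):
    """Turn per-clip class probabilities into (class, start_s, end_s) segments.

    clip_probs: list over consecutive clips of per-class probability lists.
    A clip keeps its argmax class if that probability is >= min_prob and the
    class is not `background`; otherwise it is unlabeled (None).
    Runs of identical labels are merged; runs shorter than min_len are dropped.
    """
    labels = []
    for probs in clip_probs:
        cls = max(range(len(probs)), key=probs.__getitem__)
        labels.append(cls if probs[cls] >= min_prob and cls != background else None)

    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the video or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None and i - start >= min_len:
                segments.append((labels[start], start * clip_seconds, i * clip_seconds))
            start = i
    return segments


clips = [
    [0.9, 0.05, 0.05],   # background
    [0.1, 0.8, 0.1],     # class 1
    [0.2, 0.7, 0.1],     # class 1
    [0.1, 0.85, 0.05],   # class 1
    [0.3, 0.3, 0.4],     # class 2 but below min_prob
    [0.1, 0.1, 0.8],     # class 2, single clip: too short
    [0.9, 0.05, 0.05],   # background
]
print(localize_actions(clips))
```

Real systems typically add smoothing across overlapping windows and per-class thresholds, which this sketch leaves out.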
Methods / Models / Datasets Mentioned
4D object tracking · Action probability calibration · Activity Overlap Score · Adaptive thresholding · Agglomerative Clustering · Anchor-feature Hierarchical Clustering · Average Linkage · BLEU-4 · Block Expansion · BoT-SORT · ByteTrack · CIDEr · CLIP · CLIP Image Encoder · Co-DETR · Co-DINO model · Ceres-explicit · Ceres-implicit · CityLLaVA · Clip-level video recognition · Clustering · Code Estimation · Corrective Matching Cascade · CycleGAN · DETA · DINO · DINO-4scale · DINO-5scale · DINQ · Data augmentation · Decoder Block · Deep Scenario · DeepLocalization · DeepSORT · Diffusion model · Distance-Aware Loss (DAL) · Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis · Dual Aggregation Transformer (DAT) · Efficient Fine-Tuning for VLMs · EfficientDet · Embedding Model · Ensemble · Ensemble Models · F1 score · FJMP · Few-Shot Prompting · Fish-Eye Cameras · Flipping · GSAD · Gauss-Newton · Geometric Consistency · Graph Neural Networks (GNNs) · Graph-Based Change-Point Detection · H-DINO · HOTA · Hierarchical Clustering · Histogram equalization · IDF1 · Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach · Intel SceneScape · InternImage · K-fold split · KI-GAN · LLM Segment Extractor · LLaMA Pro · LLaMA2 · LLaVA-34B · Large Language Model · Large Vision Language Model · Lee et al.'s Method · Levenberg-Marquardt · LoRA · METEOR · MHSA · MLP Projector · MOTA · Mask2Former · Mean Average Precision (mAP) · Merge and Remove · Minority Optimizer Algorithm · Mosaic · Multi-scale training · Multi-view fusion · NAFNet · NVIDIA Jetson Orin Nano dev kit · NVIDIA RTX 4080 SUPER · Naturalistic Driving Action Recognition · Nearest Neighbor Mapping · OCMCTrack · Object detection · Offline Tracking · Online Tracking · Online multi-camera tracking · Open-Vocabulary Pseudo-Label Approach · OpenVINO · Overlap Suppression Clustering · PDVC (Parallel Dense Video Captioning) · Photogrammetry · Pose estimation · Post Connection · Power Bundle Adjustment (PoBA) · Procrustes Analysis · Pseudo-labeling · QR decomposition · Qwen-VL · RMSNorm · ROUGE-L · RTMPose · ReID · ResNet50-IBN · Road Object Detection Robust to Distorted Objects at the Edge Regions of Images · Robust Data Augmentation · Robust Motorcycle Helmet Detection · Rule-based Sentence Extractor · S-GAN · S-LSTM · Schur complement · Segment Extraction Pipeline · Semi-supervised learning · Single-camera tracking · Slicing Aided Hyper Inference (SAHI) · Spatial-Temporal Mechanism · Square Root Bundle Adjustment · StableSR · State-aware Re-ID Correction · Super-resolution · SwiGLU · Swin-B · Swin-L · Synthetic data generation · TensorRT · Test-time augmentation · Text QA Construction · Threshold Filter · Trajectron++ · UniFormerV2 · VAP-Net · Video Swin Transformer · Video-LLM · VideoLLaVA · VideoMAE · Vision Language Model (VLM) · Visual Prompt Engineering · Visual-text prompt engineering · Weighted Box Fusion (WBF) · X3D · YOLO · YOLOR · YOLO-W6 · YOLO-World · YOLO-based Detector · YOLOv5 · YOLOv6-L6 · YOLOv7 · YOLOv8 · YOLOv8x · YOLOv9 · YOLOv9-e
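Histogram equalization, one of the enhancement techniques in the list above, is the classical global baseline for brightening low-light frames such as those targeted in the low-light fisheye talk. The sketch below applies the standard CDF-based mapping to a flat list of 8-bit gray levels; it is an illustration of the textbook method only, with a hypothetical function name, and the learned enhancement models that also appear in the list work very differently.

```python
# Global histogram equalization on a flat list of 8-bit gray levels: a
# classical baseline only, not the learned low-light enhancers from the talks.


def equalize_histogram(pixels, levels=256):
    """Remap gray levels so their cumulative distribution is roughly uniform."""
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    if total == cdf_min:  # single gray level: nothing to stretch
        return list(pixels)
    # Standard mapping: (cdf - cdf_min) / (N - cdf_min), scaled to [0, levels - 1].
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1))) for c in cdf]
    return [lut[p] for p in pixels]


dark = [10, 10, 20, 20, 30, 40]
print(equalize_histogram(dark))  # [0, 0, 128, 128, 191, 255]
```

Global equalization can over-amplify noise in dark regions, which is one reason learned or locally adaptive enhancers are preferred in practice.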
Topics
3D bounding box detection · 3D reconstruction · AI City Challenge · AI City Challenge Track 4 · Action Recognition · Autonomous driving · Behavior modeling · Bias · Bundle adjustment · Class imbalance · Computer Vision · Data Annotation · Data Augmentation · Deep Learning · Distorted objects · Driving Behavior Analysis · Driving action recognition · Dynamic scenes · Ensemble methods · Ethical AI · Fairness · Fisheye image object detection · Generative adversarial networks · Geometric consistency · Helmet Detection · Helmet violation detection · Hierarchical clustering · Human-in-the-Loop · Hybrid inference · Large-scale dynamic scene understanding · Low-light image enhancement · Motorcyclist safety · Multi-camera people tracking · Multimodal Analysis · Object Detection · Online tracking · Overlap suppression clustering · Parking violation detection · Privacy · Pseudo-labeling · Road Safety · Small object detection · Smart Cities · State-aware re-ID correction · Super-resolution · Synthetic Data · Temporal Localization · Traffic analysis · Traffic Management · Traffic Safety · Traffic safety analysis · Traffic simulation · Traffic surveillance · Trajectory forecasting · Urban Scenarios · Video Captioning · Vision Language Models