CVsports Workshop at CVPR 2024, Seattle
Event: CVPR 2024 · Duration: 411 min
Abstract
This recording of the CVsports Workshop at CVPR 2024 opens with the welcome remarks, which introduce the workshop’s purpose, organizers, and key statistics and highlight its 10th anniversary. The program for the day is outlined, including oral sessions, poster presentations, and special talks. A detailed historical overview of CVsports follows, covering its motivations and a statistical breakdown of its growth over the years. The next talk presents scalable AI solutions for sports analytics in downmarket leagues and youth sports.

The following presentation covers a Neuro-Symbolic Action Quality Assessment (NS-AQA) system for diving, showcasing its state-of-the-art performance, expert validation, and broad applicability. The speaker highlights the system’s objectivity, transparency, and explainability, contrasting it with purely neural approaches. In the Q&A session, the speaker discusses the system’s robustness, expert acceptance, and the philosophical implications of combining symbolic and neural methods. A subsequent talk introduces video interaction recognition using an Attention Augmented Relational Network (AARN) and skeleton data, specifically for classifying penalties in hockey games.

Three further presentations follow. The first discusses challenges in sports analytics, specifically object detection and tracking in basketball, highlighting issues like occlusions and the limitations of single-view systems. The second introduces an event-based approach for ball spin estimation in sports, leveraging event cameras to overcome motion blur and achieve high temporal resolution. The third details X-VARS, a system that uses multi-modal large language models to generate explainable decisions for football refereeing, demonstrating performance comparable to human referees and the potential for future support in sports officiating.

After a recap of the workshop’s ‘Program of the day’, Mirko Messori from Sphaera introduces the company’s ground-truth cloud-based infrastructure and its application in the Kings League. He gives an overview of Sphaera’s technology, including on-site data acquisition using commercial security cameras, cloud-based data processing, and the development of custom event annotation tools to meet the unique demands of the Kings League, emphasizing cost optimization and the need for automated solutions.

The final part summarizes the results and winning solutions of the SoccerNet 2024 challenges, including Dense Video Captioning, Multi-View Foul Recognition, and Game State Reconstruction, and announces the Best Paper Award winners for CVsports 2024. It highlights innovative approaches, model architectures, and key contributions from the top-performing teams, emphasizing the collaborative and open-source nature of the SoccerNet initiative.
Speakers
- Anthony Cioppa
- Silvio Giancola
- Rikke Gade — Aalborg University
- Mehran Javan — Sportlogiq
- Lauren Okamoto — Princeton University
- Lauren W. — Princeton University
- James J. Clark — McGill University
- Takuya Nakabayashi — Keio University
- Jan Held — FNRS, University of Liège, KAUST, IVUL
- Jerrin Bright — Vision and Image Processing Lab, Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
- Mirko Messori — Sphaera
- Jamshid Tursunboev — Kyungpook National University
- Jing Zhang — Xidian University
- Vladimir Golovkin — Constructor.tech
- Kyota Higa — NEC Corporation
- Masahiro Yamaguchi — NEC Corporation
- Ryo Fujiwara — NEC Corporation
- Hideo Saito — Keio University
- Hani Itani — KAUST
- Bernard Ghanem — KAUST
- Marc Van Droogenbroeck — University of Liège
- Paritosh Parmar — Agency for Science, Technology and Research
Talks (16)
- 00:00:00 — Anthony Cioppa: Welcome to CVsports’24
- Anthony Cioppa welcomes attendees to the 10th edition of the CVsports workshop, introduces the organizers, and outlines the workshop’s purpose and key statistics.
- 00:08:14 — Rikke Gade: 10th anniversary CVsports
- Rikke Gade provides a historical overview of the CVsports workshop, detailing its evolution over 10 years, motivations for its creation, and key statistics on papers and organizers.
- 00:33:28 — Mehran Javan: Scalable AI Solutions for Downmarket Leagues and Challenges in Youth Sports
- Mehran Javan discusses Sportlogiq’s scalable AI solutions for sports analytics, focusing on data acquisition challenges and opportunities in downmarket leagues and youth sports, and their approach to generating complete data from partial observations using generative AI.
- 01:14:29 — Lauren Okamoto: NS-AQA: Hierarchical Neuro-Symbolic Approach for Comprehensive and Explainable Action Quality Assessment
- Lauren Okamoto presents a Neuro-Symbolic AI approach for Action Quality Assessment (AQA) in competitive platform diving, demonstrating how it provides interpretable, transparent, and data-efficient scoring by abstracting video into symbols and applying rule-based programs.
- 01:22:06 — Lauren W.: NS-AQA: Hierarchical Neuro-Symbolic Approach for Comprehensive and Explainable Action Quality Assessment
- This segment concludes a presentation on a Neuro-Symbolic Action Quality Assessment (NS-AQA) system for diving, showcasing its state-of-the-art performance, expert validation, and broad applicability, followed by a Q&A session.
- 02:30:06 — James J. Clark: Video Interaction Recognition using an Attention Augmented Relational Network and Skeleton Data
- This segment introduces a method for video interaction recognition using an Attention Augmented Relational Network (AARN) and skeleton data, focusing on classifying penalties in hockey games.
- 03:44:11 — Takuya Nakabayashi: Event-based Ball Spin Estimation in Sports
- This talk presents a novel approach for estimating ball spin in sports using event cameras, which offer high temporal resolution and dynamic range, overcoming limitations of traditional frame-based methods like motion blur. The method leverages Contrast Maximization and 3D projection to accurately estimate spin even at high speeds, validated on synthetic and real-world volleyball data.
- 04:02:11 — Jan Held: X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Models
- This presentation introduces X-VARS, a system that uses multi-modal large language models to generate explainable decisions for football refereeing. It leverages a custom dataset of video-question-answer triplets annotated by referees and integrates visual (fine-tuned CLIP) and textual information to provide decisions and justifications, achieving performance comparable to human referees.
- 04:25:11 — Jerrin Bright: PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics
- This talk presents PitcherNet, a framework for detailed 3D analysis of pitcher dynamics from monocular broadcast baseball feeds. It addresses challenges like motion blur and occlusion, utilizing 3D human modeling (SMPL, D2A-HMR) and a role classification network to extract kinematic information and derive key pitch metrics like position, velocity, and extension.
- 04:54:48 — Mirko Messori: Ground-truth cloud-based infrastructure: The Kings League use case
- Mirko Messori presents Sphaera’s cloud-based infrastructure for football performance analysis, detailing hardware, data acquisition, and custom event annotation tools for the Kings League.
- 05:28:30 — Silvio Giancola: Dense Video Captioning
- Introduces the Dense Video Captioning task, sponsored by Quality, which involves generating anchored commentaries for football games, using 500 broadcast games.
- 05:28:55 — Jamshid Tursunboev: DeLTA Lab solution for SoccerNet 2024 Dense video captioning task
- Presents DeLTA Lab’s solution for Dense Video Captioning, featuring a visual encoder (4x Transformer decoder) and a GPT-2 language model, both fully fine-tuned end-to-end, achieving a METEOR@30 score of 27.42.
- 06:28:30 — Jing Zhang: Multi-View Foul Recognition
- Presents WJB’s method for Multi-View Foul Recognition, utilizing a ViT-B backbone, pretraining with ‘unmasked teacher’, max pooling, and MLP classification, achieving a combined metric of 44.76.
- 06:28:45 — Silvio Giancola: Game State Reconstruction
- Introduces the Game State Reconstruction task, sponsored by Sportradar, which aims to reconstruct game states from broadcast videos, outputting minimaps with player details (jersey, role, team) from 200 30-second clips.
- 06:29:00 — Vladimir Golovkin: Football multi-object tracking system
- Presents Constructor.tech’s multi-object tracking system for Game State Reconstruction, detailing a pipeline of team detection (YOLOv5, OSNet), raw tracking (DeepSORT), and post-processing, achieving a 62.69 GS-HOTA score.
- 06:29:35 — Lauren Okamoto: Hierarchical NeuroSymbolic Approach for Action Quality Assessment
- Announces ‘Hierarchical NeuroSymbolic Approach for Action Quality Assessment’ by Lauren Okamoto et al. as the first-place winner for the Best Paper Award.
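The event-based ball spin talk listed above relies on Contrast Maximization. As a rough illustration of that idea only, the sketch below estimates a constant 2D image velocity from synthetic events: events are warped back to a reference time under each candidate velocity, accumulated into an image, and the candidate giving the sharpest (highest-variance) image wins. The talk's method estimates spin with a 3D projection, which this toy does not attempt; all function names, the grid search, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def warped_event_image(xs, ys, ts, vx, vy, shape):
    """Accumulate events after undoing a candidate constant velocity (vx, vy)."""
    h, w = shape
    # Warp each event back to the reference time t = 0.
    wx = np.rint(xs - vx * ts).astype(int)
    wy = np.rint(ys - vy * ts).astype(int)
    inside = (wx >= 0) & (wx < w) & (wy >= 0) & (wy < h)
    img = np.zeros(shape)
    # np.add.at handles repeated pixel indices correctly.
    np.add.at(img, (wy[inside], wx[inside]), 1.0)
    return img

def contrast(xs, ys, ts, vx, vy, shape):
    """Image variance: high when warped events pile up on few pixels."""
    return warped_event_image(xs, ys, ts, vx, vy, shape).var()

def estimate_velocity(xs, ys, ts, candidates, shape):
    """Grid search over candidate velocities (a stand-in for a real optimizer)."""
    scores = [contrast(xs, ys, ts, vx, vy, shape) for vx, vy in candidates]
    return candidates[int(np.argmax(scores))]

def synthetic_events(true_v=(30.0, -10.0), n_points=20, n_events=2000, seed=0):
    """Hypothetical data: scene points moving at true_v, each firing events."""
    rng = np.random.default_rng(seed)
    px = rng.uniform(20, 44, n_points)
    py = rng.uniform(20, 44, n_points)
    idx = rng.integers(0, n_points, n_events)
    ts = rng.uniform(0.0, 0.5, n_events)
    return px[idx] + true_v[0] * ts, py[idx] + true_v[1] * ts, ts

def demo():
    xs, ys, ts = synthetic_events()
    grid = [(vx, vy) for vx in range(-40, 41, 10) for vy in range(-40, 41, 10)]
    return estimate_velocity(xs, ys, ts, grid, shape=(64, 64))
```

Calling `demo()` should recover the simulated velocity `(30, -10)`, because only the correct warp collapses each point's events onto a single pixel. A practical system would replace the grid search with a continuous optimizer and a motion model suited to a rotating ball.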
Key Takeaways
- The CVsports Workshop at CVPR 2024 marks its 10th anniversary, demonstrating significant growth in submissions and participation, fostering collaboration between academia and industry in computer vision and AI for sports.
- Scalable AI solutions are being developed to bring advanced analytics to downmarket leagues and youth sports, addressing challenges like data incompleteness, varied input distributions, and the need for affordable, high-quality data acquisition.
- A novel Neuro-Symbolic AI approach for Action Quality Assessment (AQA) offers interpretable, transparent, and data-efficient scoring in sports like diving, by abstracting video into symbols and applying rule-based programs to identify and quantify performance errors.
- Multi-stream processing and fusion, combining data from various sources like visual streams, trajectory data, game clocks, audio, and body pose, are crucial for improving accuracy in game segmentation and event detection, especially in challenging downmarket video environments.
- The NS-AQA system achieves state-of-the-art performance in temporal segmentation and fine-grained action recognition without requiring training data.
- Domain experts, including an Olympic dive coach and a certified judge, validated the NS-AQA system’s scores and found its detailed reports helpful for training and safety.
- A qualitative comparison showed that purely neural models can be biased towards certain phases of an action, while the NS-AQA provides a more balanced and explainable assessment.
- The proposed AARN method for video interaction recognition uses skeleton poses as a concise and effective feature, alleviating scene biases and reducing privacy concerns, and is designed to solve relational reasoning problems efficiently.
- Event cameras offer significant advantages over traditional frame-based methods for high-speed motion analysis in sports, particularly for tasks like ball spin estimation.
- AI models, especially those leveraging large language models, can be trained to not only make decisions but also provide human-understandable explanations, potentially aiding human experts in subjective tasks like sports officiating.
- Extracting 3D human pose and kinematic data from monocular broadcast feeds presents challenges but is crucial for detailed sports analysis, with specialized models and datasets being developed to address these.
- The development of robust datasets, including annotated video-question-answer triplets and 3D pose data, is fundamental for advancing AI applications in sports.
- Sphaera offers a ground-truth cloud-based infrastructure for football performance analysis, utilizing commercial security cameras and AWS services for cost-effective data acquisition and processing.
- The Kings League, a new football format, requires advanced, customized tools for event annotation due to its unique rules and fast-paced nature, driving the need for specialized software.
- The backend infrastructure is entirely cloud-based, enabling flexibility and scalability to adapt to various client needs without requiring new hardware investments.
- There is a significant opportunity for semi-automated solutions in sports data analysis to reduce manual tagging efforts and improve efficiency, especially for complex events and large datasets.
- SoccerNet 2024 challenges saw significant participation and innovative solutions across various computer vision tasks in sports.
- Winning teams leveraged advanced architectures like Transformer decoders, GPT-2, ViT-B, and YOLOv5/v8, often with end-to-end fine-tuning and multi-task learning strategies.
- The Best Paper Award recognized research in event-based ball spin estimation, explainable AI for refereeing, and hierarchical neurosymbolic approaches for action quality assessment.
- The SoccerNet initiative fosters a large, international, and open-source community, promoting collaboration and advancing research in sports analytics.
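The neuro-symbolic takeaway above describes abstracting video into symbols and applying rule-based programs to them. The sketch below shows what one such rule could look like, assuming per-frame 2D pelvis and thorax keypoints from a side view are already available: it counts somersaults by accumulating the rotation of the pelvis-to-thorax vector and compares the count with a declared value. This is a hypothetical rule written for illustration, not the NS-AQA authors' program; the function names and the half-rotation rounding are assumptions.

```python
import numpy as np

def count_somersaults(pelvis, thorax):
    """Estimate rotations from (n_frames, 2) pelvis and thorax keypoints."""
    v = thorax - pelvis
    # Unwrap removes the 2*pi jumps so the angle accumulates over the dive.
    angles = np.unwrap(np.arctan2(v[:, 1], v[:, 0]))
    return abs(angles[-1] - angles[0]) / (2 * np.pi)

def check_declared_rotation(pelvis, thorax, declared):
    """Return a list of human-readable errors (empty if the rule passes)."""
    # Round to the nearest half somersault before comparing.
    observed = round(count_somersaults(pelvis, thorax) * 2) / 2
    if observed != declared:
        return [f"expected {declared} somersaults, observed {observed}"]
    return []
```

Because the rule operates on named symbols rather than raw pixels, its output is a readable statement about the dive, which is the kind of transparency the takeaway attributes to the neuro-symbolic approach.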
Methods / Models / Datasets Mentioned
AI · AWS · Amplify · Angular · Attention Augmented Relational Network (AARN) · BLIP · C3D-MTL · CLIP · Computer Vision · Contrast Maximization (CMax) · D2A-HMR · Deep Learning · DeepSORT · DeepStream · DynamoDB · EC2 · FineDiving dataset · GCN · GPT-2 base/medium · Generative AI · HRNet · Hawk-Eye Camera System · I3D · LSTM · MLP · NS-AQA · Neuro-Symbolic AI · OSNet · Relational Reasoning Network (RN) · ResNet18 · ResNet50 · S3 · SMPL · SegFormer · TCN · TensorRT · Transformer · Transformer decoder · ViT-B · WaveNet · WideResNet · YOLO · YOLOv5 · YOLOv8 · custom SegFormer · max pooling · unmasked teacher
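Several of the listed components are combined in the multi-view foul recognition entry: per-view backbone features, max pooling, and an MLP classifier. The following is a minimal sketch of that fusion step only, assuming each camera view has already been encoded into a fixed-length feature vector. The feature dimension, layer sizes, and random weights are placeholders, not the winning team's configuration.

```python
import numpy as np

def fuse_views_max(view_features):
    """Element-wise max over views: (n_views, dim) -> (dim,)."""
    return np.max(view_features, axis=0)

def mlp_logits(x, w1, b1, w2, b2):
    """Two-layer MLP with a ReLU hidden layer."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2

def classify_clip(view_features, params):
    """Fuse all views of one incident and return the predicted class index."""
    fused = fuse_views_max(view_features)
    return int(np.argmax(mlp_logits(fused, *params)))
```

Max pooling makes the prediction independent of the order of the views and lets the same classifier accept any number of views, which is convenient when incidents are covered by different numbers of replay angles.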
Topics
3D Trajectories · 3D human modeling · AI in Sports · Academia · Accepted Papers · Action Quality Assessment (AQA) · Ball Spin Estimation · Baseball video analytics · Best Paper Award · CVPR 2024 · CVsports Workshop · Cloud Infrastructure · Collaboration · Computer Vision in Sports · Cost Optimization · Data Acquisition · Data Efficiency · Data Inaccuracy · Data Incompleteness · Deep Learning · Deep Learning Models · Dense Video Captioning · Diving Analysis · Downmarket Leagues · Event Annotation · Event Detection · Event cameras · Expert Feedback · Explainable AI · Explainable AI in Refereeing · Fine-Grained Action Recognition · Flexibility · Football Analytics · Football Performance Analysis · Football refereeing · Future of CVsports · Game Events · Game State Reconstruction · Generative AI · Happy Hour · History of CVsports · Hockey Analytics · Hockey Penalty Classification · Human Bias · Industry · Insights Generation · Institutions · Interpretability · Kings League · Large Language Models (LLMs) · Manual Annotation · Media · Motion blur mitigation · Multi-Object Tracking · Multi-Stream Processing · Multi-View Foul Recognition · Multi-camera Systems · Neural Networks · Neuro-Symbolic AI · Object detection and tracking · Oral Papers · Organizers · Paper Submissions · Pelvis-to-Thorax Vector · Performance Errors · Pitcher kinematics · Platform Diving · Player Tracking · Pose Estimation · Program Schedule · Relational Reasoning Network · Rule-Based Programs · Scalable AI Solutions · Single-camera Feed Processing · Skeleton Pose Estimation · SoccerNet Challenge · Somersault Counter · Sports Analytics · Sports Industry · Sports Types · Symbol Abstraction · Team Activities · Temporal Segmentation · Top Contributors · Topics in CVsports · Transparency · Video Interaction Recognition · Wearables · Workshop Statistics · Youth Sports
Notes
Open for commentary — connections to other work, critiques, follow-up reading.