CVPR 2024 Workshop on Computer Vision for Physiological Measurement

Event: CVPR 2024 · Duration: 251 min

Abstract

This segment features three presentations from the CVPR 2024 workshop on Computer Vision for Physiological Measurement. The first talk evaluates GPT-4V’s performance in visual affective computing tasks, demonstrating its capabilities in areas like action unit detection and emotion recognition. The second presentation introduces DECNet, a dual-modality network for driver emotion classification, which leverages facial video and driving behavior for improved monitoring. The final talk details a pipeline for decoding provider visual attention during neonatal resuscitation using eye-tracking and vision-language models, aiming to enhance medical training and decision-making. The segment concludes with the announcement of the best paper award for DECNet.

Speakers

  • Jiaqi Tang — The Hong Kong University of Science and Technology (Guangzhou)
  • Chenhao Hu — Hangzhou Dianzi University
  • Philip Brey — University of Pennsylvania

Talks (3)

  • 02:47:08 Jiaqi Tang: GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing
    • This talk evaluates the performance of GPT-4V in various visual affective computing tasks, including action unit detection, emotion recognition, micro-expression/gesture recognition, and deception detection, highlighting its capabilities and limitations.
  • 02:47:37 Chenhao Hu: DECNet: A Non-Contacting Dual-Modality Emotion Classification Network for Driver Health Monitoring
    • This talk introduces DECNet, a dual-modality emotion classification network that uses facial video and driving behavior to monitor driver health, demonstrating superior performance in various emotion recognition tasks.
  • 02:48:02 Philip Brey: Decoding Provider Visual Attention Using the NAI Tool for Neonatal Resuscitation
    • This talk presents a pipeline to automatically decode provider visual attention during neonatal resuscitation using eye-tracking data and vision-language models, aiming to improve training and decision support.
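The dual-modality idea from the DECNet talk can be illustrated with a generic decision-level (late) fusion of two classifiers. This is only a minimal sketch: the summary does not describe how DECNet's fusion decision module works, so the weighted average below, the function names, the `face_weight` parameter, and the three-class label set are all assumptions made for illustration.

```python
import math

# Hypothetical label set, not taken from the talk.
LABELS = ["neutral", "happy", "angry"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(face_logits, driving_logits, face_weight=0.5):
    """Weighted average of per-modality class probabilities.

    face_logits and driving_logits stand in for the outputs of a facial
    video branch and a driving behavior branch (assumed, not DECNet's API).
    """
    if len(face_logits) != len(driving_logits):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    p_face = softmax(face_logits)
    p_drive = softmax(driving_logits)
    return [
        face_weight * a + (1.0 - face_weight) * b
        for a, b in zip(p_face, p_drive)
    ]


def predict_emotion(face_logits, driving_logits, face_weight=0.5):
    """Return the label with the highest fused probability."""
    fused = late_fusion(face_logits, driving_logits, face_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]
```

Setting `face_weight` to 1.0 or 0.0 reduces the sketch to a single-modality classifier, which is one simple way to frame an ablation between the two inputs.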

Key Takeaways

  • GPT-4V shows promising capabilities in various visual affective computing tasks, including complex emotion recognition and micro-expression analysis, but faces challenges in providing detailed answers for micro-expressions.
  • Dual-modality approaches, combining visual and behavioral data, can significantly improve emotion classification performance in real-world applications like driver health monitoring.
  • Automated analysis of provider visual attention using eye-tracking and advanced AI models can offer valuable insights for medical training and decision support in high-stakes environments like neonatal resuscitation.
  • Data scarcity remains a significant challenge in applying deep learning to medical contexts, highlighting the potential of zero-shot capabilities and pre-trained models.
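As a rough illustration of what "decoding visual attention" from eye-tracking data can mean in practice, the sketch below computes the fraction of gaze samples that fall on each area of interest. It is a simplified stand-in under stated assumptions: the fixed rectangles, the region names, and the `dwell_fractions` function are hypothetical, and the talk's pipeline labels gaze targets with vision-language models rather than hand-drawn boxes.

```python
from collections import Counter


def dwell_fractions(gaze_points, regions):
    """Share of gaze samples landing in each named region.

    gaze_points: list of (x, y) pixel coordinates, one per sample.
    regions: dict mapping a region name to (x_min, y_min, x_max, y_max).
    Samples outside every region are counted under "other".
    """
    counts = Counter()
    for x, y in gaze_points:
        label = "other"
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                label = name
                break  # first matching region wins
        counts[label] += 1

    total = len(gaze_points)
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}
```

A summary like this, computed per resuscitation session, is the kind of signal that could feed training feedback; the region definitions would come from whatever scene labeling the real pipeline produces.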

Methods / Models / Datasets Mentioned

  • Ablation Analysis
  • Action Unit Detection
  • Adapter Module
  • CLIP
  • DECNet
  • Deception Detection
  • Driving Behavior Processing Module
  • Emotion Recognition
  • Eye-tracking technology
  • Facial Video Processing Module
  • Fully Connected Network
  • Fusion Decision Module
  • GPT-4V
  • Global Average Pooling
  • Hyperparameter Optimization
  • Image Classification Models
  • Inception Unit
  • Micro-expression Recognition
  • Micro-gesture Recognition
  • MobileSAM
  • Multi-task Learning
  • NAI Tool
  • Single-task Learning
  • Spatial Transformer
  • Temporal Transformer
  • Tool Call and Processing
  • Vision-Language Models (VLMs)

Topics

Deep Learning Applications in Healthcare · Driver Emotion Classification · Eye-tracking · Human-Computer Interaction · Multimodal Large Language Models · Neonatal Resuscitation · Vision-Language Models · Visual Affective Computing

