CVPR 2024 Workshop on Computer Vision for Physiological Measurement
Event: CVPR 2024 · Duration: 251 min
Abstract
This segment features three presentations from the CVPR 2024 workshop on Computer Vision for Physiological Measurement. The first talk evaluates GPT-4V’s performance in visual affective computing tasks, demonstrating its capabilities in areas like action unit detection and emotion recognition. The second presentation introduces DECNet, a dual-modality network for driver emotion classification, which leverages facial video and driving behavior for improved monitoring. The final talk details a pipeline for decoding provider visual attention during neonatal resuscitation using eye-tracking and vision-language models, aiming to enhance medical training and decision-making. The segment concludes with the announcement of the best paper award for DECNet.
Speakers
- Jiaqi Tang — The Hong Kong University of Science and Technology (Guangzhou)
- Chenhao Hu — Hangzhou Dianzi University
- Philip Brey — University of Pennsylvania
Talks (3)
- 02:47:08 — Jiaqi Tang: GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing
- This talk evaluates the performance of GPT-4V in various visual affective computing tasks, including action unit detection, emotion recognition, micro-expression/gesture recognition, and deception detection, highlighting its capabilities and limitations.
- 02:47:37 — Chenhao Hu: DECNet: A Non-Contacting Dual-Modality Emotion Classification Network for Driver Health Monitoring
- This talk introduces DECNet, a dual-modality emotion classification network that uses facial video and driving behavior to monitor driver health, demonstrating superior performance in various emotion recognition tasks.
- 02:48:02 — Philip Brey: Decoding Provider Visual Attention Using the NAI Tool for Neonatal Resuscitation
- This talk presents a pipeline to automatically decode provider visual attention during neonatal resuscitation using eye-tracking data and vision-language models, aiming to improve training and decision support.
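The attention-decoding idea in the last talk can be illustrated with a minimal sketch, assuming gaze samples arrive as (x, y) pixel coordinates and that regions of interest are already available as axis-aligned boxes. The region names, box coordinates, and the `dwell_fractions` helper are hypothetical and do not come from the talk; the pipeline described there uses vision-language models to label what the provider is looking at, which this sketch does not attempt.

```python
from collections import Counter


def dwell_fractions(gaze_points, regions):
    """Return the fraction of gaze samples that fall in each region.

    gaze_points: iterable of (x, y) pixel coordinates.
    regions: dict mapping a region name to (x_min, y_min, x_max, y_max).
    Samples outside every region are counted under "other".
    """
    counts = Counter()
    total = 0
    for x, y in gaze_points:
        total += 1
        label = "other"
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                label = name
                break  # first matching region wins
        counts[label] += 1
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Hypothetical regions of interest in a 640x480 scene-camera frame.
regions = {"infant": (100, 100, 300, 300), "monitor": (400, 50, 600, 200)}
gaze = [(150, 150), (200, 250), (450, 100), (10, 10)]
print(dwell_fractions(gaze, regions))
```

Per-region dwell fractions like these are one simple way to summarize where attention went during a procedure; a real system would also need timestamps and a way to track regions as the camera moves.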
Key Takeaways
- GPT-4V shows promising capabilities across visual affective computing tasks, including complex emotion recognition and micro-expression analysis, although its answers on micro-expressions tend to lack detail.
- Dual-modality approaches, combining visual and behavioral data, can significantly improve emotion classification performance in real-world applications like driver health monitoring.
- Automated analysis of provider visual attention using eye-tracking and advanced AI models can offer valuable insights for medical training and decision support in high-stakes environments like neonatal resuscitation.
- Data scarcity remains a significant challenge in applying deep learning to medical contexts, highlighting the potential of zero-shot capabilities and pre-trained models.
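To make the dual-modality takeaway concrete, here is a minimal late-fusion sketch, assuming each modality (facial video and driving behavior) has already produced per-class scores. This is not DECNet's fusion decision module, whose details are not given here; the `late_fusion` function, the `face_weight` parameter, and the three-class example are hypothetical and only illustrate the general idea of combining two classifiers.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def late_fusion(face_logits, driving_logits, face_weight=0.5):
    """Blend two modalities' class probabilities with a fixed weight.

    Returns (fused_probabilities, predicted_class_index).
    face_weight is a hypothetical tuning knob in [0, 1].
    """
    if len(face_logits) != len(driving_logits):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be in [0, 1]")
    p_face = softmax(face_logits)
    p_drive = softmax(driving_logits)
    fused = [
        face_weight * a + (1.0 - face_weight) * b
        for a, b in zip(p_face, p_drive)
    ]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Hypothetical scores over three emotion classes.
fused, idx = late_fusion([2.0, 0.0, 0.0], [0.0, 0.0, 3.0])
print(fused, idx)
```

A fixed weight is the simplest possible choice; learned fusion layers or attention over modality features are common alternatives when training data allows.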
Methods / Models / Datasets Mentioned
Ablation Analysis · Action Unit Detection · Adapter Module · CLIP · DECNet · Deception Detection · Driving Behavior Processing Module · Emotion Recognition · Eye-tracking technology · Facial Video Processing Module · Fully Connected Network · Fusion Decision Module · GPT-4V · Global Average Pooling · Hyperparameter Optimization · Image Classification Models · Inception Unit · Micro-expression Recognition · Micro-gesture Recognition · MobileSAM · Multi-task Learning · NAI Tool · Single-task Learning · Spatial Transformer · Temporal Transformer · Tool Call and Processing · Vision-Language Models (VLMs)
Topics
Deep Learning Applications in Healthcare · Driver Emotion Classification · Eye-tracking · Human-Computer Interaction · Multimodal Large Language Models · Neonatal Resuscitation · Vision-Language Models · Visual Affective Computing
Notes
Open for commentary — connections to other work, critiques, follow-up reading.