DEF-AI-MIA Workshop at CVPR 2024

Event: CVPR 2024 · Duration: 306 min

Abstract

This video segment features the opening remarks and several presentations from the DEF-AI-MIA Workshop at CVPR 2024. The workshop focuses on domain adaptation, explainability, and fairness in AI for medical image analysis, and includes a competition on COVID-19 diagnosis from CT scans. Talks cover unsupervised domain adaptation for histology and skin lesion diagnosis, source-free domain adaptation for object localization, multi-scale interpretable deep learning for mammography, zero-shot medical image segmentation, and automatic biomarker extraction from medical images.
This segment presents a series of short talks on applications of AI in medical imaging. Topics include enhancing cell segmentation with uncertainty-informed active learning, developing interpretable deep learning models for mass margin classification in mammography, using complex style image transformations for domain generalization, and creating prototype-based interpretable networks for glaucoma detection. It also covers the interpretation of COVID-19 lateral flow tests using foundation models, an efficient transformer for 3D medical image segmentation, and a multiple instance learning framework for robust medical diagnosis.
This segment explores multimodal AI data fusion in healthcare, emphasizing the integration of diverse data types such as medical records, scans, and genomics for improved diagnostic and prognostic predictions. It introduces two techniques: the Multi-modal Outer Arithmetic Block (MOAB) and Flattened Outer Arithmetic Attention (FOAA). MOAB uses bilinear fusion with arithmetic operations to intermingle features and is reported to improve the separation of brain tumor grades. FOAA extends these ideas to attention mechanisms and is reported to outperform existing methods on both brain tumor and breast tumor datasets.
This segment introduces the Interactive Medical Image Learning (IMIL) Framework, a novel approach to medical image analysis that leverages targeted clinician feedback to improve model performance and interpretability. It also presents a novel approach using residual-based language models as ‘free boosters’ for biomedical imaging tasks, demonstrating their effectiveness in improving performance across various medical image analysis challenges. Additionally, it introduces LaPA, a Latent Prompt Assist Model for Medical Visual Question Answering, designed to improve the accuracy and interpretability of medical image analysis by leveraging latent prompts and multi-modal fusion. Finally, it presents a novel approach to fine-grained medical activity recognition in trauma resuscitation using actor tracking, aiming to improve the accuracy and efficiency of monitoring and decision-making in critical medical scenarios.
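
The abstract describes MOAB only as bilinear fusion with arithmetic operations, without naming the operations. As a rough illustration, the sketch below assumes four outer operations (product, sum, difference, quotient) between an image feature vector and a genomic feature vector, each with a constant 1 appended so that the unimodal features survive in the outer product. The function names, the appended 1, and the `eps` guard are assumptions made for this example, not details confirmed by the talk.

```python
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]


def outer(a: Vector, b: Vector, op: Callable[[float, float], float]) -> Matrix:
    """Apply op to every pair (a[i], b[j]); result has len(a) rows and len(b) columns."""
    return [[op(x, y) for y in b] for x in a]


def outer_arithmetic_fusion(img_feat: Vector, gene_feat: Vector,
                            eps: float = 1e-6) -> Dict[str, Matrix]:
    """Hypothetical fusion step: one outer map per arithmetic operation."""
    # Assumption: append 1 so the product map keeps each modality's own features
    # (last row holds gene_feat, last column holds img_feat).
    a = img_feat + [1.0]
    b = gene_feat + [1.0]
    return {
        "product": outer(a, b, lambda x, y: x * y),
        "sum": outer(a, b, lambda x, y: x + y),
        "difference": outer(a, b, lambda x, y: x - y),
        # eps only guards against an exact zero in the denominator.
        "quotient": outer(a, b, lambda x, y: x / (y + eps)),
    }


# Toy features: two image features, one genomic feature.
fused = outer_arithmetic_fusion([0.5, 2.0], [4.0])
print(fused["product"])  # [[2.0, 0.5], [8.0, 2.0], [4.0, 1.0]]
```

In a real model the four maps would typically be stacked as channels and passed to further layers; that downstream design is not covered by this summary.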

Speakers

Dimitrios Kollias — NTUA
Ruby Wood — University of Oxford
Janet Wang — Tulane University
Alexis Guichemerre — ETS Montreal
Julia Yang — Duke University
Sidra Aleem — Dublin City University
Ronald M. Summers — NIH
Greg Slabaugh — Professor of Computer Vision and AI, Director of the Digital Environment Research Institute (DERI) at Queen Mary University of London
Bob Zhang

Talks (20)

00:00:00 — Dimitrios Kollias: Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis (DEF-AI-MIA) Workshop
- Introduction to the DEF-AI-MIA workshop, its scope, aims, and competition challenges, including thanks to sponsors and introduction of organizers.
00:15:45 — Ruby Wood: Cluster Triplet Loss for Unsupervised Domain Adaptation on Histology Images
- Presents a method using cluster triplet loss for unsupervised domain adaptation on histology images to predict patient response to radiotherapy.
00:21:55 — Janet Wang: Achieving Reliable and Fair Skin Lesion Diagnosis via Unsupervised Domain Adaptation
- Investigates how effective unsupervised domain adaptation (UDA) is for training skin lesion classifiers across public datasets when labeled data from the target set is unavailable, and whether it improves fairness.
00:25:00 — Alexis Guichemerre: Source-free Domain Adaptation of Weakly-supervised Object Localization Models for Histology
- Discusses source-free domain adaptation (SFDA) methods in the context of weakly-supervised object localization (WSOL) for histology images, exploring different SFDA techniques and their performance.
00:30:50 — Julia Yang: FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography
- Introduces FPN-IAIA-BL, a multi-scale interpretable deep learning model designed to classify mass margins in digital mammography, focusing on interpretability and localization at different scales.
00:30:50 — Sidra Aleem: Test-Time Adaptation with SALIP: A Cascade of SAM and CLIP for Zero-Shot Medical Image Segmentation
- Presents a method called SALIP that combines SAM and CLIP models for zero-shot medical image segmentation, focusing on test-time adaptation.
00:31:15 — Ronald M. Summers: Automatic Extraction of Biomarkers Through Deep Learning and Explainable Disease Diagnosis
- Discusses the automatic extraction of biomarkers from medical images using deep learning, emphasizing explainability for disease diagnosis and its application in large-scale body composition analysis.
01:16:55 — David Anglada-Rotger: Enhancing Ki-67 Cell Segmentation with Dual U-Net Models: A Step Towards Uncertainty-Informed Active Learning
- This talk presents a dual U-Net model for Ki-67 cell segmentation that incorporates uncertainty-informed active learning to improve performance.
02:31:30 — Julia Yang, Alina Jade Barnett, Jon Donnelly, Satvik Kishore, Jerry Fang, Fides Regina Schwartz, Chaofan Chen, Joseph Y. Lo, Cynthia Rudin: FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography
- This talk introduces a multi-scale interpretable deep learning model (FPN-IAIA-BL) for classifying mass margins in digital mammography.
02:33:00 — Greg Slabaugh: Multimodal AI in Healthcare: Attention, Salience, Global/local analysis
- Introduction to multimodal data in healthcare and the need for AI data fusion.
03:11:30 — Nikolaos Spanos, Anastasios Arsenos, Paraskevi-Antonia Theofilou, Paraskevi Tzouveli, Athanasios Voulodimos, Stefanos Kollias: Complex Style Image Transformations for Domain Generalization in Medical Images
- This talk explores complex style image transformations within an augmentation framework to improve domain generalization in medical image analysis.
03:42:30 — Mohana Singh, BS Vivek, Jayavardhana Gubbi, Arpan Pal: Prototype-based Interpretable Network for Glaucoma Detection
- This talk proposes a prototype-based interpretable network for glaucoma detection, focusing on learning class-specific prototypes.
03:49:30 — Dimitrios Kollias: Interactive Medical Image Learning (IMIL) Framework
- This talk introduces the Interactive Medical Image Learning (IMIL) Framework, a novel approach to medical image analysis that leverages targeted clinician feedback to improve model performance and interpretability.
03:55:10 — Bob Zhang: Residual-based Language Models are Free Boosters for Biomedical Imaging Tasks
- This talk presents a novel approach using residual-based language models as ‘free boosters’ for biomedical imaging tasks, demonstrating their effectiveness in improving performance across various medical image analysis challenges.
04:00:29 — Tiancheng Gu: LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
- This talk introduces LaPA, a Latent Prompt Assist Model for Medical Visual Question Answering, designed to improve the accuracy and interpretability of medical image analysis by leveraging latent prompts and multi-modal fusion.
04:05:15 — Wenjin Zhang: Focusing on What Matters: Fine-grained Medical Activity Recognition for Trauma Resuscitation via Actor Tracking
- This talk presents a novel approach to fine-grained medical activity recognition in trauma resuscitation using actor tracking, aiming to improve the accuracy and efficiency of monitoring and decision-making in critical medical scenarios.
04:07:30 — Stuti Pandey, Josh Myers-Dean, Jarek Reynolds, Danna Gurari: Interpreting COVID Lateral Flow Tests’ Results with Foundation Models
- This talk investigates the use of modern foundation models for interpreting COVID-19 lateral flow test results, focusing on identifying and grounding test components.
04:57:30 — Jakub Laszczyk, Mohamed: Using counterfactual information for breast classification diagnosis
- This talk explores the use of counterfactual information to improve breast classification diagnosis.
06:32:30 — Shehan Perera, Pouyan Navard, Alper Yilmaz: SegFormer3D: An Efficient Transformer for 3D Medical Image Segmentation
- This talk introduces SegFormer3D, a lightweight and efficient transformer architecture for 3D medical image segmentation.
06:57:30 — D. J. Araújo, M. R. Verdelho, A. Bissoto, J. C. Nascimento, C. Santiago, C. Barata: Key Patches Are All You Need: A Multiple Instance Learning Framework For Robust Medical Diagnosis
- This talk proposes a multiple instance learning framework that uses key patches for robust medical diagnosis, addressing spurious correlations in attention maps.
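
The cluster triplet loss talk at 00:15:45 is summarized above only by name, so the details of its clustering step are not available here. As a point of reference, the sketch below shows the standard triplet margin loss, with cluster centroids standing in for the positive and negative examples. Using centroids this way, and the helper names `centroid` and `triplet_loss`, are assumptions for illustration rather than the speaker's method.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(u: Point, v: Point) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(points: Sequence[Point]) -> List[float]:
    """Mean of a non-empty set of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def triplet_loss(anchor: Point, positive: Point, negative: Point,
                 margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings: the anchor's own cluster and a different cluster.
own_cluster = [[0.0, 0.0], [0.0, 2.0]]    # centroid [0.0, 1.0]
other_cluster = [[3.0, 1.0], [5.0, 1.0]]  # centroid [4.0, 1.0]
anchor = [0.0, 4.0]

pos = centroid(own_cluster)
neg = centroid(other_cluster)
print(triplet_loss(anchor, pos, neg, margin=1.0))  # 0.0 (3 - 5 + 1 is negative)
print(triplet_loss(anchor, pos, neg, margin=3.0))  # 1.0 (3 - 5 + 3)
```

The loss is zero once the anchor is closer to its own centroid than to the other centroid by at least the margin, which is what pulls same-cluster embeddings together during training.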

Key Takeaways

The DEF-AI-MIA workshop addresses critical challenges in applying AI to medical imaging, focusing on robustness, interpretability, and ethical considerations like fairness.
The approaches presented include unsupervised domain adaptation techniques that use clustering and contrastive learning, and multi-scale interpretable models for specific medical tasks.
The importance of explainability and localization in medical AI is highlighted, especially for clinical adoption and trust.
Large-scale body composition analysis using automated AI tools on CT scans shows promise for predicting cardiovascular risk and overall survival.
Uncertainty-informed active learning can significantly enhance cell segmentation performance in medical imaging.
Interpretable deep learning models are crucial for clinical adoption, especially in complex tasks like mass margin classification.
Domain generalization techniques, including style transformations and augmentation, are vital for robust AI models across diverse medical datasets.
Foundation models show promise in interpreting medical test results, but challenges remain in ensuring accuracy and explainability.
Multimodal data fusion is crucial in healthcare to leverage correlations and complementarity across diverse data types for improved diagnostic and prognostic predictions.
The Multi-modal Outer Arithmetic Block (MOAB) is a novel bilinear fusion technique that effectively intermingles unimodal and multimodal features, leading to better data separation and classification.
Flattened Outer Arithmetic Attention (FOAA) extends MOAB’s arithmetic operations into an attention mechanism, achieving state-of-the-art performance in multimodal medical image analysis tasks.
These fusion techniques show promise in complex medical problems like brain tumor grading and rheumatoid arthritis, where traditional single-modality approaches may neglect crucial clinical context.
The IMIL Framework significantly improves model accuracy and calibration by incorporating clinician feedback, demonstrating a 4% increase in accuracy with only 4% of the dataset augmented.
Residual-based language models can act as ‘free boosters’ for biomedical imaging tasks, enhancing performance across various medical image analysis challenges.
The LaPA model, which uses latent prompts and multi-modal fusion, is reported to outperform state-of-the-art methods in medical visual question answering.
Fine-grained medical activity recognition in trauma resuscitation can be effectively achieved through actor tracking, leading to more clinically relevant attention and calibrated confidence in AI models.
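
The takeaway on uncertainty-informed active learning does not say how uncertainty is measured in the Ki-67 work. A common generic choice is predictive entropy: score each unlabeled sample by the entropy of the model's class probabilities and send the highest-scoring samples to an annotator. The sketch below illustrates that generic strategy only; the tile names, probabilities, and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector; zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(predictions: Dict[str, Sequence[float]],
                        budget: int) -> List[str]:
    """Return the ids of the `budget` samples with the highest predictive entropy."""
    ranked = sorted(predictions, key=lambda k: entropy(predictions[k]), reverse=True)
    return ranked[:budget]


# Hypothetical per-tile class probabilities from a segmentation or classification model.
predictions = {
    "tile_a": [0.5, 0.5],  # maximally uncertain for two classes
    "tile_b": [0.9, 0.1],
    "tile_c": [1.0, 0.0],  # fully confident
}
to_label = select_for_labeling(predictions, budget=2)
print(to_label)  # ['tile_a', 'tile_b']
```

With two models, as in a dual U-Net setup, disagreement between their outputs is another plausible uncertainty signal, but this summary does not confirm which signal the talk uses.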

Methods / Models / Datasets Mentioned

ADDA (Adversarial Discriminative Domain Adaptation)
Ablation-CAM
Active Learning
AdaDSA
Agatston Score
Augmentation
BMI (Body Mass Index)
CAM (Class Activation Map)
CDCL (Cross-Domain Contrastive Learning)
CLIP (Contrastive Language-Image Pre-training)
CNN
CNN-based patch encoders
Cluster Triplet Loss
ConvNeXt
CutMix
CutOut
DANN (Domain Adversarial Neural Network)
DeepMHL
Distill-SODA
ERM (Empirical Risk Minimization)
FOAA
FPN (Feature Pyramid Network)
FPN-IAIA-BL
FRS (Framingham Risk Score)
Foundation Models
Grad-CAM
Grad-CAM++
Graph Neural Network
Grounding
IAIA-BL (Interpretable AI Algorithm for Breast Lesions)
Image Captioning
K-Means
KF
LaPA
LayerCAM
M2F
M3SDA (Moment Matching for Multi-Source Domain Adaptation)
MDAN (Multi-Source Domain Adversarial Networks)
MLP
MMD (Maximum Mean Discrepancy)
MOAB
MixUp
Multi-scale deep learning
MultiCoFusion
Multiple Instance Learning (MIL)
Pathomic
Prototype-based learning
RNN
ResNet 3D
ResNet-50
SAM (Segment Anything Model)
SFDA (Source-Free Domain Adaptation)
SHOT
SRDC
SS-CAM
Score-CAM
SegFormer3D
Style Transfer
TS-CAM
Transformers
Triplet Centre Loss
U-Net
UMAP
Video Swin Transformer
Video-MAE
Vision Transformers (ViT)
WSOL (Weakly-Supervised Object Localization)
YOLOv8
t-SNE

Topics

Active Learning · Activity Recognition · Attention Mechanisms · Bilinear Fusion · Biomarker Extraction · Biomedical Imaging · Body Composition Analysis · Brain Tumor Grading · COVID-19 Diagnosis · Clinician Feedback · Data Fusion · Deep Learning · Domain Adaptation · Domain Generalization · Explainable AI · Fairness in AI · Foundation Models · Glaucoma Detection · Healthcare AI · Histology Image Analysis · Interpretability · Interpretable AI · Language Models · Mammography · Medical Image Analysis · Model Performance · Multimodal AI · Precision Medicine · Rheumatoid Arthritis · Segmentation · Skin Lesion Diagnosis · Trauma Resuscitation · Visual Question Answering · Weakly-supervised Object Localization · Zero-shot Segmentation

Notes

Open for commentary — connections to other work, critiques, follow-up reading.