Foundation Models in Radiology

Event: CVPR 2025 Workshop on Foundation Models

Abstract

The talk discusses the application of foundation models in radiology, separating the hype from the practical successes, particularly in image acquisition and reconstruction. It explores opportunities for AI across the entire imaging workflow, from image acquisition to diagnosis and prediction. The speaker presents a framework for building foundation models for cardiac MRI using self-supervised learning on unlabeled data from the UK Biobank, and shows how the learned representations can be reused for downstream tasks such as segmentation and phenotype prediction. The talk then turns to multimodal approaches that integrate non-imaging information, including cross-modality learning between ECG and MRI, to improve diagnostic potential. Finally, it covers the emerging role of large language models (LLMs) in medicine, discussing their current capabilities and limitations in real-world clinical scenarios, and proposes a specialist vision LLM for ophthalmology as a promising direction.

Speakers

  • Daniel Rueckert — Technical University of Munich (TUM) and Imperial College London

Talks (1)

  • 00:00 — Daniel Rueckert: Foundation Models in Radiology
    • This talk explores the application of foundation models in radiology, covering their impact on image acquisition, reconstruction, and advanced diagnostic tasks, while also discussing multimodal approaches and the role of large language models in medicine.

Key Takeaways

  • AI has already made significant, often unnoticed, contributions to radiology, particularly in image acquisition and reconstruction, forming the foundation for future advancements.
  • Foundation models can be effectively built for medical imaging by leveraging large unlabeled datasets and self-supervised learning techniques, enabling robust representations for various downstream tasks.
  • Integrating multimodal data (imaging, tabular, and other modalities like ECG) significantly enhances the diagnostic and predictive power of foundation models, leading to more comprehensive patient insights.
  • Large Language Models (LLMs) show great promise in medicine for natural language interaction, but current evaluation benchmarks (like USMLE) may not fully reflect the complexities and nuances of real-world clinical scenarios.
  • Future directions involve developing specialized vision-language models for specific medical domains (e.g., ophthalmology) that can provide interpretable insights and support clinical decision-making by combining visual and textual information.
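As a rough illustration of the cross-modality idea in the takeaways above (aligning ECG and MRI representations with a CLIP-style loss), the sketch below computes a symmetric contrastive loss over a batch of paired embeddings. It is not the speaker's implementation: the function names, the temperature of 0.1, and the use of plain Python lists in place of encoder outputs are all assumptions made for the example, and embeddings are assumed to be non-zero vectors.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_loss(ecg_emb, mri_emb, temperature=0.1):
    """Symmetric contrastive (CLIP-style) loss for paired embeddings.

    ecg_emb[i] and mri_emb[i] are assumed to come from the same subject;
    every other pairing in the batch is treated as a negative.
    The temperature value is an illustrative assumption.
    """
    ecg = [l2_normalize(v) for v in ecg_emb]
    mri = [l2_normalize(v) for v in mri_emb]
    n = len(ecg)
    # logits[i][j] = scaled cosine similarity between ECG i and MRI j.
    logits = [[sum(a * b for a, b in zip(e, m)) / temperature for m in mri]
              for e in ecg]
    # ECG -> MRI direction: the correct MRI for ECG i sits in column i.
    loss_e2m = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # MRI -> ECG direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    loss_m2e = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (loss_e2m + loss_m2e)

# Toy batch of two subjects with 2-D embeddings.
ecg_batch = [[1.0, 0.0], [0.0, 1.0]]
mri_aligned = [[1.0, 0.0], [0.0, 1.0]]
mri_swapped = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = clip_loss(ecg_batch, mri_aligned)
swapped_loss = clip_loss(ecg_batch, mri_swapped)
```

The loss is small when each ECG embedding is most similar to its own subject's MRI embedding and large when the pairs are swapped, which is the signal that pulls the two encoders into a shared representation space.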

Methods / Models / Datasets Mentioned

  • Masked Autoencoder
  • SSL Encoder
  • SSL Decoder
  • CLIP Loss
  • ResNet-50
  • ViTA
  • nnU-Net
  • UNETR++
  • SimCLR
  • CLOCS
  • BYOL
  • Barlow Twins
  • MDM (Masked Data Modelling)
  • MMCL (Multimodal Contrastive Learning)
  • DOTS (Domain-Aware Time Series Pre-Training)
  • GPT-3.5
  • Med-PaLM
  • Med-PaLM 2
  • GPT-4
  • Med-Gemini
  • Llama 3
  • RetinaVLM
  • Stable Diffusion
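
To make the first entry in the list above concrete, here is a minimal sketch of the masking step used in masked-autoencoder pre-training: a random subset of image patches is hidden, the encoder sees only the visible patches, and the reconstruction error is computed on the hidden ones. The helper names, the 75% mask ratio, and the 196-patch example are assumptions for illustration and are not taken from the talk.

```python
import random

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets.

    mask_ratio=0.75 is an illustrative assumption, not a value from the talk.
    """
    rng = random.Random(seed)
    num_masked = round(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked

def masked_mse(target, prediction, masked):
    """Mean squared error computed over masked patches only.

    target and prediction are lists of patches (each a list of floats);
    masked must contain at least one index.
    """
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(target[i], prediction[i]):
            total += (t - p) ** 2
            count += 1
    return total / count

# Example: a 14 x 14 grid of patches (196 in total), as a hypothetical input.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)

# Tiny reconstruction example with two patches of two values each.
target = [[1.0, 2.0], [3.0, 4.0]]
prediction = [[1.0, 2.0], [0.0, 0.0]]
loss_on_patch_1 = masked_mse(target, prediction, masked=[1])
```

Restricting the loss to the hidden patches forces the model to infer missing anatomy from context rather than copy its input, which is why no labels are needed for this kind of pre-training.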

Topics

Foundation Models · Radiology · AI in Medicine · Cardiac MRI · Self-Supervised Learning · Multimodal Learning · ECG · Time Series Data · Large Language Models (LLMs) · Ophthalmology

