Microscopy, foundation models, and the scaling hypothesis (CVMI @ CVPR 2024)

Event: CVMI @ CVPR 2024 · Duration: 419 min

Abstract

This video segment features two talks from the CVMI @ CVPR 2024 workshop. The first talk by Berton Earnshaw from Recursion and Valence Labs explores the application of microscopy, foundation models, and the scaling hypothesis to image-based profiling for drug discovery. He demonstrates how ViT Masked Autoencoders (MAEs) outperform weakly supervised learning in recalling known biological relationships from large-scale image datasets. The second talk by Stephen K. Burley, Director of the RCSB Protein Data Bank, highlights the PDB’s critical role in responding to emerging viruses, particularly coronaviruses. He showcases how structural data from the PDB was instrumental in the development of mRNA vaccines and antiviral drugs like Paxlovid for COVID-19.

This segment introduces the application of self-supervised learning (SSL) in small molecule discovery, particularly for phenotypic screening. The speaker explains the typical drug discovery pipeline, emphasizing the high-throughput nature of compound screening and the use of phenotypic assays that generate image readouts. The core challenge addressed is the extraction of meaningful features from these large image datasets without relying on extensive manual labeling, which is where SSL methods become crucial. The segment then delves into specific examples of phenotypic screening at Bayer, covering both crop science and pharma divisions, and details the methodology for designing an SSL task using time-series images in herbicide screens. Finally, the speaker presents the results of applying these SSL methods, demonstrating their effectiveness in identifying meaningful biological correlations and phenotypic clusters, and discusses the performance comparison with traditional baselines.

This segment features five talks on advanced computational pathology and microscopy image analysis. The first talk compares deep learning models for reproducibility and biological relevance in drug discovery. The second introduces a method for refining biologically inconsistent segmentation masks using masked autoencoders. The third presents a cell morphology-aware deep neural network for histopathological image classification. The fourth showcases a multi-tissue foundation model for PanTissue computational pathology applications. Finally, the fifth talk details an unsupervised deep temporal filter for microscopy video denoising.

This segment introduces a novel approach to predicting patient treatment outcomes using generative models, particularly diffusion models and Schrödinger bridges. The speaker highlights the complexity of human biology and the limitations of current AI models in capturing this complexity, especially when dealing with highly multimodal and dynamic biological data. The methodology focuses on learning individual cell transitions and capturing cellular heterogeneity to predict responses to various treatments, even for unseen patients. The ultimate goal is to develop foundation models that learn universal cell representations from large-scale biological data, paving the way for advanced clinical decision support systems.

This segment features multiple presentations and Q&A sessions from a CVPR workshop on microscopy image analysis. Topics covered include gene-level representation learning using style transfer, super-resolution of 3D microscopy volumes with 2D supervision, and weakly supervised set-consistency learning for morphological profiling of single-cell images. It also introduces a new cell tracking and mitosis detection dataset challenge, discusses interpretability metrics for deep learning models in cell painting, and presents an automated method for removing contaminating brainstem tissue from spinal cord images.

Speakers

Berton Earnshaw — Recursion and Valence Labs
Cassiano Carromeu — Recursion
Stephen K. Burley — RCSB Protein Data Bank
Paula Andrea Marin Zapata — Bayer AG, Pharma R&D
Alexander Sauer — Department of Engineering, University of Oxford & Department of Cell Biology, Yale School of Medicine
Andrey Ignatov — ETH Zurich, Switzerland
Julian Viret — Paige AI
Mary Aiyetigbo — Clemson University, The Medical University of South Carolina
Charlotte Bunne — @_bunnech
Mahtab Bigverdi — University of Washington
Cheng Jiang — University of Michigan
Heming Yao — Genentech
Vivek Gopalakrishnan — MIT
Atharva Peshkar — University of Colorado Boulder
Ariana Nawaby — UT Southwestern Medical Center

Talks (16)

00:00:00 — Berton Earnshaw: Microscopy, foundation models, and the scaling hypothesis
- This talk explores Recursion’s application of microscopy, foundation models, and the scaling hypothesis to image-based profiling for drug discovery, demonstrating how ViT Masked Autoencoders outperform weakly supervised learning in recalling biological relationships from large-scale image datasets.
00:57:15 — Stephen K. Burley: Protein Data Bank: From Two Epidemics to the Global Pandemic to mRNA Vaccines and Paxlovid
- This talk details the history and crucial role of the Protein Data Bank (PDB) in understanding and combating viral pathogens, particularly coronaviruses like SARS-CoV-2, through structural biology and its instrumental role in the development of mRNA vaccines and antiviral drugs like Paxlovid for COVID-19.
01:23:45 — Paula Andrea Marin Zapata: Learning and using self-supervised phenotypic features in small molecule discovery
- The speaker introduces the topic of learning and using self-supervised phenotypic features in small molecule discovery, highlighting the challenges of traditional methods and the potential of self-supervised learning.
02:47:30 — Paula Andrea Marin Zapata: Performance comparison
- This talk compares the performance of DINO, MAE, SimCLR, and CellProfiler models across reproducibility and biological relevance metrics, highlighting DINO’s efficiency and discussing applications in mitochondrial toxicity prediction, acute oral toxicity assessment, and de novo compound design.
02:59:49 — Alexander Sauer: Refining Biologically Inconsistent Segmentation Masks with Masked Autoencoders
- This talk introduces a method to refine biologically inconsistent segmentation masks, particularly for mitochondria, by identifying uncertain regions using an ensemble model and then reconstructing them with a Masked Autoencoder (MAE) that exploits the repetitive structure of biological ultrastructures.
03:14:55 — Andrey Ignatov: Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks
- This talk presents DeepCMorph, a cell morphology-aware deep neural network for histopathological image classification, which uses a segmentation module to generate nuclei and cell type annotation maps, and then feeds these along with the original image into a classification module, outperforming transformer models on various cancer datasets.
03:29:50 — Julian Viret: Journey to Virchow Foundation Model Suite
- This talk introduces the Virchow Foundation Model, a large-scale, multi-tissue model for computational pathology, demonstrating its superior performance on various cancer detection and subtyping tasks, and outlining its application in PanTissue computational pathology for detection, classification, cell segmentation, and biomarker prediction.
03:47:00 — Mary Aiyetigbo: Unsupervised Microscopy Video Denoising
- This talk addresses challenges in microscopy video denoising due to various noise types and lack of ground truth, proposing a Deep Temporal Filter (DTF) approach that leverages temporal correlation and proximity to effectively remove noise while preserving crucial signal details, outperforming existing methods.
04:11:15 — Charlotte Bunne: Predicting Patient Treatment Outcomes using Generative Models
- This talk explores the use of generative models, specifically diffusion models and Schrödinger bridges, to predict patient treatment outcomes by modeling cellular dynamics and responses to drugs, even for unseen patients and diverse patient cohorts.
05:35:00 — Charlotte Bunne: Path Toward a Virtual Cell
- A Q&A session discussing the application of optimal transport models for predicting patient responses to combination therapies and genetic perturbations, highlighting data scarcity as a major challenge.
05:40:58 — Mahtab Bigverdi: Gene-Level Representation Learning via Interventional Style Transfer in Optical Pooled Screening
- This talk introduces GRAPE, a GAN-based model that uses style transfer to learn gene-level representations from single-cell images, effectively mitigating confounding factors like batch effects and demonstrating superior performance in recapitulating biological relationships compared to existing methods.
05:53:00 — Cheng Jiang: Super-resolution of microscopy volumes with 2D supervision
- This presentation introduces MSDSR, a masked slice diffusion model that super-resolves 3D microscopy volumes using only 2D supervision, leveraging the equivalence of spatial dimensions in biomedical imaging and introducing SliceFID as a novel metric for unpaired 3D evaluation.
06:04:40 — Heming Yao: Weakly Supervised Set-Consistency Learning Improves Morphological Profiling of Single-Cell Images
- This talk presents Set-DINO, a weakly supervised method that uses set-level representation and cross-batch sampling to improve morphological profiling of single-cell images from Optical Pooled Screens, outperforming existing baselines in biological recall and robustness to confounding factors.
06:15:00 — Vivek Gopalakrishnan: Grad-CAMO: Learning interpretable single-cell morphological profiles from 3D Cell Painting Images
- This presentation introduces Grad-CAMO, a new interpretability metric that quantifies the overlap between Grad-CAM attention maps and cell segmentation masks, revealing that supervised models often ‘cheat’ by focusing on background noise rather than the cell of interest in 3D cell painting images.
06:26:21 — Atharva Peshkar: CTMC: Cell Tracking and Mitosis Detection Dataset Challenge
- This presentation introduces CTMC, a new large and diverse dataset for cell tracking and mitosis detection using DIC microscopy, and announces the results of the challenge, highlighting that tracking deformable objects and maintaining object identity remain significant challenges.
06:47:00 — Ariana Nawaby: Automating Removal of Contaminating Brainstem Tissue from Volumetric Mouse Spinal Cord Serial Two-Photon Tomography Images
- This presentation details an automated method for removing contaminating brainstem tissue from mouse spinal cord images using interpolation-based data augmentation, addressing challenges of imbalanced datasets and variations in staining to improve classification accuracy for downstream analysis.
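
The Grad-CAMO summary above describes scoring the overlap between a Grad-CAM attention map and a cell segmentation mask. The sketch below shows one plausible way to compute such an overlap: the fraction of non-negative attention mass that falls inside the mask. The function name `attention_overlap_score` and the exact formula are assumptions made for illustration, not the definition used in the talk.

```python
import numpy as np


def attention_overlap_score(cam: np.ndarray, mask: np.ndarray) -> float:
    """Fraction of non-negative attention mass that lies inside the mask.

    Hypothetical helper: an illustrative overlap score, not the talk's metric.
    """
    cam = np.clip(np.asarray(cam, dtype=float), 0.0, None)  # drop negative attention
    mask = np.asarray(mask, dtype=bool)
    if cam.shape != mask.shape:
        raise ValueError("cam and mask must have the same shape")
    total = cam.sum()
    if total == 0.0:
        return 0.0  # no attention anywhere, so nothing overlaps the cell
    return float(cam[mask].sum() / total)


# Toy 3D volume: 8 attended voxels inside the cell, 2 on the background.
cam = np.zeros((2, 4, 4))
cam[:, 1:3, 1:3] = 1.0
cam[:, 0, 0] = 1.0
mask = np.zeros((2, 4, 4), dtype=bool)
mask[:, 1:3, 1:3] = True

score = attention_overlap_score(cam, mask)
print(f"overlap score: {score:.2f}")
```

Under this definition a score near 1 means the attention sits on the cell of interest, while a low score flags a model that may be relying on background signal.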

Key Takeaways

Foundation models, particularly ViT Masked Autoencoders, are crucial for extracting meaningful biological insights from large-scale microscopy image datasets, outperforming traditional weakly supervised learning methods.
The scaling hypothesis holds true in microscopy-based profiling: increasing compute, data, and model parameters leads to improved performance in recalling known biological relationships.
The Protein Data Bank (PDB) is an indispensable open-access resource for structural biology, providing critical 3D information that directly enables the rapid development of vaccines and antiviral treatments for global health crises like COVID-19.
Understanding proteome evolution and identifying highly conserved active sites in viral enzymes (like Main Protease and RNA Polymerase) are key strategies for developing effective drug discovery targets against emerging pathogens.
Self-supervised learning (SSL) is a powerful approach for extracting meaningful features from large image datasets in small molecule discovery, especially in the absence of extensive manual labels.
Phenotypic screening, which uses images as readouts, is a key component in drug discovery across various divisions like crop science and pharma.
Designing SSL tasks based on inherent data structures, such as temporal information in time-series images, can effectively capture biological relevance and identify phenotypic changes.
SSL models like DINO and MAE can match or exceed the performance of traditional methods like CellProfiler in terms of reproducibility and biological relevance, while being significantly more cost-effective for inference.
Careful consideration of data preprocessing steps, such as plate normalization and feature selection, is crucial for optimizing the performance of SSL models in phenotypic screening.
Deep learning models like DINO and MAE show strong performance in image analysis for biological applications, often outperforming traditional methods like CellProfiler.
Foundation models trained on large, diverse datasets can generalize effectively across various tissue types and tasks in computational pathology, reducing the need for extensive fine-tuning.
Incorporating biological constraints and leveraging inherent data structures (e.g., repetitive patterns in mitochondria, cell morphology in histopathology) can significantly improve model accuracy and interpretability.
Unsupervised learning techniques, particularly those exploiting temporal correlations in video data, are crucial for denoising microscopy images where ground truth is often unavailable.
The development of multi-modal and multi-tissue foundation models is transforming computational pathology, enabling more efficient workflows and broader applications from cancer detection to biomarker prediction.
Generative models, especially diffusion models and Schrödinger bridges, offer a powerful framework for understanding and predicting complex biological processes, such as cellular responses to drug treatments.
The proposed methodology, including 2.5D multiple-instance learning and methods for reconstructing dynamics from partial alignments, allows for the prediction of treatment outcomes for individual cells and unseen patients, leveraging contextual information along the depth axis of 3D pathology data.
Integrating multi-modal biological data and developing foundation models that learn universal cell representations from large-scale datasets are crucial steps towards building robust clinical decision support systems and advancing personalized medicine.
The complexity of human biology, characterized by indirect, obscured, noisy, and highly multimodal data, necessitates specialized AI architectures and approaches tailored to capture dynamic and interactive biological systems.
Data scarcity, especially for combination therapies and 3D biological imaging, remains a significant challenge for developing and validating advanced machine learning models in microscopy image analysis.
Novel methods like GRAPE and MSDSR leverage techniques such as style transfer and conditional diffusion to address challenges like confounding factors and limited 3D data, showing promising results in biological relevance and super-resolution.
Interpretability metrics like Grad-CAMO are crucial for auditing supervised models, as they can reveal instances where models ‘cheat’ by focusing on irrelevant background information rather than the intended features.
Data augmentation strategies, including intra-sample and inter-sample interpolation, can effectively combat dataset imbalance and sparsity, leading to improved classification performance in biomedical image analysis tasks.
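
One of the takeaways above names plate normalization as a preprocessing step, and the methods list includes a robust Z-score. As a minimal sketch, assuming features are normalized per plate using all wells on that plate (the talks may instead normalize against control wells), a robust Z-score can be computed from the median and the median absolute deviation (MAD). The function name `robust_zscore_per_plate` and the toy data are hypothetical.

```python
import numpy as np


def robust_zscore_per_plate(features, plate_ids, eps=1e-8):
    """Center and scale each feature per plate using median and MAD.

    Hypothetical helper; assumes every well on a plate contributes to the statistics.
    """
    features = np.asarray(features, dtype=float)
    plate_ids = np.asarray(plate_ids)
    out = np.empty_like(features)
    for plate in np.unique(plate_ids):
        rows = plate_ids == plate
        block = features[rows]
        median = np.median(block, axis=0)
        mad = np.median(np.abs(block - median), axis=0)
        # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
        out[rows] = (block - median) / (1.4826 * mad + eps)
    return out


# Toy data: plate B carries a constant offset of +10 relative to plate A.
features = np.array([[1.0], [2.0], [3.0], [11.0], [12.0], [13.0]])
plate_ids = np.array(["A", "A", "A", "B", "B", "B"])

normalized = robust_zscore_per_plate(features, plate_ids)
print(normalized.ravel())
```

In the toy data the plate-level offset disappears after normalization, so wells from both plates land on the same scale before any downstream comparison.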

Methods / Models / Datasets Mentioned

AdaIN
Adaptive BatchNorm
AlphaFold DB
BNS dataset
BioGPT
Biomarker Prediction
Blindspot techniques
CNN
CONCH
CPA
CRISPR-Cas9
CTransPath
Cell Detection
Cell Segmentation
CellOT
CellProfiler
Cellpose
Channel dropout
Channel-Agnostic ViT MAE
Colorectal Cancer (CRC) Dataset
CondOT
CryoNuSeg dataset
CutMix
DINO
Deep Temporal Filter (DTF)
DeepCMorph
DeepL
DeepProfiler
DenseNet
Diffusion models
EfficientNet-B0
EfficientNet-B7
Fourier domain reconstruction loss
GANs
GISAID
GRAPE
GSBFlow
Gene2vec
Gigapath
GoogLeNet
Grad-CAM
Grad-CAMO
IDFI
IMPA
Inception-v3
JkoNet
KNN classification
KUMAR dataset
Lizard dataset
MICCAI dataset
MOTA
MSDSR
Masked Autoencoder (MAE)
MoNuSAC dataset
Model Distillation (DINO)
ModelArchive
Molnupiravir
Morgan fingerprints
NCT-CRC-HE Dataset
PRISM
Pan-Cancer Detection
Pan-Cancer TCGA Dataset
PanMSK
PanNuke Dataset
PanTissue Computational Pathology
Paxlovid
PixMix augmentation
Pycytominer
QSAR models
RDRF
RVRT
Random
Random Forest
Remdesivir
ResNet-18
Rosetta Energetics
SBAlign
Shuffled CellProfiler
SimCLR
SliceFID
Sphering
StarGAN v2
TNBC dataset
TVN
Temporal Signal Filtering
Transfer Learning
Transformer autoregressive model (DALL-E)
UBDSB
UDVD
UMAP
UNI (ViT-Large foundation model)
VATIC
VGG16
VGG19
Virchow Foundation Model
Virchow2
Vision Transformer (ViT)
Z-score
Z-score robust
iBOT
mRNA-1273
scDINO

Topics

3D imaging · AI in Medicine · Biomarker Prediction · Biomedical Data · COVID-19 · Cell tracking · Cellular Dynamics · Computational Pathology · Data augmentation · Deep Learning Models · Diffusion Models · Drug Discovery · Drug Repurposing · Foundation Models · Gene-level representation learning · Generative Models · Image Classification · Image Segmentation · Interpretability metrics · Microscopy · Microscopy Image Analysis · Mitochondrial Toxicity · Mitosis detection · Personalized Medicine · Protein Data Bank · Proteome Evolution · Scaling Hypothesis · Schrödinger Bridges · Self-supervised Learning · Spinal cord analysis · Structural Biology · Style transfer · Super-resolution · Treatment Prediction · Vaccine Design · ViT Masked Autoencoders · Video Denoising · biological correlations · drug discovery pipeline · herbicide screens · image analysis · phenotypic clusters · phenotypic screening · small molecule discovery
