Geospatial Computer Vision and Machine Learning for Large-Scale Earth Observation Data

Event: CVPR Tutorial · Duration: 177 min

Abstract

This tutorial provides a comprehensive overview of geospatial computer vision and machine learning, specifically tailored for Earth Observation (EO) data. It covers the fundamental characteristics and inherent challenges of EO data, ranging from diverse sensor modalities to varying spatial and temporal resolutions. The speakers delve into various applications in geoscience, common machine learning tasks, and the emerging field of foundational models for EO. A hands-on session demonstrates practical data access and processing using Python tools for Landsat imagery. The tutorial also addresses critical aspects such as data complexity, cost barriers, ethical considerations, and future research directions in EO applications.

Speakers

Orhun Aydin — Saint Louis University
Felipe Dias — Oak Ridge National Lab

Talks (6)

00:00:00 — Orhun Aydin, Felipe Dias: Welcome & Opening Statements
- Orhun Aydin and Felipe Dias introduce themselves and the tutorial on Geospatial Computer Vision and Machine Learning for Large-Scale Earth Observation Data, outlining the day’s schedule and emphasizing the importance of Earth Observation data and its applications.
00:41:59 — Felipe Dias: Introduction to Earth Observation Data
- Felipe Dias defines Earth Observation (EO) and remote sensing, highlighting their use in mapping physical environments, monitoring land cover, and aiding disaster response. He discusses the diverse modalities of EO data, including optical and radar imagery, and the challenges posed by varying spatial, temporal, and spectral resolutions.
01:59:59 — Orhun Aydin: Hands-On EO Data I/O & Wrangling
- Orhun Aydin guides participants through a hands-on session using a Colab notebook to access and process Landsat data. He covers satellite sensor data collection, EO terminology, the Worldwide Reference System, querying the STAC catalog, and calculating/plotting NDVI.
02:59:59 — Felipe Dias: Common ML Tasks & Foundational Models
- Felipe Dias discusses common machine learning tasks in Earth Observation (EO) imagery, such as image classification, object detection, and semantic segmentation. He introduces the concept of foundational models for EO, highlighting self-supervised learning strategies and their application to multimodal data.
03:29:59 — Orhun Aydin: Spatially Explicit Unsupervised Learning
- Orhun Aydin delves into spatially explicit unsupervised learning, focusing on regionalization problems in EO data. He explains graph-based representations of spatial data, tree-based partitioning algorithms, and how to create spatially contiguous clusters by minimizing within-cluster variance and maximizing between-cluster variance.
03:59:59 — Felipe Dias, Orhun Aydin: Some Challenges & Opportunities
- The speakers discuss the beneficial uses and malicious risks of AI in Earth Observation, including data quality improvement, super-resolution, deepfake detection, and dataset poisoning. They highlight the need for robust documentation, improved literacy, and addressing biases in AI models, emphasizing the importance of multimodality and spatio-temporal reasoning in future research.
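
As a rough illustration of the NDVI step mentioned in the hands-on session, the sketch below computes NDVI = (NIR - Red) / (NIR + Red) on small arrays. It is not the notebook's code: the function name `ndvi` and the toy reflectance values are assumptions made for illustration, and in practice the two bands would be read from Landsat scenes (for Landsat 8 OLI, red is band 4 and near-infrared is band 5) with a library such as rasterio.

```python
import numpy as np

def ndvi(red, nir):
    """Compute NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Toy reflectance values (assumed, not from the tutorial) standing in for
# Landsat 8 OLI band 4 (red) and band 5 (near-infrared).
red = np.array([[0.10, 0.30], [0.05, 0.00]])
nir = np.array([[0.50, 0.30], [0.45, 0.00]])

result = ndvi(red, nir)
print(result)
```

Values near 1 suggest dense green vegetation, values near 0 suggest bare ground, and the NaN cell marks a pixel with no signal in either band.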

Key Takeaways

Earth Observation data offers vast potential for addressing global challenges, but its complexity (diverse modalities, resolutions, processing needs) requires specialized ML techniques.
Foundational Models are emerging as a powerful paradigm in EO, leveraging self-supervised learning and multimodal data to create generalizable representations for various downstream tasks.
The hands-on session demonstrates practical steps for accessing, processing, and visualizing Landsat data using Python, highlighting the importance of cloud-based data access and understanding metadata.
Addressing challenges like data complexity, cost barriers, and ethical considerations (e.g., deepfakes, biases) is crucial for the responsible and effective deployment of AI in EO.
Spatially explicit unsupervised learning techniques, such as graph-based regionalization and tree-based partitioning, are vital for extracting meaningful, contiguous clusters from geospatial data.
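
The tree-based regionalization idea in the last takeaway can be sketched as follows. This is a simplified, SKATER-style illustration rather than the algorithm as presented in the tutorial: it builds a rook-adjacency graph over a grid, takes a minimum spanning tree weighted by attribute dissimilarity, and then removes the k-1 most dissimilar tree edges. The full SKATER procedure instead chooses each cut to reduce within-cluster variance the most; cutting the heaviest edges is an assumption made here to keep the example short. The function name `regionalize` and the toy grid are hypothetical.

```python
def regionalize(values, n_rows, n_cols, k):
    """Split a row-major grid of values into k spatially contiguous regions."""
    n = n_rows * n_cols

    # Rook adjacency: each cell is linked to its right and lower neighbour,
    # with the absolute attribute difference as edge weight.
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                edges.append((abs(values[i] - values[i + 1]), i, i + 1))
            if r + 1 < n_rows:
                edges.append((abs(values[i] - values[i + n_cols]), i, i + n_cols))

    def find(parent, x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal's algorithm: minimum spanning tree of the adjacency graph.
    parent = list(range(n))
    tree = []
    for w, i, j in sorted(edges):
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))

    # Drop the k-1 heaviest tree edges; each remaining subtree is one region.
    tree.sort()
    kept = tree[: len(tree) - (k - 1)]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(parent, i)] = find(parent, j)

    # Relabel regions 0..k-1 in order of first appearance.
    roots = {}
    return [roots.setdefault(find(parent, i), len(roots)) for i in range(n)]

# Toy 3x4 grid (assumed values): two 2x2 blocks above a uniform bottom row.
grid = [1, 1, 9, 9,
        1, 1, 9, 9,
        5, 5, 5, 5]
labels = regionalize(grid, n_rows=3, n_cols=4, k=3)
print(labels)  # [0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 2, 2]
```

Because every region is a connected piece of a spanning tree built from adjacency edges only, contiguity is guaranteed by construction, which plain K-Means on attribute values does not provide.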

Methods / Models / Datasets Mentioned

LandScan
Verter et al., Remote Sensing of Environment
Kalscheuer et al., Nature
LILA BC
xView3-SAR
PlanetScope
Sentinel-2
Landsat-8
Aqua (MODIS)
Whiskbroom Sensor
Pushbroom Sensor
Worldwide Reference System (WRS)
OLI (Operational Land Imager)
TIRS (Thermal Infrared Sensor)
ETM+ (Enhanced Thematic Mapper Plus)
MSS (Multispectral Scanner)
NDVI (Normalized Difference Vegetation Index)
pystac-client
rasterio
boto3
PyProj
plotNDVI
CNNs
Transformers
Supervised Learning
Self-Supervised Learning (SSL)
Foundational Models (FMs)
Contrastive Learning
Masked Image Modeling (MIM)
SatMAE
RVSA
BFM
GEO-Bench
Scale-MAE
Presto
Prithvi
SatCLIP
USat
SatVision
SkySense
RingMo
DiffusionSat
CROMA
Clay
OmniSat
LTFormer
Zeus AI
Google Multi-Source Embeddings
Contrastive Location-Image Pretraining (SatCLIP)
NASA HLS Foundation Model - Prithvi
Temporal Vision Transformer
Supervised Pretraining (SatlasPretrain)
BigEarthNet-MM
SSL4EO-S1-2
GeoPandas
QGIS
OpenStreetMap
OSGeo
GDAL
TorchGeo
satellite-image-deep-learning
IEEE GRSS
ACM SIGSPATIAL
ISPRS
xView2 dataset
Change Detection
Rotated Object Detection (Oriented R-CNN)
Graph-based Regionalization
Tree-based Regionalization (SKATER algorithm)
Fuzzy Regionalization
K-Means
Deepfake Detection
Generative AI (GenAI)
Diffusion Models
ControlNet
GeoAI
ClimaX
Perceiver
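
Several entries in this list (Masked Image Modeling, SatMAE, Scale-MAE) rest on hiding a subset of image patches and training a model to reconstruct them. The sketch below shows only the patch-splitting and random-masking step, under stated assumptions: the 64x64 six-band tile, the 16-pixel patch size, and the 75% mask ratio are illustrative choices, and the helper names `patchify` and `random_patch_mask` are hypothetical rather than taken from any of the listed models.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into (N, patch * patch * C) flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)

def random_patch_mask(n_patches, mask_ratio, rng):
    """Return a boolean array that is True for patches hidden from the encoder."""
    n_masked = int(round(n_patches * mask_ratio))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_masked]] = True
    return mask

rng = np.random.default_rng(0)
image = rng.random((64, 64, 6))           # toy multispectral tile (assumed shape)
patches = patchify(image, patch=16)       # 16 patches of 16 * 16 * 6 values
mask = random_patch_mask(len(patches), mask_ratio=0.75, rng=rng)

visible = patches[~mask]   # what an encoder would see
targets = patches[mask]    # what a decoder would be asked to reconstruct
print(patches.shape, visible.shape, targets.shape)
```

In a full pipeline the visible patches would feed an encoder and a reconstruction loss would be computed on the masked targets only; that training loop is outside the scope of this sketch.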

Topics

Earth Observation · Remote Sensing · Geospatial Machine Learning · Computer Vision · Foundational Models · Data Processing · Satellite Imagery · Spatial Analysis · Unsupervised Learning · Deep Learning
