
Democratising EO Intelligence: CORSA and Major Tom Now Live on Terrascope

Every day, petabytes of Earth observation (EO) data stream into global archives. With Sentinel satellites capturing the planet in extraordinary detail, we are living in an age where data is abundant—but storing, sharing, and leveraging this data is a growing challenge. This is especially true for developers and researchers who want to build Artificial Intelligence (AI) solutions on top of EO imagery but are constrained by bandwidth, storage, or limited labelled data.

That’s why at VITO Remote Sensing, we developed CORSA—a lightweight AI-based compression model that does much more than shrink file sizes. It acts as both a high-efficiency storage solution and a ready-to-use encoder that unlocks fast AI prototyping, even for those with limited resources or training data.

In this blog post, Remote Sensing expert Bart Beusen shows how CORSA is made accessible through the Terrascope platform, using a curated version of the Major Tom dataset. Together, they form a powerful stack that lowers the barriers to entry in AI for EO, democratising geospatial intelligence with lower cost, lower latency, and lower energy consumption.

Blog Bart Beusen 23 June 2025

What is Major Tom?

Major Tom is an open-access dataset developed by ESA's Φ-lab. It divides the globe into a 10 km by 10 km grid, assigning a high-resolution Sentinel-2 patch to each cell. Each patch is a multispectral cube with 12 bands, each stored as a separate GeoTIFF file (B01–B12). This structured approach allows standardised benchmarking and fair comparisons of AI models across geographies and tasks.
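Grid cells are referred to by identifiers such as 571U_29R, which is used later in this post. As a minimal sketch, the snippet below parses such an identifier under the assumption that it encodes a row index with a direction (U/D, up or down from the equator) and a column index with a direction (L/R, left or right of the prime meridian). The helper names and the rough latitude estimate are illustrative assumptions, not part of the Major Tom tooling.

```python
import re
from typing import NamedTuple

CELL_SIZE_KM = 10.0          # grid spacing stated in the text
KM_PER_DEGREE_LAT = 111.32   # rough average; an approximation only


class GridCell(NamedTuple):
    row: int
    row_dir: str  # "U" (north of the equator) or "D" (south); assumed meaning
    col: int
    col_dir: str  # "L" (west) or "R" (east); assumed meaning


def parse_cell_id(cell_id: str) -> GridCell:
    """Split an identifier like '571U_29R' into its four parts."""
    match = re.fullmatch(r"(\d+)([UD])_(\d+)([LR])", cell_id)
    if match is None:
        raise ValueError(f"Unrecognised grid cell identifier: {cell_id!r}")
    row, row_dir, col, col_dir = match.groups()
    return GridCell(int(row), row_dir, int(col), col_dir)


def approx_latitude(cell: GridCell) -> float:
    """Very rough latitude estimate: row index times cell size, in degrees."""
    sign = 1.0 if cell.row_dir == "U" else -1.0
    return sign * cell.row * CELL_SIZE_KM / KM_PER_DEGREE_LAT


cell = parse_cell_id("571U_29R")
print(cell, round(approx_latitude(cell), 2))
```

Under these assumptions the example cell lands at roughly 51° N, which is consistent with a patch over Flanders. For exact cell geometry, rely on the official Major Tom grid definition rather than this estimate.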

For this demonstration, we focused on a regional subset covering Flanders and parts of the Netherlands. We made these patches available in both their original and CORSA-compressed forms.

Enter CORSA: Compression Meets Intelligence

CORSA isn't just another image compression tool. It’s built on a Vector Quantized Variational Auto-Encoder (VQVAE) architecture, trained to compress multispectral satellite images while preserving their semantic content. Instead of storing the original image, CORSA represents it through indices pointing to a learned codebook of feature vectors—drastically reducing storage size while maintaining rich visual information.
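The codebook idea can be illustrated with a toy vector-quantisation step. This is only a sketch of the general principle, not CORSA's implementation: in a real VQ-VAE the feature vectors come from a trained encoder network and the reconstruction passes through a trained decoder, both of which are omitted here. The codebook values and function names are made up for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each feature vector by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look the indices up again to recover (approximate) feature vectors."""
    return [list(codebook[i]) for i in indices]


# Hypothetical 3-entry codebook and four encoder outputs (toy values).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.2], [3.7, 0.3], [1.1, 0.8]]

indices = quantize(features, codebook)           # this is what gets stored
reconstruction = dequantize(indices, codebook)   # approximate features
print(indices)
```

Only the small integer indices need to be stored or transmitted; anyone holding the same codebook can recover the approximate feature vectors from them.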

Figure 1: Reconstructing original image from compressed CORSA embedding.

This dual role of CORSA—compression and feature extraction—means developers can train downstream models (like land cover classifiers or change detectors) directly on the compressed features, bypassing the need to decode or reprocess the full original image.

CORSA + Terrascope = Ready-to-Use AI Stack

Terrascope is the Belgian open EO platform, funded by the Belgian Science Policy, offering on-demand access to a wide range of geospatial datasets and processing capabilities. By integrating CORSA outputs as a public data collection, Terrascope now enables anyone to build EO applications using precomputed embeddings: no GPU required, no downloads of multi-gigabyte files.

On Terrascope, we published:

Original Sentinel-2 Major Tom patches over Flanders and the Netherlands.
CORSA-compressed versions (feature maps) of the same patches.
A sample Jupyter Notebook that shows how to load, visualise, and use these embeddings in a downstream land use classification task.

Speed and Efficiency: A Quick Comparison

Let’s take a group of 27 grid cells located between Antwerp and Rotterdam (see Figure 2) and compare performance:

Figure 2: Map showing Major Tom grid patches over Flanders and the Netherlands.

Format	File Size	Download Time	Reconstruct Time per Tile
Original S2 (12 bands)	382.8 MB	60.6 s	0.2 s
CORSA (2 feature levels)	10.1 MB	5.9 s	1.8 s (decode) + 0.5 s (scaling)

Figure 3: Bar chart comparing download time and file size for original vs CORSA format.

That's:

~10× faster downloads (60.6 s vs 5.9 s)
~38× smaller files (382.8 MB vs 10.1 MB)
And more importantly: ready-to-use feature vectors without needing to retrain a model.
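The ratios can be recomputed directly from the table. The short check below uses only the file sizes and download times listed there; the variable names are illustrative.

```python
# Values copied from the comparison table (27 grid cells).
original = {"size_mb": 382.8, "download_s": 60.6}
corsa = {"size_mb": 10.1, "download_s": 5.9}

size_ratio = original["size_mb"] / corsa["size_mb"]
download_speedup = original["download_s"] / corsa["download_s"]

print(f"File size ratio:   {size_ratio:.1f}x")
print(f"Download speed-up: {download_speedup:.1f}x")
```

Note that this covers transfer only. If full images are needed, the per-tile reconstruction times in the table have to be added; they do not apply when working directly on the embeddings.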

From Colour to Classification

To explore the semantic structure of CORSA embeddings, we visualised them in two ways:

Codebook-inherent colour: Based on the 3D arrangement of vectors during training.
t-SNE-based colourisation: A non-linear projection of codebook vectors into 3D space, normalised and mapped to RGB.
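The "normalised and mapped to RGB" step can be sketched as follows. The snippet assumes the 3D coordinates of the codebook vectors have already been computed (for example by a t-SNE implementation) and only shows the per-axis min-max normalisation and the lookup that turns a map of codebook indices into a colour image. All values and names are toy assumptions.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def normalise_to_rgb(points: Sequence[Sequence[float]]) -> List[RGB]:
    """Min-max normalise each of the three axes and scale to 0..255."""
    axes = list(zip(*points))
    lows = [min(axis) for axis in axes]
    highs = [max(axis) for axis in axes]
    colours = []
    for point in points:
        rgb = []
        for value, low, high in zip(point, lows, highs):
            # Guard against a degenerate axis where all values are equal.
            scaled = 0.0 if high == low else (value - low) / (high - low)
            rgb.append(round(scaled * 255))
        colours.append(tuple(rgb))
    return colours


# Hypothetical 3D projection of a 3-entry codebook.
projected = [(-12.0, 3.5, 0.0), (8.0, -1.5, 4.0), (3.0, 0.5, 1.0)]
palette = normalise_to_rgb(projected)

# A tiny 2x2 map of codebook indices becomes a 2x2 colour image.
index_map = [[0, 1], [2, 1]]
rgb_image = [[palette[i] for i in row] for row in index_map]
print(palette)
```

Because the colour depends only on the codebook index, the palette is computed once and reused for every patch.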

Figure 4: Side-by-side visualisation of the “571U_29R” patch using codebook colour and t-SNE colourisation.

These visualisations give a striking view of the ‘semantic texture’ of the Earth, as learned by CORSA.

As a toy example, we trained a lightweight land cover classification model using only 541 samples from the Dynamic World dataset, leveraging CORSA embeddings directly as input. This drastically reduces the need for annotated data and training time—perfect for rapid prototyping or deployment in low-resource settings.
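To give a feel for how little machinery a classifier on top of precomputed embeddings can need, here is a nearest-centroid sketch. The post does not describe the actual model, so this is a deliberately simple stand-in: the two-dimensional feature vectors are toy values, and the class names are borrowed from the Dynamic World label set purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def fit_centroids(samples: Sequence[Sample]) -> Dict[str, List[float]]:
    """Average the feature vectors of each class to obtain one centroid per label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(features: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(features, centroids[label])),
    )


# Toy embeddings with labels (hypothetical values, two samples per class).
training = [
    ([0.1, 0.9], "water"), ([0.2, 0.8], "water"),
    ([0.9, 0.1], "built"), ([0.8, 0.3], "built"),
    ([0.5, 0.5], "trees"), ([0.4, 0.6], "trees"),
]

centroids = fit_centroids(training)
print(predict([0.15, 0.8], centroids))
```

With informative embeddings, even a model this small can separate classes from a handful of labelled samples, which is the point of training on compressed features rather than raw pixels.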

Figure 5: Land cover classification map for grid cell 571U_29R using CORSA features.

Why CORSA is Unique: One Solution, Many Wins

CORSA stands apart from traditional compression or AI feature extractors because it solves multiple challenges at once:

Storage-efficient: Achieves 25–40× compression on Sentinel-2 imagery
Bandwidth-friendly: Smaller files = faster downloads
Energy-saving: Reduces server-side and client-side compute
Model-ready: Feature embeddings usable out of the box
Few-shot friendly: Enables training with fewer labels
Sensor-adaptable: Can be retrained for other satellites or sensors in a self-supervised way

For developers and researchers, this makes CORSA a Swiss-army knife for EO workflows—from data handling to AI deployment.

Toward an Inclusive Future for AI4EO

What we’re seeing is a transition in the EO world: from data hoarding to data accessibility, from big compute to smart compute. By combining CORSA’s intelligence-preserving compression with the cloud-native accessibility of Terrascope, we make it easier for more people—researchers, NGOs, startups, and students—to work with remote sensing data and build impactful AI solutions.

This is a step toward the democratisation of AI4EO—bringing down barriers like cost, compute, and data availability to unlock innovation for all.

Figure 6: Walkthrough of the Terrascope notebook.

Join Us at Living Planet Symposium

Curious to learn more about data compression and how it can support your work in Earth observation (EO)? Visit the VITO booth (U31) at the Living Planet Symposium 2025 in Vienna from 23 to 27 June. Our Remote Sensing experts are looking forward to answering your questions and showing how CORSA can support data accessibility. And don't miss our presentations, demo, and poster on Thursday 26 June and Friday 27 June to learn more about the latest CORSA updates:

Timing	Type / Session	Topic	Speaker	Location
Thursday, 26 June 14:00-15:30	Oral Presentation D.02.06	From Edge to Insights: Transforming Earth Observation with Lightweight Foundation Models and Embeddings-as-a-Service	Tanja Van Achteren	Hall G1
Thursday, 26 June 15:45-16:15	Demo at VITO Booth	From Orbit to Insights: CORSA Live on Edge, Insights via Terrascope Compressed Embeddings. In Collaboration with Unibap.	Tanja Van Achteren	VITO Booth (U31), EO Arena
Thursday, 26 June 17:45-19:00	Poster D.04.03	Unlocking ML and Foundation Models within openEO	Hans Vanrompay	X5 - Poster Area
Friday, 27 June 14:30-16:00	Oral Presentation C.01.03	Efficient On-Board Processing Using a Shared AI Backbone Across Multiple Tasks	Bart Beusen, Andreas Luyts	Room 1.85/1.86