Tutorial — Working with Raster Data
Goal: After this tutorial, you can load, manipulate, reproject, and visualize any satellite imagery in Python using rasterio, numpy, and matplotlib.
This is the mechanical foundation. Every analysis technique in this course — change detection, spectral analysis, object detection, terrain analysis — starts with loading raster data and doing math on arrays. Master this and everything else follows.
Step 1: What Is a Raster?
A raster is a grid of pixels, where each pixel holds one or more numeric values (bands), plus metadata that ties the grid to a location on Earth.
Three components:
- Pixel values — numpy arrays. One 2D array per band. Values represent reflectance, temperature, elevation, backscatter, etc.
- Georeference (affine transform) — maps pixel coordinates (row, col) to geographic coordinates (easting, northing or lon, lat). A 3x3 matrix that encodes origin, pixel size, and rotation.
- Coordinate Reference System (CRS) — defines what the geographic coordinates mean. EPSG:4326 = lat/lon on WGS84 ellipsoid. UTM zones = meters in a projected grid.
Pixel (row=100, col=200)
↓ affine transform
Geographic (X=24.7523, Y=59.4370) in CRS EPSG:4326
↓ CRS interpretation
Physical location: Tallinn, Estonia
Step 2: Load GeoTIFF with Rasterio
import rasterio
import numpy as np
import matplotlib.pyplot as plt
# Open a single-band GeoTIFF (e.g., Sentinel-2 Red band)
with rasterio.open("T35VLF_B04_10m.tif") as src:
# Read band 1 (bands are 1-indexed in rasterio)
red = src.read(1) # numpy array, shape (H, W)
# Metadata
print(f"Shape: {red.shape}")
print(f"Data type: {red.dtype}")
print(f"CRS: {src.crs}")
print(f"Transform:\n{src.transform}")
print(f"Bounds: {src.bounds}")
print(f"Resolution: {src.res}") # (x_res, y_res) in CRS units
print(f"Number of bands: {src.count}")
print(f"NoData value: {src.nodata}")
# The transform (affine matrix):
# | a b c | a = pixel width (in CRS units, e.g., 10m)
# | d e f | e = pixel height (negative, because row 0 = top)
# | 0 0 1 | c, f = origin (upper-left corner coordinates)
# Convert pixel to geographic coordinates
transform = src.transform
row, col = 100, 200
x, y = rasterio.transform.xy(transform, row, col)
print(f"Pixel ({row}, {col}) -> Geographic ({x:.4f}, {y:.4f})")
# Convert geographic to pixel coordinates
row_back, col_back = rasterio.transform.rowcol(transform, x, y)
print(f"Geographic ({x:.4f}, {y:.4f}) -> Pixel ({row_back}, {col_back})")Load Multiple Bands
import rasterio
import numpy as np
def load_sentinel2_bands(band_files, scale_factor=0.0001):
"""
Load Sentinel-2 bands from individual files.
L2A products store reflectance as uint16 with scale factor 0.0001.
Returns: dict of {band_name: numpy_array}, plus metadata.
"""
bands = {}
meta = None
for name, filepath in band_files.items():
with rasterio.open(filepath) as src:
data = src.read(1).astype(np.float32) * scale_factor
bands[name] = data
if meta is None:
meta = {
"transform": src.transform,
"crs": src.crs,
"shape": data.shape,
"profile": src.profile.copy(),
}
return bands, meta
# Example usage with Sentinel-2 L2A product
band_files = {
"blue": "R10m/T35VLF_B02_10m.jp2",
"green": "R10m/T35VLF_B03_10m.jp2",
"red": "R10m/T35VLF_B04_10m.jp2",
"nir": "R10m/T35VLF_B08_10m.jp2",
}
# bands, meta = load_sentinel2_bands(band_files)Step 3: Coordinate Reference Systems (CRS)
CRS is the single most common source of errors in geospatial analysis. If your data is in the wrong CRS, your coordinates are wrong, your distances are wrong, and your overlays don’t align.
The Key Systems
| CRS | EPSG Code | Units | Use |
|---|---|---|---|
| WGS84 Geographic | 4326 | Degrees (lat/lon) | GPS, web maps, general reference |
| UTM Zone 35N | 32635 | Meters | Estonia, Finland (accurate distance/area) |
| Web Mercator | 3857 | Meters (distorted) | Google Maps, web tiles — NEVER use for analysis |
Why CRS matters for analysis:
1 degree of latitude = ~111 km everywhere. 1 degree of longitude = ~111 km at equator, ~56 km at 60N (Estonia).
If you compute distance or area in EPSG:4326, you’ll get wrong results because degrees are not uniform. Always reproject to a metric CRS (UTM) for measurements.
Check and Convert CRS
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling
def reproject_raster(src_path, dst_path, dst_crs="EPSG:32635"):
"""
Reproject a raster to a new CRS.
UTM Zone 35N (EPSG:32635) is correct for Estonia.
"""
with rasterio.open(src_path) as src:
# Calculate new transform and dimensions
transform, width, height = calculate_default_transform(
src.crs, dst_crs, src.width, src.height, *src.bounds
)
# Update profile for output
profile = src.profile.copy()
profile.update(
crs=dst_crs,
transform=transform,
width=width,
height=height,
)
with rasterio.open(dst_path, "w", **profile) as dst:
for band_idx in range(1, src.count + 1):
reproject(
source=rasterio.band(src, band_idx),
destination=rasterio.band(dst, band_idx),
src_transform=src.transform,
src_crs=src.crs,
dst_transform=transform,
dst_crs=dst_crs,
resampling=Resampling.bilinear,
)
print(f"Reprojected {src_path} -> {dst_path} (CRS: {dst_crs})")
# reproject_raster("input_4326.tif", "output_utm35n.tif")Find the Right UTM Zone
def utm_zone_from_lon(lon):
"""Get UTM zone number from longitude."""
zone = int((lon + 180) / 6) + 1
return zone
def utm_epsg(lon, lat):
"""Get EPSG code for the UTM zone containing (lon, lat)."""
zone = utm_zone_from_lon(lon)
if lat >= 0:
return f"EPSG:326{zone:02d}" # North
else:
return f"EPSG:327{zone:02d}" # South
# Estonia
print(utm_epsg(24.75, 59.44)) # EPSG:32635 (UTM Zone 35N)Step 4: Display Imagery with Matplotlib
True Color (RGB) Composite
import numpy as np
import matplotlib.pyplot as plt
import rasterio
def display_rgb(red, green, blue, title="True Color", percentile_stretch=(2, 98)):
"""
Display a true-color composite with histogram stretching.
Input bands should be float arrays.
"""
# Stack into RGB array
rgb = np.stack([red, green, blue], axis=-1)
# Remove invalid pixels
rgb = np.where(rgb < 0, 0, rgb)
# Percentile stretch for visualization
for i in range(3):
band = rgb[:, :, i]
valid = band[band > 0]
if len(valid) == 0:
continue
lo = np.percentile(valid, percentile_stretch[0])
hi = np.percentile(valid, percentile_stretch[1])
rgb[:, :, i] = np.clip((band - lo) / (hi - lo + 1e-10), 0, 1)
fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(rgb)
ax.set_title(title)
ax.axis("off")
plt.tight_layout()
return fig
# Usage:
# display_rgb(bands["red"], bands["green"], bands["blue"])
# plt.savefig("true_color.png", dpi=150)
# plt.show()False Color Composites
Different band combinations reveal different features:
def display_composite(band_r, band_g, band_b, title="Composite",
percentile_stretch=(2, 98)):
"""Generic band composite display."""
return display_rgb(band_r, band_g, band_b, title, percentile_stretch)
# Common composites:
# True color: Red=B04, Green=B03, Blue=B02
# Color infrared: Red=B08(NIR), Green=B04(Red), Blue=B03(Green)
# -> Vegetation appears RED, water appears BLACK, urban appears CYAN
# Agriculture: Red=B11(SWIR), Green=B08(NIR), Blue=B02(Blue)
# -> Distinguishes crop types and health
# Urban: Red=B12(SWIR2), Green=B11(SWIR1), Blue=B04(Red)
# -> Urban areas appear in distinct colors
# Example:
# display_composite(bands["nir"], bands["red"], bands["green"],
# title="Color Infrared (NIR-Red-Green)")Add Scale Bar and North Arrow
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.offsetbox import AnchoredText
import numpy as np
def add_scalebar(ax, transform, length_m=1000, location="lower right"):
"""
Add a scale bar to a rasterio-based plot.
transform: rasterio affine transform
length_m: scale bar length in meters
"""
# Pixel size in meters (assumes metric CRS like UTM)
pixel_size = abs(transform.a)
length_px = length_m / pixel_size
# Position in axes coordinates
if location == "lower right":
x_start = 0.65
y_pos = 0.05
elif location == "lower left":
x_start = 0.05
y_pos = 0.05
ax_transform = ax.transAxes
ax.plot(
[x_start, x_start + 0.25],
[y_pos, y_pos],
color="white", linewidth=3,
transform=ax_transform
)
ax.text(
x_start + 0.125, y_pos + 0.02,
f"{length_m/1000:.0f} km" if length_m >= 1000 else f"{length_m} m",
color="white", fontsize=10, ha="center",
transform=ax_transform,
fontweight="bold",
bbox=dict(boxstyle="round", facecolor="black", alpha=0.5),
)
def add_north_arrow(ax, x=0.95, y=0.95, size=0.05):
"""Add a north arrow to the plot."""
ax.annotate(
"N", xy=(x, y), xytext=(x, y - size),
arrowprops=dict(facecolor="white", edgecolor="white", width=2, headwidth=8),
fontsize=12, color="white", ha="center", fontweight="bold",
xycoords="axes fraction", textcoords="axes fraction",
)Histogram of Band Values
def plot_histogram(bands_dict, title="Band Histograms"):
"""Plot histograms of band values."""
fig, ax = plt.subplots(figsize=(10, 5))
colors = {"blue": "blue", "green": "green", "red": "red",
"nir": "darkred", "swir1": "purple", "swir2": "brown"}
for name, data in bands_dict.items():
valid = data[data > 0].flatten()
if len(valid) > 100000:
valid = np.random.choice(valid, 100000) # subsample for speed
ax.hist(valid, bins=200, alpha=0.5, label=name,
color=colors.get(name, None))
ax.set_xlabel("Reflectance")
ax.set_ylabel("Count")
ax.set_title(title)
ax.legend()
plt.tight_layout()
return figStep 5: Band Math — NDVI and Beyond
Band math is element-wise arithmetic on numpy arrays. The most common: vegetation indices.
import numpy as np
def compute_ndvi(nir, red):
"""
Normalized Difference Vegetation Index.
NDVI = (NIR - Red) / (NIR + Red)
Range: -1 to +1. Vegetation > 0.3, water < 0, bare soil 0.1-0.2.
"""
nir = nir.astype(np.float32)
red = red.astype(np.float32)
denominator = nir + red
# Avoid division by zero
ndvi = np.where(denominator > 0, (nir - red) / denominator, 0)
return ndvi
def compute_ndwi(green, nir):
"""
Normalized Difference Water Index (McFeeters).
NDWI = (Green - NIR) / (Green + NIR)
Positive values indicate water.
"""
green = green.astype(np.float32)
nir = nir.astype(np.float32)
denominator = green + nir
ndwi = np.where(denominator > 0, (green - nir) / denominator, 0)
return ndwi
def compute_ndbi(swir, nir):
"""
Normalized Difference Built-up Index.
NDBI = (SWIR - NIR) / (SWIR + NIR)
Positive values indicate built-up/urban areas.
"""
swir = swir.astype(np.float32)
nir = nir.astype(np.float32)
denominator = swir + nir
ndbi = np.where(denominator > 0, (swir - nir) / denominator, 0)
return ndbi
# Usage:
# ndvi = compute_ndvi(bands["nir"], bands["red"])
# plt.imshow(ndvi, cmap="RdYlGn", vmin=-0.2, vmax=0.8)Complete Band Math Example (Synthetic Data)
import numpy as np
import matplotlib.pyplot as plt
# Simulate a Sentinel-2 scene with realistic spectral values
np.random.seed(42)
size = 500
# Create land cover map
land_cover = np.zeros((size, size), dtype=int)
land_cover[:200, :] = 1 # forest (top)
land_cover[200:300, :250] = 2 # water (middle-left)
land_cover[200:300, 250:] = 3 # urban (middle-right)
land_cover[300:, :] = 4 # agriculture (bottom)
# Generate realistic reflectance per land cover type
# Values in surface reflectance units (0-1)
spectral_profiles = {
# (blue, green, red, nir, swir1)
0: (0.08, 0.10, 0.14, 0.22, 0.28), # bare soil
1: (0.02, 0.05, 0.03, 0.45, 0.15), # forest
2: (0.05, 0.04, 0.03, 0.01, 0.00), # water
3: (0.08, 0.09, 0.12, 0.18, 0.25), # urban
4: (0.03, 0.07, 0.04, 0.50, 0.20), # agriculture (healthy crop)
}
bands = {}
band_names = ["blue", "green", "red", "nir", "swir1"]
for i, name in enumerate(band_names):
band = np.zeros((size, size), dtype=np.float32)
for lc, profile in spectral_profiles.items():
mask = land_cover == lc
band[mask] = profile[i] + np.random.normal(0, 0.01, np.sum(mask))
bands[name] = np.clip(band, 0, 1)
# Compute indices
ndvi = compute_ndvi(bands["nir"], bands["red"])
ndwi = compute_ndwi(bands["green"], bands["nir"])
ndbi = compute_ndbi(bands["swir1"], bands["nir"])
# Display
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
# True color
rgb = np.stack([bands["red"], bands["green"], bands["blue"]], axis=-1)
rgb = np.clip(rgb * 5, 0, 1) # stretch for display
axes[0, 0].imshow(rgb)
axes[0, 0].set_title("True Color")
# Color infrared
cir = np.stack([bands["nir"], bands["red"], bands["green"]], axis=-1)
cir = np.clip(cir * 3, 0, 1)
axes[0, 1].imshow(cir)
axes[0, 1].set_title("Color Infrared (NIR-R-G)")
# NDVI
im = axes[0, 2].imshow(ndvi, cmap="RdYlGn", vmin=-0.5, vmax=0.8)
axes[0, 2].set_title("NDVI")
plt.colorbar(im, ax=axes[0, 2], fraction=0.046)
# NDWI
im = axes[1, 0].imshow(ndwi, cmap="RdBu", vmin=-0.5, vmax=0.5)
axes[1, 0].set_title("NDWI (water positive)")
plt.colorbar(im, ax=axes[1, 0], fraction=0.046)
# NDBI
im = axes[1, 1].imshow(ndbi, cmap="RdYlBu_r", vmin=-0.5, vmax=0.5)
axes[1, 1].set_title("NDBI (urban positive)")
plt.colorbar(im, ax=axes[1, 1], fraction=0.046)
# Simple classification
classification = np.zeros_like(ndvi, dtype=int)
classification[ndwi > 0.1] = 1 # water
classification[ndvi > 0.3] = 2 # vegetation
classification[(ndbi > 0) & (ndvi < 0.3)] = 3 # urban
# 0 = unclassified/bare soil
colors = ["tan", "blue", "green", "gray"]
from matplotlib.colors import ListedColormap
cmap = ListedColormap(colors)
axes[1, 2].imshow(classification, cmap=cmap, vmin=0, vmax=3)
axes[1, 2].set_title("Classification")
import matplotlib.patches as mpatches
legend = [mpatches.Patch(color=c, label=l)
for c, l in zip(colors, ["Bare soil", "Water", "Vegetation", "Urban"])]
axes[1, 2].legend(handles=legend, loc="lower right")
for ax in axes.flat:
ax.axis("off")
plt.tight_layout()
plt.savefig("band_math_demo.png", dpi=150)
plt.show()Step 6: Clip to Area of Interest
Real analysis is never on a full 100x100 km tile. You clip to your area of interest (AOI).
import rasterio
from rasterio.mask import mask
import geopandas as gpd
from shapely.geometry import box, mapping
import json
def clip_raster_to_bbox(raster_path, output_path, bbox, dst_crs=None):
"""
Clip a raster to a bounding box.
bbox: (west, south, east, north) in the raster's CRS, or specify dst_crs.
"""
west, south, east, north = bbox
clip_geom = box(west, south, east, north)
with rasterio.open(raster_path) as src:
# If bbox is in different CRS, reproject geometry
if dst_crs and str(src.crs) != dst_crs:
from pyproj import Transformer
transformer = Transformer.from_crs(dst_crs, str(src.crs), always_xy=True)
west_t, south_t = transformer.transform(west, south)
east_t, north_t = transformer.transform(east, north)
clip_geom = box(west_t, south_t, east_t, north_t)
# Clip
out_image, out_transform = mask(src, [mapping(clip_geom)], crop=True)
out_profile = src.profile.copy()
out_profile.update(
height=out_image.shape[1],
width=out_image.shape[2],
transform=out_transform,
)
with rasterio.open(output_path, "w", **out_profile) as dst:
dst.write(out_image)
print(f"Clipped to {output_path}: {out_image.shape}")
# Clip to Tallinn old town area (approximate bbox in UTM 35N)
# clip_raster_to_bbox(
# "sentinel2_full_tile.tif",
# "tallinn_oldtown.tif",
# bbox=(540000, 6586000, 545000, 6590000), # UTM 35N meters
# )Clip with Shapefile/GeoJSON
import rasterio
from rasterio.mask import mask
import geopandas as gpd
def clip_raster_to_shapefile(raster_path, output_path, shapefile_path):
"""Clip a raster using a vector boundary (shapefile or GeoJSON)."""
# Load vector
gdf = gpd.read_file(shapefile_path)
with rasterio.open(raster_path) as src:
# Reproject vector to raster CRS if needed
if gdf.crs != src.crs:
gdf = gdf.to_crs(src.crs)
# Get geometries as GeoJSON-like dicts
geoms = [mapping(geom) for geom in gdf.geometry]
out_image, out_transform = mask(src, geoms, crop=True, nodata=0)
out_profile = src.profile.copy()
out_profile.update(
height=out_image.shape[1],
width=out_image.shape[2],
transform=out_transform,
nodata=0,
)
with rasterio.open(output_path, "w", **out_profile) as dst:
dst.write(out_image)Step 7: Save Results as GeoTIFF
Always preserve georeference when saving derived products.
import rasterio
import numpy as np
def save_geotiff(data, output_path, reference_raster_path=None,
transform=None, crs=None, nodata=None):
"""
Save a numpy array as a GeoTIFF, preserving georeference.
Either provide reference_raster_path OR (transform + crs).
data: 2D array (single band) or 3D array (bands, height, width)
"""
if data.ndim == 2:
data = data[np.newaxis, :, :] # add band dimension
bands, height, width = data.shape
if reference_raster_path:
with rasterio.open(reference_raster_path) as ref:
transform = ref.transform
crs = ref.crs
profile = {
"driver": "GTiff",
"dtype": data.dtype,
"width": width,
"height": height,
"count": bands,
"crs": crs,
"transform": transform,
"compress": "deflate",
}
if nodata is not None:
profile["nodata"] = nodata
with rasterio.open(output_path, "w", **profile) as dst:
dst.write(data)
print(f"Saved: {output_path} ({bands} bands, {height}x{width}, {data.dtype})")
# Save NDVI result
# save_geotiff(ndvi, "ndvi_output.tif",
# reference_raster_path="sentinel2_B04.tif")Step 8: Working with Sentinel-2 — Complete Pipeline
import rasterio
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from rasterio.enums import Resampling
def load_sentinel2_l2a(product_dir):
"""
Load Sentinel-2 L2A product from unzipped SAFE directory.
Returns dict of bands at 10m resolution (20m bands upsampled).
"""
product_dir = Path(product_dir)
# Find the GRANULE directory
granule_dirs = list((product_dir / "GRANULE").iterdir())
if not granule_dirs:
raise FileNotFoundError(f"No granule found in {product_dir}")
granule = granule_dirs[0]
img_dir = granule / "IMG_DATA"
r10m = img_dir / "R10m"
r20m = img_dir / "R20m"
bands = {}
ref_shape = None
ref_transform = None
ref_crs = None
# 10m bands
for band_name, file_suffix in [("B02", "B02_10m"), ("B03", "B03_10m"),
("B04", "B04_10m"), ("B08", "B08_10m")]:
matches = list(r10m.glob(f"*{file_suffix}*"))
if matches:
with rasterio.open(matches[0]) as src:
data = src.read(1).astype(np.float32) * 0.0001
bands[band_name] = data
if ref_shape is None:
ref_shape = data.shape
ref_transform = src.transform
ref_crs = src.crs
# 20m bands — resample to 10m
for band_name, file_suffix in [("B05", "B05_20m"), ("B06", "B06_20m"),
("B07", "B07_20m"), ("B8A", "B8A_20m"),
("B11", "B11_20m"), ("B12", "B12_20m"),
("SCL", "SCL_20m")]:
matches = list(r20m.glob(f"*{file_suffix}*"))
if matches:
with rasterio.open(matches[0]) as src:
# Resample from 20m to 10m
data = src.read(
1,
out_shape=ref_shape,
resampling=Resampling.bilinear if band_name != "SCL"
else Resampling.nearest,
).astype(np.float32)
if band_name != "SCL":
data *= 0.0001
bands[band_name] = data
meta = {"transform": ref_transform, "crs": ref_crs, "shape": ref_shape}
return bands, meta
def cloud_mask_scl(scl):
"""
Create cloud mask from Sentinel-2 Scene Classification Layer.
SCL classes:
0: No Data, 1: Saturated/Defective, 2: Dark area,
3: Cloud shadow, 4: Vegetation, 5: Bare soils,
6: Water, 7: Unclassified, 8: Cloud (medium prob),
9: Cloud (high prob), 10: Thin cirrus, 11: Snow/ice
Returns: boolean mask (True = VALID pixel, False = cloudy/invalid)
"""
valid_classes = {4, 5, 6, 7, 11} # vegetation, soil, water, unclassified, snow
valid = np.isin(scl.astype(int), list(valid_classes))
return valid
# Full pipeline example:
# bands, meta = load_sentinel2_l2a("S2A_MSIL2A_20240315T100021_N0510_R122_T35VLF.SAFE")
#
# # Cloud mask
# if "SCL" in bands:
# valid = cloud_mask_scl(bands["SCL"])
# for name in bands:
# if name != "SCL":
# bands[name] = np.where(valid, bands[name], np.nan)
#
# # Compute NDVI
# ndvi = (bands["B08"] - bands["B04"]) / (bands["B08"] + bands["B04"] + 1e-10)
#
# # Display
# fig, axes = plt.subplots(1, 3, figsize=(18, 6))
# rgb = np.stack([bands["B04"], bands["B03"], bands["B02"]], axis=-1)
# rgb = np.clip(rgb * 3.5, 0, 1)
# axes[0].imshow(rgb)
# axes[0].set_title("True Color")
# axes[1].imshow(ndvi, cmap="RdYlGn", vmin=-0.2, vmax=0.8)
# axes[1].set_title("NDVI")
# axes[2].imshow(valid, cmap="RdYlGn")
# axes[2].set_title("Cloud Mask (green=valid)")
# plt.show()Try This Next
Exercise 1: Create NDWI Water Index Map
- Load Sentinel-2 B03 (Green) and B08 (NIR) for a coastal area
- Compute NDWI = (Green - NIR) / (Green + NIR)
- Threshold at 0.1 to create binary water mask
- Compute total water area in km^2 (multiply pixel count by pixel area)
- Save result as GeoTIFF
Exercise 2: Stack Bands from Multiple Files
- Load all 10m Sentinel-2 bands (B02, B03, B04, B08) from separate JP2 files
- Stack into a single multi-band GeoTIFF
- Verify the output has 4 bands and correct georeference
Exercise 3: Overlay Vector on Raster
- Load a Sentinel-2 true color composite
- Download Estonia administrative boundaries (e.g., from gadm.org)
- Overlay county boundaries on the satellite image
- Use geopandas + matplotlib for the overlay
Self-Test Questions
- What does the affine transform encode? What happens if it’s wrong?
- Why should you NEVER use EPSG:3857 (Web Mercator) for area calculations?
- What is the Sentinel-2 SCL band and why is it important?
- How do you resample a 20m band to 10m? When would you use bilinear vs nearest-neighbor?
- What scale factor converts Sentinel-2 L2A uint16 values to surface reflectance?
See also: Tutorial - Acquiring Free Satellite Imagery | Multispectral Analysis | Change Detection Next: Multispectral Analysis