Tutorial — Object Detection in Satellite Imagery
Goal: After this tutorial, you can detect and count objects (vehicles, buildings, ships) in satellite and aerial imagery using both classical methods and deep learning.
Object detection in satellite imagery is fundamentally different from standard computer vision. Objects are tiny (a car is roughly 4x2 pixels at 1m GSD), images are enormous (10000x10000+), classes are few but backgrounds are complex, and the viewing angle is near-nadir (straight down) rather than the ground-level perspective views most CNNs are pretrained on.
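To make the size problem concrete: an object's pixel footprint is just its physical size divided by the ground sample distance (GSD). A quick sketch (the function name is illustrative):

```python
def object_pixels(length_m, width_m, gsd_m):
    # Pixel footprint = physical size / ground sample distance, rounded
    return round(length_m / gsd_m), round(width_m / gsd_m)

print(object_pixels(4.5, 1.8, 1.0))   # car at 1 m GSD
print(object_pixels(4.5, 1.8, 0.3))   # same car at 30 cm GSD
```

At 1m GSD a car is a handful of pixels; at 30cm it becomes a recognizable blob, which is why commercial high-res imagery matters for vehicle detection.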
Connection to AI/ML CV: This builds directly on your Object Detection and YOLO notes. The neural network architecture is the same — the challenge is the data pipeline and the scale.
Step 1: The Scale Challenge
A single Sentinel-2 tile is 10980 x 10980 pixels at 10m resolution, covering 109.8 km x 109.8 km. A high-res commercial image might be 40000 x 40000 at 50cm. You cannot feed this to a neural network. You must tile it.
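As a back-of-envelope check on the scale, here is how many 512-pixel tiles (with 64px overlap, matching the defaults used in the tiling code that follows) are needed to cover one Sentinel-2 tile:

```python
import math

def tiles_per_axis(extent_px, tile_size=512, overlap=64):
    # With stride = tile_size - overlap, this many tile positions
    # are needed to cover extent_px pixels along one axis.
    stride = tile_size - overlap
    return math.ceil((extent_px - overlap) / stride)

per_axis = tiles_per_axis(10980)   # one Sentinel-2 tile edge
print(per_axis, per_axis ** 2)     # tiles per axis, total tiles
```

That is hundreds of inference calls per scene, which is why tiling throughput and stitching correctness dominate the engineering effort.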
```python
import numpy as np

def tile_image(img, tile_size=512, overlap=64):
    """
    Split a large image into overlapping tiles for inference.
    img: numpy array (H, W) or (H, W, C)
    Returns: list of (tile, row_offset, col_offset) tuples
    """
    h, w = img.shape[:2]
    stride = tile_size - overlap
    tiles = []
    for row in range(0, h - overlap, stride):
        for col in range(0, w - overlap, stride):
            # Clamp edge tiles so every tile is exactly tile_size
            row_end = min(row + tile_size, h)
            col_end = min(col + tile_size, w)
            row_start = max(0, row_end - tile_size)
            col_start = max(0, col_end - tile_size)
            # This slicing works identically for 2D and 3D arrays
            tile = img[row_start:row_end, col_start:col_end]
            tiles.append((tile, row_start, col_start))
    return tiles
```
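A quick sanity check on the clamped-stride scheme (re-implemented compactly here so the snippet runs standalone): every pixel of the image should fall inside at least one tile.

```python
import numpy as np

def tile_offsets(h, w, tile_size=512, overlap=64):
    # Same clamped-stride scheme as tile_image, offsets only
    stride = tile_size - overlap
    offsets = []
    for row in range(0, h - overlap, stride):
        for col in range(0, w - overlap, stride):
            r = max(0, min(row + tile_size, h) - tile_size)
            c = max(0, min(col + tile_size, w) - tile_size)
            offsets.append((r, c))
    return offsets

covered = np.zeros((1000, 1000), dtype=bool)
for r, c in tile_offsets(1000, 1000):
    covered[r:r+512, c:c+512] = True
print(covered.all())   # every pixel covered by some tile
```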
```python
def stitch_detections(all_detections, iou_threshold=0.5):
    """
    Merge detections from overlapping tiles.
    all_detections: list of (bbox, score, class_id) where bbox = [x1, y1, x2, y2]
                    in FULL IMAGE coordinates.
    Returns: filtered detections after NMS across tile boundaries.
    """
    if not all_detections:
        return []
    boxes = np.array([d[0] for d in all_detections])
    scores = np.array([d[1] for d in all_detections])
    classes = np.array([d[2] for d in all_detections])
    # Non-maximum suppression, per class
    keep = []
    for cls in np.unique(classes):
        cls_mask = classes == cls
        cls_boxes = boxes[cls_mask]
        cls_scores = scores[cls_mask]
        cls_indices = np.where(cls_mask)[0]
        # Process highest-scoring boxes first
        order = cls_scores.argsort()[::-1]
        while len(order) > 0:
            i = order[0]
            keep.append(cls_indices[i])
            if len(order) == 1:
                break
            # Compute IoU of the kept box against all remaining boxes
            xx1 = np.maximum(cls_boxes[i, 0], cls_boxes[order[1:], 0])
            yy1 = np.maximum(cls_boxes[i, 1], cls_boxes[order[1:], 1])
            xx2 = np.minimum(cls_boxes[i, 2], cls_boxes[order[1:], 2])
            yy2 = np.minimum(cls_boxes[i, 3], cls_boxes[order[1:], 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = ((cls_boxes[i, 2] - cls_boxes[i, 0]) *
                      (cls_boxes[i, 3] - cls_boxes[i, 1]))
            area_j = ((cls_boxes[order[1:], 2] - cls_boxes[order[1:], 0]) *
                      (cls_boxes[order[1:], 3] - cls_boxes[order[1:], 1]))
            iou = inter / (area_i + area_j - inter + 1e-10)
            # Keep only boxes that do not overlap the kept box too much
            remaining = np.where(iou < iou_threshold)[0]
            order = order[remaining + 1]
    return [all_detections[i] for i in keep]
```

Step 2: The Tiling Pipeline
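How much overlap is enough? A common rule of thumb: the overlap should exceed the pixel footprint of the largest object you expect, so every object appears un-clipped in at least one tile. A sketch (function name and margin are illustrative):

```python
def min_overlap_px(max_object_m, gsd_m, margin_px=8):
    # Overlap must exceed the largest object's pixel footprint, plus a
    # small margin, so each object fits wholly inside some tile.
    return int(max_object_m / gsd_m) + margin_px

print(min_overlap_px(60, 0.5))   # a 60 m aircraft at 50 cm GSD
```

For a 60m aircraft at 50cm GSD this works out to 128 pixels, which matches the `overlap=128` default used in the pipeline below.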
```python
def run_detection_on_large_image(img, detect_fn, tile_size=512, overlap=128,
                                 score_threshold=0.3):
    """
    Full pipeline: tile → detect → stitch.
    detect_fn: function that takes a tile (H, W, C) and returns
               list of (bbox_local, score, class_id)
    """
    tiles = tile_image(img, tile_size, overlap)
    all_detections = []
    for tile, row_offset, col_offset in tiles:
        # Run detector on tile
        tile_detections = detect_fn(tile)
        # Convert local tile coordinates to full-image coordinates
        for bbox, score, class_id in tile_detections:
            if score < score_threshold:
                continue
            global_bbox = [
                bbox[0] + col_offset,
                bbox[1] + row_offset,
                bbox[2] + col_offset,
                bbox[3] + row_offset,
            ]
            all_detections.append((global_bbox, score, class_id))
    # NMS across tile boundaries
    final = stitch_detections(all_detections, iou_threshold=0.5)
    return final
```

Step 3: Using YOLO on Aerial/Satellite Data
YOLO (You Only Look Once) is among the most practical detectors for satellite imagery: it is fast, single-pass, and handles small objects well given the right training data.
Option A: Use Pretrained YOLO (General Objects)
```python
# pip install ultralytics
from ultralytics import YOLO

def yolo_detect_tile(tile, model, conf=0.25):
    """
    Run YOLOv8 on a single tile.
    Returns: list of (bbox, score, class_id)
    """
    results = model(tile, conf=conf, verbose=False)
    detections = []
    for result in results:
        for box in result.boxes:
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
            score = float(box.conf[0])
            class_id = int(box.cls[0])
            detections.append(([x1, y1, x2, y2], score, class_id))
    return detections

# Load pretrained YOLOv8
# model = YOLO("yolov8n.pt")  # nano — fastest
# model = YOLO("yolov8s.pt")  # small — better accuracy

# For satellite imagery, you likely need a model trained on aerial data:
# - DOTAv2 dataset (oriented aerial object detection)
# - xView dataset (60 classes of overhead objects)
# - DIOR dataset (20 classes)
# Community models on Hugging Face: search "yolo satellite" or "yolo aerial"
```

Option B: Train YOLO on Aerial Data
```python
# Dataset structure for YOLO training:
# dataset/
#   train/
#     images/
#       tile_0001.jpg
#     labels/
#       tile_0001.txt   # YOLO format: class_id cx cy w h (normalized)
#   val/
#     images/
#     labels/
#
# data.yaml (paths must point at the images/ directories; Ultralytics
# derives the labels/ paths from them):
# train: dataset/train/images
# val: dataset/val/images
# nc: 5  # number of classes
# names: ['vehicle', 'aircraft', 'ship', 'building', 'storage_tank']

# Training
# from ultralytics import YOLO
# model = YOLO("yolov8s.pt")
# model.train(data="data.yaml", epochs=100, imgsz=512, batch=16)
```

Public Aerial Object Detection Datasets (Free)
| Dataset | Classes | Images | Resolution | Source |
|---|---|---|---|---|
| xView | 60 | ~1,400 (1M+ objects) | ~30cm | DIUx (satellite) |
| DOTA | 15-18 (v1.0-v2.0) | 2.8k-11k | varies | aerial/satellite |
| DIOR | 20 | 23k+ | 0.5-30m | Google Earth |
| FAIR1M | 37 | 15k+ | 0.3-0.8m | satellite |
| VEDAI | 9 (vehicles) | ~1,200 | 12.5cm | aerial |
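When converting any of these datasets into the normalized `class_id cx cy w h` label format YOLO expects, the pixel-to-normalized conversion is a small function worth getting right. A minimal sketch:

```python
def bbox_to_yolo(x1, y1, x2, y2, img_w, img_h):
    # Pixel corners [x1, y1, x2, y2] -> normalized YOLO (cx, cy, w, h)
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    return cx, cy, (x2 - x1) / img_w, (y2 - y1) / img_h

# A 20x12 px object at (200, 410) in a 512x512 tile:
print(bbox_to_yolo(200, 410, 220, 422, 512, 512))
```

One line per object is then written to the matching `.txt` file in `labels/`.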
Step 4: Detect and Stitch — Full Example
```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

def demo_detection_pipeline():
    """
    Full detection pipeline demo with synthetic data.
    Simulates an airfield with aircraft and vehicles.
    """
    np.random.seed(42)
    # Create synthetic airfield image (1000x1000 at ~1m resolution)
    size = 1000
    img = np.random.normal(0.3, 0.02, (size, size, 3))  # background
    # Runway
    img[480:520, 100:900] = [0.5, 0.5, 0.5]  # gray
    # Taxiway
    img[430:440, 300:700] = [0.45, 0.45, 0.45]
    # Buildings (hangars)
    for bx in [150, 250, 350]:
        img[350:400, bx:bx+60] = [0.55, 0.50, 0.45]
    # Aircraft on apron: ~20m fuselage at 1m GSD, drawn as 20x12 px blocks
    aircraft_positions = []
    for ax, ay in [(200, 410), (300, 415), (400, 408), (250, 440), (350, 435)]:
        img[ay:ay+12, ax:ax+20] = [0.65, 0.65, 0.60]
        aircraft_positions.append((ax, ay, ax+20, ay+12))
    # Vehicles in parking area
    vehicle_positions = []
    for vx in range(600, 750, 15):
        for vy in range(350, 420, 12):
            if np.random.random() > 0.3:
                img[vy:vy+4, vx:vx+8] = [0.4, 0.38, 0.35]
                vehicle_positions.append((vx, vy, vx+8, vy+4))
    img = np.clip(img, 0, 1)
    # Simulate detections from the known positions (replace with real YOLO
    # inference in practice)
    all_detections = []
    for bbox in aircraft_positions:
        score = np.random.uniform(0.7, 0.95)
        all_detections.append((list(bbox), score, 0))  # class 0 = aircraft
    for bbox in vehicle_positions:
        score = np.random.uniform(0.4, 0.85)
        all_detections.append((list(bbox), score, 1))  # class 1 = vehicle
    # Visualization
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.imshow(img)
    colors = {0: "red", 1: "cyan"}
    labels = {0: "Aircraft", 1: "Vehicle"}
    for bbox, score, cls in all_detections:
        x1, y1, x2, y2 = bbox
        rect = plt.Rectangle((x1, y1), x2-x1, y2-y1,
                             linewidth=1.5, edgecolor=colors[cls],
                             facecolor="none")
        ax.add_patch(rect)
        ax.text(x1, y1-2, f"{score:.2f}", color=colors[cls], fontsize=6)
    # Legend with per-class counts
    patches = [mpatches.Patch(color=c,
                              label=f"{labels[cls]} ({sum(1 for d in all_detections if d[2] == cls)})")
               for cls, c in colors.items()]
    ax.legend(handles=patches, loc="upper right", fontsize=12)
    n_aircraft = sum(1 for d in all_detections if d[2] == 0)
    n_vehicles = sum(1 for d in all_detections if d[2] == 1)
    ax.set_title(f"Airfield Object Detection — {n_aircraft} aircraft, {n_vehicles} vehicles")
    ax.axis("off")
    plt.tight_layout()
    plt.savefig("airfield_detection.png", dpi=150)
    plt.show()
    return all_detections

detections = demo_detection_pipeline()
```

Step 5: Ship Detection in SAR (Classical Method)
For SAR ship detection, you don’t need a neural network. Ships are bright point targets on dark water — a simple threshold-based CFAR detector works well. See Ship Detection in SAR for the full implementation.
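For context on the CFAR idea (a locally adaptive threshold, rather than the global one used in the simplified version below), a minimal cell-averaging CFAR can be written with two box filters. This is a sketch with illustrative window sizes and threshold factor, not the full implementation referenced above:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ca_cfar(img, bg_win=31, guard=5, k=5.0):
    # Background stats come from a ring (big window minus guard window),
    # so a bright target does not inflate its own background estimate.
    n_bg, n_g = bg_win ** 2, guard ** 2
    s1 = uniform_filter(img, bg_win) * n_bg - uniform_filter(img, guard) * n_g
    s2 = (uniform_filter(img ** 2, bg_win) * n_bg
          - uniform_filter(img ** 2, guard) * n_g)
    n = n_bg - n_g
    mean = s1 / n
    std = np.sqrt(np.maximum(s2 / n - mean ** 2, 0))
    return img > mean + k * std

sea = np.random.default_rng(0).gamma(1.0, 0.01, (200, 200))  # speckled water
sea[100, 100] = 5.0                                          # bright target
print(ca_cfar(sea)[100, 100])
```

Because the threshold adapts to local clutter, the same detector works in calm and rough seas where any single global threshold would fail.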
```python
import numpy as np
from scipy.ndimage import label, binary_dilation

def simple_ship_detection_sar(sar_vv, pixel_size_m=10):
    """
    Quick ship detection for Sentinel-1.
    1. Identify water (dark areas)
    2. Find bright pixels on water
    3. Cluster into ship objects
    """
    sar_db = 10 * np.log10(np.maximum(sar_vv, 1e-10))
    # Water mask: percentile-based (water is typically < -18 dB)
    water = sar_db < np.percentile(sar_db, 30)
    # Ship candidates: bright pixels on water
    # Adaptive threshold: mean + 5*std of water pixels
    water_mean = np.mean(sar_db[water])
    water_std = np.std(sar_db[water])
    ship_threshold = water_mean + 5 * water_std
    candidates = (sar_db > ship_threshold) & binary_dilation(water, iterations=3)
    # Cluster connected bright pixels into ship objects
    labeled, n = label(candidates)
    ships = []
    for i in range(1, n + 1):
        pixels = np.argwhere(labeled == i)
        area = len(pixels)
        if area < 2:  # minimum size
            continue
        length_m = (pixels[:, 1].max() - pixels[:, 1].min() + 1) * pixel_size_m
        width_m = (pixels[:, 0].max() - pixels[:, 0].min() + 1) * pixel_size_m
        centroid = pixels.mean(axis=0)
        ships.append({
            "centroid": centroid,
            "length_m": max(length_m, width_m),
            "width_m": min(length_m, width_m),
            "area_pixels": area,
        })
    return ships
```

Step 6: Building Footprint Extraction
For high-resolution imagery, extract building footprints using segmentation.
```python
import numpy as np
from scipy.ndimage import label, binary_fill_holes, binary_opening

def extract_buildings_simple(nir, red, min_area_pixels=20):
    """
    Simple building extraction using spectral indices.
    Works best with high-res data (<5m).
    For Sentinel-2, this extracts urban AREAS, not individual buildings.
    """
    # NDVI (vegetation has high values)
    ndvi = (nir - red) / (nir + red + 1e-10)
    # Non-vegetation, non-water pixels are candidate buildings
    # In practice, you'd also use SWIR to compute NDBI
    building_candidates = (ndvi < 0.15) & (nir > 0.05)
    # Morphological cleanup
    clean = binary_opening(building_candidates, iterations=1)
    clean = binary_fill_holes(clean)
    # Label individual buildings
    labeled, n_buildings = label(clean)
    buildings = []
    for i in range(1, n_buildings + 1):
        mask = labeled == i
        area = np.sum(mask)
        if area < min_area_pixels:
            continue
        centroid = np.mean(np.argwhere(mask), axis=0)
        buildings.append({
            "id": i,
            "centroid": centroid,
            "area_pixels": area,
        })
    return buildings, labeled
```

Step 7: Vehicle Counting
```python
import numpy as np
from scipy.ndimage import label

def count_vehicles_in_area(img, roi_bbox, min_brightness=0.4, max_size=200,
                           min_size=8):
    """
    Count bright objects (potential vehicles) in a region of interest.
    Assumes high-res imagery (<1m).
    roi_bbox: (row_start, row_end, col_start, col_end)
    """
    r1, r2, c1, c2 = roi_bbox
    roi = img[r1:r2, c1:c2]
    roi_gray = np.mean(roi, axis=2) if roi.ndim == 3 else roi
    # Detect bright objects (vehicles on a parking lot)
    bright = roi_gray > min_brightness
    labeled, n = label(bright)
    vehicles = 0
    for i in range(1, n + 1):
        size = np.sum(labeled == i)
        if min_size <= size <= max_size:
            vehicles += 1
    return vehicles

# Track vehicle counts over time for change analysis:
# dates = ["2024-01-15", "2024-02-15", "2024-03-15", ...]
# counts = [count_vehicles_in_area(img, parking_lot_bbox) for img in images]
# A sudden increase in vehicle count at a military facility = increased readiness
```

Evaluation Metrics
When you deploy a detector, measure performance:
```python
def compute_iou(box1, box2):
    """IoU between two boxes [x1, y1, x2, y2]."""
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter + 1e-10)

def compute_detection_metrics(predictions, ground_truth, iou_threshold=0.5):
    """
    Compute precision, recall, F1 for object detection.
    predictions: list of [x1, y1, x2, y2, score]
    ground_truth: list of [x1, y1, x2, y2]
    """
    if not predictions or not ground_truth:
        return {"precision": 0, "recall": 0, "f1": 0}
    preds = sorted(predictions, key=lambda x: -x[4])  # highest score first
    gt_matched = [False] * len(ground_truth)
    tp, fp = 0, 0
    for pred in preds:
        # Greedily match each prediction to its best unmatched ground truth
        best_iou = 0
        best_gt = -1
        for j, gt in enumerate(ground_truth):
            if gt_matched[j]:
                continue
            iou = compute_iou(pred[:4], gt)
            if iou > best_iou:
                best_iou = iou
                best_gt = j
        if best_iou >= iou_threshold:
            tp += 1
            gt_matched[best_gt] = True
        else:
            fp += 1
    fn = sum(1 for m in gt_matched if not m)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "tp": tp, "fp": fp, "fn": fn}
```

Common False Positives
| False Positive | Why It Happens | Mitigation |
|---|---|---|
| Bright roof corners | Similar size/brightness to vehicles | Use spectral info (vehicles have different NIR) |
| Shipping containers | Rectangular, vehicle-sized | Contextual: containers are in rows at ports |
| Shadows of buildings | Dark rectangles at consistent angle | Shadow direction analysis |
| Boats vs vehicles | Similar size in overhead view | Context: water vs land |
| Cloud shadows | Dark patches on ground | Cloud mask from SCL band |
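The spectral mitigation in the first row can be sketched as a post-filter that rejects candidate boxes whose mean NDVI looks like vegetation (the band arrays and threshold here are illustrative):

```python
import numpy as np

def reject_vegetated_boxes(boxes, nir, red, ndvi_max=0.3):
    # Drop candidate boxes whose mean NDVI indicates vegetation, a cheap
    # way to suppress tree/grass false positives from a brightness detector.
    kept = []
    for x1, y1, x2, y2 in boxes:
        n = nir[y1:y2, x1:x2]
        r = red[y1:y2, x1:x2]
        ndvi = (n - r) / (n + r + 1e-10)
        if ndvi.mean() < ndvi_max:
            kept.append([x1, y1, x2, y2])
    return kept

nir = np.full((100, 100), 0.3)
red = np.full((100, 100), 0.25)
nir[0:10, 0:10], red[0:10, 0:10] = 0.6, 0.1   # vegetated patch
print(len(reject_vegetated_boxes([[0, 0, 10, 10], [50, 50, 60, 60]], nir, red)))
```

The same pattern generalizes: any of the mitigations above can be applied as a cheap filter over candidate boxes after detection, rather than retraining the detector.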
Try This Next
Exercise 1: Fine-Tune YOLO on Aerial Data
- Download DIOR or VEDAI dataset (small, manageable)
- Convert to YOLO format
- Fine-tune YOLOv8-nano for 50 epochs
- Evaluate on test set — what precision/recall do you get?
Exercise 2: Ship Detection Pipeline
- Download a Sentinel-1 GRD scene of a busy shipping lane
- Apply speckle filtering (Lee filter from SAR Fundamentals and Analysis)
- Run CFAR ship detection
- Count ships and estimate their sizes
- Compare with optical imagery of the same area/time
Exercise 3: Vehicle Count Time Series
- Using Google Earth or Planet Explorer, find a parking lot visible in satellite imagery
- Count vehicles manually on 3-5 dates
- What patterns do you observe? (weekday vs weekend, time of year)
- This manual process is what automated detection replaces at scale
Self-Test Questions
- Why do you need overlapping tiles instead of non-overlapping?
- A YOLO model trained on ImageNet car photos fails on satellite imagery. Why?
- What is the minimum GSD needed to detect a standard car (4.5m x 1.8m)?
- Why is CFAR better than a fixed threshold for SAR ship detection?
- You get 95% precision but 40% recall. What does this mean operationally?
See also: SAR Fundamentals and Analysis | Change Detection | Multispectral Analysis
Next: Terrain Analysis and Geolocation