Tutorial — Object Detection in Satellite Imagery

Goal: After this tutorial, you can detect and count objects (vehicles, buildings, ships) in satellite and aerial imagery using both classical methods and deep learning.

Object detection in satellite imagery is fundamentally different from standard computer vision. Objects are tiny (a car is roughly 4x2 pixels at 1m GSD), images are enormous (10000x10000 pixels and up), classes are few but backgrounds are complex, and the viewing angle is near-nadir (straight down) rather than the ground-level perspective views most CNNs are trained on.
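To make the scale concrete, a quick back-of-envelope for how many pixels an object occupies at a given ground sample distance (GSD). The helper name is mine, for illustration only:

```python
def object_size_pixels(length_m, width_m, gsd_m):
    """Approximate pixel footprint of an object at a given GSD (meters/pixel)."""
    return round(length_m / gsd_m), round(width_m / gsd_m)

print(object_size_pixels(4.5, 1.8, 1.0))   # car at 1 m GSD -> (4, 2)
print(object_size_pixels(4.5, 1.8, 0.3))   # car at 30 cm GSD -> (15, 6)
```

At Sentinel-2's 10m GSD a car vanishes into a single pixel, which is why vehicle detection demands sub-meter commercial imagery.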

Connection to AI/ML CV: This builds directly on your Object Detection and YOLO notes. The neural network architecture is the same — the challenge is the data pipeline and the scale.


Step 1: The Scale Challenge

A single Sentinel-2 tile is 10980 x 10980 pixels at 10m resolution, covering 109.8 km x 109.8 km. A high-res commercial image might be 40000 x 40000 at 50cm. You cannot feed this to a neural network. You must tile it.

import numpy as np
 
def tile_image(img, tile_size=512, overlap=64):
    """
    Split a large image into overlapping tiles for inference.
    img: numpy array (H, W) or (H, W, C)
    Returns: list of (tile, row_offset, col_offset) tuples
    """
    if img.ndim == 2:
        h, w = img.shape
    else:
        h, w = img.shape[:2]
 
    stride = tile_size - overlap
    tiles = []
 
    for row in range(0, h - overlap, stride):
        for col in range(0, w - overlap, stride):
            # Handle edge tiles
            row_end = min(row + tile_size, h)
            col_end = min(col + tile_size, w)
            row_start = max(0, row_end - tile_size)
            col_start = max(0, col_end - tile_size)
 
            tile = img[row_start:row_end, col_start:col_end]
 
            tiles.append((tile, row_start, col_start))
 
    return tiles
 
 
def stitch_detections(all_detections, iou_threshold=0.5):
    """
    Merge detections from overlapping tiles.
    all_detections: list of (bbox, score, class_id) where bbox = [x1, y1, x2, y2]
                    in FULL IMAGE coordinates.
    Returns: filtered detections after NMS across tile boundaries.
    """
    if not all_detections:
        return []
 
    boxes = np.array([d[0] for d in all_detections])
    scores = np.array([d[1] for d in all_detections])
    classes = np.array([d[2] for d in all_detections])
 
    # Non-Maximum Suppression per class
    keep = []
    for cls in np.unique(classes):
        cls_mask = classes == cls
        cls_boxes = boxes[cls_mask]
        cls_scores = scores[cls_mask]
        cls_indices = np.where(cls_mask)[0]
 
        # Sort by score
        order = cls_scores.argsort()[::-1]
 
        while len(order) > 0:
            i = order[0]
            keep.append(cls_indices[i])
 
            if len(order) == 1:
                break
 
            # Compute IoU with remaining
            xx1 = np.maximum(cls_boxes[i, 0], cls_boxes[order[1:], 0])
            yy1 = np.maximum(cls_boxes[i, 1], cls_boxes[order[1:], 1])
            xx2 = np.minimum(cls_boxes[i, 2], cls_boxes[order[1:], 2])
            yy2 = np.minimum(cls_boxes[i, 3], cls_boxes[order[1:], 3])
 
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = ((cls_boxes[i, 2] - cls_boxes[i, 0]) *
                      (cls_boxes[i, 3] - cls_boxes[i, 1]))
            area_j = ((cls_boxes[order[1:], 2] - cls_boxes[order[1:], 0]) *
                      (cls_boxes[order[1:], 3] - cls_boxes[order[1:], 1]))
            iou = inter / (area_i + area_j - inter + 1e-10)
 
            remaining = np.where(iou < iou_threshold)[0]
            order = order[remaining + 1]
 
    return [all_detections[i] for i in keep]
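As a sanity check on the tiling arithmetic, the number of tiles the loop in tile_image produces can be computed directly — one tile per stride step in each dimension:

```python
import math

def count_tiles(h, w, tile_size=512, overlap=64):
    """Tiles produced by the tile_image() loop above: one per stride step per axis."""
    stride = tile_size - overlap
    rows = math.ceil(max(0, h - overlap) / stride)
    cols = math.ceil(max(0, w - overlap) / stride)
    return rows * cols

print(count_tiles(10980, 10980))  # full Sentinel-2 tile -> 625 tiles
```

625 forward passes per Sentinel-2 tile is why inference speed (and batching tiles on the GPU) matters in production.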

Step 2: The Tiling Pipeline

import numpy as np
 
def run_detection_on_large_image(img, detect_fn, tile_size=512, overlap=128,
                                  score_threshold=0.3):
    """
    Full pipeline: tile → detect → stitch.
    detect_fn: function that takes a tile (H, W, C) and returns
               list of (bbox_local, score, class_id)
    """
    tiles = tile_image(img, tile_size, overlap)
    all_detections = []
 
    for tile, row_offset, col_offset in tiles:
        # Run detector on tile
        tile_detections = detect_fn(tile)
 
        # Convert local tile coordinates to full image coordinates
        for bbox, score, class_id in tile_detections:
            if score < score_threshold:
                continue
            global_bbox = [
                bbox[0] + col_offset,
                bbox[1] + row_offset,
                bbox[2] + col_offset,
                bbox[3] + row_offset,
            ]
            all_detections.append((global_bbox, score, class_id))
 
    # NMS across tile boundaries
    final = stitch_detections(all_detections, iou_threshold=0.5)
    return final

Step 3: Using YOLO on Aerial/Satellite Data

YOLO (You Only Look Once) is the most practical detector for satellite imagery: it is fast, single-pass, and handles small objects well given the right training data.

Option A: Use Pretrained YOLO (General Objects)

# pip install ultralytics
from ultralytics import YOLO
import numpy as np
 
def yolo_detect_tile(tile, model, conf=0.25):
    """
    Run YOLOv8 on a single tile.
    Returns: list of (bbox, score, class_id)
    """
    results = model(tile, conf=conf, verbose=False)
    detections = []
 
    for result in results:
        for box in result.boxes:
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
            score = float(box.conf[0])
            class_id = int(box.cls[0])
            detections.append(([x1, y1, x2, y2], score, class_id))
 
    return detections
 
# Load pretrained YOLOv8
# model = YOLO("yolov8n.pt")  # nano — fastest
# model = YOLO("yolov8s.pt")  # small — better accuracy
 
# For satellite imagery, you likely need a model trained on aerial data:
# - DOTAv2 dataset (oriented aerial object detection)
# - xView dataset (60 classes of overhead objects)
# - DIOR dataset (20 classes)
# Community models on Hugging Face: search "yolo satellite" or "yolo aerial"

Option B: Train YOLO on Aerial Data

# Dataset structure for YOLO training:
# dataset/
#   train/
#     images/
#       tile_0001.jpg
#     labels/
#       tile_0001.txt  # YOLO format: class_id cx cy w h (normalized)
#   val/
#     images/
#     labels/
#
# data.yaml:
# train: dataset/train/images
# val: dataset/val/images
# nc: 5  # number of classes
# names: ['vehicle', 'aircraft', 'ship', 'building', 'storage_tank']
 
# Training
# from ultralytics import YOLO
# model = YOLO("yolov8s.pt")
# model.train(data="data.yaml", epochs=100, imgsz=512, batch=16)
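Building the label files means converting pixel-space boxes into YOLO's normalized center format. A minimal converter (the function name is mine, not part of the ultralytics API):

```python
def bbox_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Pixel [x1, y1, x2, y2] -> YOLO (cx, cy, w, h), all normalized to [0, 1]."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return cx, cy, w, h

# One label line per object: "class_id cx cy w h"
cx, cy, w, h = bbox_to_yolo(100, 200, 140, 220, 512, 512)
print(f"0 {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
```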

Public Aerial Object Detection Datasets (Free)

| Dataset | Classes | Images | Resolution | Source |
| --- | --- | --- | --- | --- |
| xView | 60 | 1M+ objects | ~30cm | DIUx (satellite) |
| DOTA | 18 | 2800+ | varies | aerial/satellite |
| DIOR | 20 | 23k+ | 0.5-30m | Google Earth |
| FAIR1M | 37 | 15k+ | 0.3-0.8m | satellite |
| VEDAI | 9 (vehicles) | 1200 | 12.5cm | aerial |

Step 4: Detect and Stitch — Full Example

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
 
def demo_detection_pipeline():
    """
    Full detection pipeline demo with synthetic data.
    Simulates an airfield with aircraft and vehicles.
    """
    np.random.seed(42)
 
    # Create synthetic airfield image (1000x1000 at ~1m resolution)
    size = 1000
    img = np.random.normal(0.3, 0.02, (size, size, 3))  # background
 
    # Runway
    img[480:520, 100:900] = [0.5, 0.5, 0.5]  # gray
 
    # Taxiway
    img[430:440, 300:700] = [0.45, 0.45, 0.45]
 
    # Buildings (hangars)
    for bx in [150, 250, 350]:
        img[350:400, bx:bx+60] = [0.55, 0.50, 0.45]
 
    # Aircraft on apron (small bright objects)
    aircraft_positions = []
    for ax, ay in [(200, 410), (300, 415), (400, 408), (250, 440), (350, 435)]:
        # Aircraft: ~30m long, ~25m wingspan; drawn here as 20x12 px for the demo
        img[ay:ay+12, ax:ax+20] = [0.65, 0.65, 0.60]
        aircraft_positions.append((ax, ay, ax+20, ay+12))
 
    # Vehicles in parking area
    vehicle_positions = []
    for vx in range(600, 750, 15):
        for vy in range(350, 420, 12):
            if np.random.random() > 0.3:
                img[vy:vy+4, vx:vx+8] = [0.4, 0.38, 0.35]
                vehicle_positions.append((vx, vy, vx+8, vy+4))
 
    img = np.clip(img, 0, 1)
 
    # Simulate detection (replace with real YOLO in practice)
    all_detections = []
    for bbox in aircraft_positions:
        score = np.random.uniform(0.7, 0.95)
        all_detections.append((list(bbox), score, 0))  # class 0 = aircraft
 
    for bbox in vehicle_positions:
        score = np.random.uniform(0.4, 0.85)
        all_detections.append((list(bbox), score, 1))  # class 1 = vehicle
 
    # Visualization
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.imshow(img)
 
    colors = {0: "red", 1: "cyan"}
    labels = {0: "Aircraft", 1: "Vehicle"}
 
    for bbox, score, cls in all_detections:
        x1, y1, x2, y2 = bbox
        rect = plt.Rectangle((x1, y1), x2-x1, y2-y1,
                              linewidth=1.5, edgecolor=colors[cls],
                              facecolor="none")
        ax.add_patch(rect)
        ax.text(x1, y1-2, f"{score:.2f}", color=colors[cls], fontsize=6)
 
    # Legend
    patches = [mpatches.Patch(color=c, label=f"{labels[cls]} ({sum(1 for d in all_detections if d[2]==cls)})")
               for cls, c in colors.items()]
    ax.legend(handles=patches, loc="upper right", fontsize=12)
 
    n_aircraft = sum(1 for d in all_detections if d[2] == 0)
    n_vehicles = sum(1 for d in all_detections if d[2] == 1)
    ax.set_title(f"Airfield Object Detection — {n_aircraft} aircraft, {n_vehicles} vehicles")
    ax.axis("off")
    plt.tight_layout()
    plt.savefig("airfield_detection.png", dpi=150)
    plt.show()
 
    return all_detections
 
detections = demo_detection_pipeline()

Step 5: Ship Detection in SAR (Classical Method)

For SAR ship detection, you don’t need a neural network. Ships are bright point targets on dark water — a simple threshold-based CFAR detector works well. See Ship Detection in SAR for the full implementation.

import numpy as np
from scipy.ndimage import label, binary_dilation

def simple_ship_detection_sar(sar_vv, pixel_size_m=10):
    """
    Quick ship detection for Sentinel-1.
    1. Identify water (dark areas)
    2. Find bright pixels on water
    3. Cluster into ship objects
    """
 
    sar_db = 10 * np.log10(np.maximum(sar_vv, 1e-10))
 
    # Water mask: percentile-based (water is typically <-18 dB)
    water = sar_db < np.percentile(sar_db, 30)
 
    # Ship candidates: bright pixels on water
    # Adaptive threshold: mean + 5*std of water pixels
    water_mean = np.mean(sar_db[water])
    water_std = np.std(sar_db[water])
    ship_threshold = water_mean + 5 * water_std
 
    candidates = (sar_db > ship_threshold) & binary_dilation(water, iterations=3)
 
    # Cluster
    labeled, n = label(candidates)
    ships = []
    for i in range(1, n + 1):
        pixels = np.argwhere(labeled == i)
        area = len(pixels)
        if area < 2:  # minimum size
            continue
        length_m = (pixels[:, 1].max() - pixels[:, 1].min() + 1) * pixel_size_m
        width_m = (pixels[:, 0].max() - pixels[:, 0].min() + 1) * pixel_size_m
        centroid = pixels.mean(axis=0)
        ships.append({
            "centroid": centroid,
            "length_m": max(length_m, width_m),
            "width_m": min(length_m, width_m),
            "area_pixels": area,
        })
 
    return ships
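Since the text above contrasts this global threshold with CFAR, here is a minimal cell-averaging CFAR sketch for intuition. Window sizes and the threshold factor are illustrative, not tuned for real Sentinel-1 data:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def cfar_detect(img, guard=2, train=8, factor=5.0):
    """Cell-averaging CFAR: compare each pixel to the mean of a surrounding
    training ring, excluding a guard window so a target doesn't inflate
    its own background estimate."""
    big = 2 * (guard + train) + 1     # full window edge length
    small = 2 * guard + 1             # guard window edge length
    # Window sums via uniform_filter (mean * window area)
    sum_big = uniform_filter(img, big) * big ** 2
    sum_small = uniform_filter(img, small) * small ** 2
    background = (sum_big - sum_small) / (big ** 2 - small ** 2)
    return img > factor * background

# Bright point target on uniform clutter
scene = np.ones((64, 64))
scene[32, 32] = 100.0
hits = cfar_detect(scene)
print(hits.sum(), hits[32, 32])  # -> 1 True
```

Because the threshold adapts to the local background, CFAR keeps its false-alarm rate roughly constant across calm and rough sea states, which a single global threshold cannot do.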

Step 6: Building Footprint Extraction

For high-resolution imagery, extract building footprints using segmentation.

import numpy as np
from scipy.ndimage import label, binary_fill_holes, binary_opening
 
def extract_buildings_simple(nir, red, min_area_pixels=20):
    """
    Simple building extraction using spectral indices.
    Works best with high-res data (<5m).
    For Sentinel-2, this extracts urban AREAS, not individual buildings.
    """
    # NDVI (vegetation has high values)
    ndvi = (nir - red) / (nir + red + 1e-10)
 
    # Non-vegetation, non-water pixels are candidate buildings
    # In practice, you'd also use SWIR for NDBI
    building_candidates = (ndvi < 0.15) & (nir > 0.05)
 
    # Morphological cleanup
    clean = binary_opening(building_candidates, iterations=1)
    clean = binary_fill_holes(clean)
 
    # Label individual buildings
    labeled, n_buildings = label(clean)
 
    buildings = []
    for i in range(1, n_buildings + 1):
        mask = labeled == i
        area = np.sum(mask)
        if area < min_area_pixels:
            continue
        centroid = np.mean(np.argwhere(mask), axis=0)
        buildings.append({
            "id": i,
            "centroid": centroid,
            "area_pixels": area,
        })
 
    return buildings, labeled
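The footprints come back in pixels; converting to physical area is just the GSD squared (helper name is mine):

```python
def footprint_area_m2(area_pixels, gsd_m):
    """Pixel-count footprint -> square meters (each pixel covers gsd_m x gsd_m)."""
    return area_pixels * gsd_m ** 2

print(footprint_area_m2(120, 0.5))  # 120 px at 50 cm GSD -> 30.0 m^2
```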

Step 7: Vehicle Counting

import numpy as np
 
def count_vehicles_in_area(img, roi_bbox, min_brightness=0.4, max_size=200,
                            min_size=8):
    """
    Count bright objects (potential vehicles) in a region of interest.
    Assumes high-res imagery (<1m).
    roi_bbox: (row_start, row_end, col_start, col_end)
    """
    from scipy.ndimage import label
 
    r1, r2, c1, c2 = roi_bbox
    roi = img[r1:r2, c1:c2]
 
    if roi.ndim == 3:
        roi_gray = np.mean(roi, axis=2)
    else:
        roi_gray = roi
 
    # Detect bright objects (vehicles on parking lot)
    bright = roi_gray > min_brightness
 
    labeled, n = label(bright)
    vehicles = 0
    for i in range(1, n + 1):
        size = np.sum(labeled == i)
        if min_size <= size <= max_size:
            vehicles += 1
 
    return vehicles
 
# Track vehicle counts over time for change analysis:
# dates = ["2024-01-15", "2024-02-15", "2024-03-15", ...]
# counts = [count_vehicles_in_area(img, parking_lot_bbox) for img in images]
# A sudden increase in vehicle count at a military facility = increased readiness
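One way to sketch turning those counts into alerts is to flag dates whose count deviates strongly from the series mean (the z threshold is arbitrary):

```python
import numpy as np

def flag_anomalous_counts(counts, z=1.5):
    """Indices of counts more than z standard deviations from the series mean."""
    counts = np.asarray(counts, dtype=float)
    mu, sd = counts.mean(), counts.std()
    if sd == 0:
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) > z * sd]

print(flag_anomalous_counts([10, 11, 9, 10, 30]))  # -> [4]
```

Real deployments would model weekday/seasonal cycles before flagging, but the idea is the same: the detector produces counts, and the analysis happens on the time series.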

Evaluation Metrics

When you deploy a detector, measure performance:

def compute_detection_metrics(predictions, ground_truth, iou_threshold=0.5):
    """
    Compute precision, recall, F1 for object detection.
    predictions: list of [x1, y1, x2, y2, score]
    ground_truth: list of [x1, y1, x2, y2]
    """
    if not predictions or not ground_truth:
        return {"precision": 0, "recall": 0, "f1": 0}
 
    preds = sorted(predictions, key=lambda x: -x[4])  # sort by score
    gt_matched = [False] * len(ground_truth)
 
    tp, fp = 0, 0
    for pred in preds:
        best_iou = 0
        best_gt = -1
        for j, gt in enumerate(ground_truth):
            if gt_matched[j]:
                continue
            iou = compute_iou(pred[:4], gt)
            if iou > best_iou:
                best_iou = iou
                best_gt = j
 
        if best_iou >= iou_threshold:
            tp += 1
            gt_matched[best_gt] = True
        else:
            fp += 1
 
    fn = sum(1 for m in gt_matched if not m)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
 
    return {"precision": precision, "recall": recall, "f1": f1, "tp": tp, "fp": fp, "fn": fn}
 
 
def compute_iou(box1, box2):
    """IoU between two boxes [x1, y1, x2, y2]."""
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter + 1e-10)

Common False Positives

| False Positive | Why It Happens | Mitigation |
| --- | --- | --- |
| Bright roof corners | Similar size/brightness to vehicles | Use spectral info (vehicles have different NIR) |
| Shipping containers | Rectangular, vehicle-sized | Context: containers sit in rows at ports |
| Shadows of buildings | Dark rectangles at consistent angle | Shadow direction analysis |
| Boats vs vehicles | Similar size in overhead view | Context: water vs land |
| Cloud shadows | Dark patches on ground | Cloud mask from SCL band |
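The spectral mitigation in the first row can be sketched as a post-filter that drops detections sitting on vegetation. Thresholds here are illustrative, and the function name is mine:

```python
import numpy as np

def filter_detections_by_ndvi(detections, nir, red, max_ndvi=0.3):
    """Drop detections whose box interior is mostly vegetation.
    detections: list of ([x1, y1, x2, y2], score, class_id) in pixel coords."""
    ndvi = (nir - red) / (nir + red + 1e-10)
    kept = []
    for bbox, score, cls in detections:
        x1, y1, x2, y2 = (int(v) for v in bbox)
        patch = ndvi[y1:y2, x1:x2]
        if patch.size > 0 and np.median(patch) < max_ndvi:
            kept.append((bbox, score, cls))
    return kept

# Left half vegetated (high NDVI), right half bare soil
nir = np.full((20, 20), 0.5)
red = np.where(np.arange(20) < 10, 0.1, 0.5) * np.ones((20, 1))
dets = [([0, 0, 10, 10], 0.9, 1), ([10, 0, 20, 10], 0.8, 1)]
print(len(filter_detections_by_ndvi(dets, nir, red)))  # -> 1, vegetation hit dropped
```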

Try This Next

Exercise 1: Fine-Tune YOLO on Aerial Data

  1. Download DIOR or VEDAI dataset (small, manageable)
  2. Convert to YOLO format
  3. Fine-tune YOLOv8-nano for 50 epochs
  4. Evaluate on test set — what precision/recall do you get?

Exercise 2: Ship Detection Pipeline

  1. Download a Sentinel-1 GRD scene of a busy shipping lane
  2. Apply speckle filtering (Lee filter from SAR Fundamentals and Analysis)
  3. Run CFAR ship detection
  4. Count ships and estimate their sizes
  5. Compare with optical imagery of the same area/time

Exercise 3: Vehicle Count Time Series

  1. Using Google Earth or Planet Explorer, find a parking lot visible in satellite imagery
  2. Count vehicles manually on 3-5 dates
  3. What patterns do you observe? (weekday vs weekend, time of year)
  4. This manual process is what automated detection replaces at scale

Self-Test Questions

  1. Why do you need overlapping tiles instead of non-overlapping?
  2. A YOLO model trained on ImageNet car photos fails on satellite imagery. Why?
  3. What is the minimum GSD needed to detect a standard car (4.5m x 1.8m)?
  4. Why is CFAR better than a fixed threshold for SAR ship detection?
  5. You get 95% precision but 40% recall. What does this mean operationally?

See also: SAR Fundamentals and Analysis | Change Detection | Multispectral Analysis
Next: Terrain Analysis and Geolocation