Detect Small Objects with `supervision.InferenceSlicer`¶

This cookbook shows how to use Slicing Aided Hyper Inference (SAHI) for small object detection with supervision.

"Small Object Detection"

Click the Open in Colab button to run the cookbook on Google Colab.

Before you start¶

You'll need:

A free Roboflow account. Don't have one? Create one here.
An API key from Roboflow. Need help getting one? Learn more here.

Install required packages¶

Let's install the dependencies for this project. Here's a list of what

inference: a package by Roboflow for easy deployment of computer vision models.
supervision: a package by Roboflow that provides utilities for building and managing computer vision applications.

In [ ]:

Copied!

%pip install inference supervision jupyter_compare_view
%pip install inference supervision jupyter_compare_view

Crowd counting with Computer Vision¶

How would you go about solving the problem of counting people in crowds? After some tests, I found that the best approach is to detect people’s heads. Other body parts are likely occluded by other people, but heads are usually exposed, especially in aerial or high-level shots.

Using an Open-Source Public Model for People Detection¶

Detecting people (or their heads) is a common problem that has been addressed by many researchers in the past. In this project, we’ll use an open-source public dataset and a fine-tuned model to perform inference on images.

Roboflow Universe

Some details about the project "people_counterv0 Computer Vision Project":

Dataset of 4,574 images
mAP=49.2% / Precision=74.5% / Recall=39.2
Model: Roboflow 2.0 Object Detection (fast)
Checkpoint: COCOv6n
Created by: SIT

Imports¶

Run the code below to download to load the modules necessarry for this cookbook

In [2]:

Copied!





import math
import os
import time

import cv2
import matplotlib.pyplot as plt
import numpy as np
import supervision as sv
from inference import get_model
from jupyter_compare_view import compare
import math
import os
import time

import cv2
import matplotlib.pyplot as plt
import numpy as np
import supervision as sv
from inference import get_model
from jupyter_compare_view import compare

Download Image¶

In [ ]:

Copied!

# Download the image
!wget -O human_tower.jpg "https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/4_de_8_amb_l%27agulla_carregat_Castellers_de_Barcelona_%2821937141066%29.jpg/2560px-4_de_8_amb_l%27agulla_carregat_Castellers_de_Barcelona_%2821937141066%29.jpg"
# Download the image
!wget -O human_tower.jpg "https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/4_de_8_amb_l%27agulla_carregat_Castellers_de_Barcelona_%2821937141066%29.jpg/2560px-4_de_8_amb_l%27agulla_carregat_Castellers_de_Barcelona_%2821937141066%29.jpg"

In [7]:

Copied!





image = cv2.imread("human_tower.jpg")
image_wh = (image.shape[1], image.shape[0])
print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
sv.plot_image(image)
image = cv2.imread("human_tower.jpg")
image_wh = (image.shape[1], image.shape[0])
print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
sv.plot_image(image)

Image shape: 2560w x 1696h

No description has been provided for this image

You're looking at a Castell, a human tower traditionally built at festivals in parts of Catalonia, Spain, and has since spread to the Balearic Islands and the Valencian Community. The source of the image is here.jpg), and you could learn more about these human towers in Wikipedia

Let's try our model's performance¶

Before we dive into the SAHI technique for small object detection, it’s useful to see how a fine-tuned model performs with the image as is—without any pre-processing or slicing. The goal is to understand when the model starts to fail so that we can progressively move towards an efficient slicing strategy.

Let’s run the model!

In [8]:

Copied!





MODEL_ID = "people_counterv0/1"
API_KEY = "" # Retrieve your API key: https://docs.roboflow.com/api-reference/authentication

# If using Google Colab
#from google.colab import userdata
#API_KEY = userdata.get("ROBOFLOW_API_KEY")  #Retrieve your API key: https://docs.roboflow.com/api-reference/authentication

model = get_model(MODEL_ID, api_key=API_KEY)

# Run inference
results = model.infer(image, model_id=MODEL_ID)
detections = sv.Detections.from_inference(results[0])

print(f"Found {len(detections)} people")

bbox_annotator = sv.BoxAnnotator(
    color=sv.ColorPalette.DEFAULT.colors[6],
    thickness=2
)

# Annotate our image with detections.
image_no_sahi = bbox_annotator.annotate(scene=image.copy(), detections=detections)

sv.plot_image(image_no_sahi)
MODEL_ID = "people_counterv0/1"
API_KEY = "" # Retrieve your API key: https://docs.roboflow.com/api-reference/authentication

# If using Google Colab
#from google.colab import userdata
#API_KEY = userdata.get("ROBOFLOW_API_KEY")  #Retrieve your API key: https://docs.roboflow.com/api-reference/authentication

model = get_model(MODEL_ID, api_key=API_KEY)

# Run inference
results = model.infer(image, model_id=MODEL_ID)
detections = sv.Detections.from_inference(results[0])

print(f"Found {len(detections)} people")

bbox_annotator = sv.BoxAnnotator(
    color=sv.ColorPalette.DEFAULT.colors[6],
    thickness=2
)

# Annotate our image with detections.
image_no_sahi = bbox_annotator.annotate(scene=image.copy(), detections=detections)

sv.plot_image(image_no_sahi)

UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
UserWarning: Specified provider 'OpenVINOExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'

Found 185 people

Compare source image with no SAHI detected Image¶

In [9]:

Copied!

resize_image = (600, 400)

bgr_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
bgr_image_no_sahi = cv2.cvtColor(image_no_sahi, cv2.COLOR_RGB2BGR)

# Resize the images for better comparison
bgr_image = cv2.resize(bgr_image, resize_image)
bgr_image_no_sahi = cv2.resize(bgr_image_no_sahi, resize_image)

compare(bgr_image, bgr_image_no_sahi, start_mode="horizontal", start_slider_pos=0.5)
resize_image = (600, 400)

bgr_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
bgr_image_no_sahi = cv2.cvtColor(image_no_sahi, cv2.COLOR_RGB2BGR)

# Resize the images for better comparison
bgr_image = cv2.resize(bgr_image, resize_image)
bgr_image_no_sahi = cv2.resize(bgr_image_no_sahi, resize_image)

compare(bgr_image, bgr_image_no_sahi, start_mode="horizontal", start_slider_pos=0.5)

Out[9]:

The model shows strong performance in detecting people in the lower half of the image, but it struggles to accurately predict boxes in the upper half. This suggests two key insights: first, the model is proficient at identifying people’s heads from various angles, and second, using SAHI could effectively address the detection challenges in the upper portion of the image. Now, it’s time to try SAHI!

Using `sv.InferenceSlicer` for small object detection¶

The InferenceSlicer is a utility for performing slicing-based inference on large images, particularly useful for detecting small objects. It divides a large image into smaller slices, runs inference on each slice, and then merges the results to form the final detections for the entire image. This method, known as Slicing Adaptive Hyper Inference (SAHI), improves detection accuracy by focusing on smaller regions where small objects might be missed in a full-scale inference.

Key Features:¶

Slicing Strategy: Divides an image into smaller slices with configurable sizes and overlaps.
Overlap Management: Supports different overlap strategies (ratio-based or pixel-based) to ensure smooth transitions between slices.
Detection Merging: Merges detections from all slices using Non-Maximum Suppression (NMS) or Non-Maximum Merging (NMM) to handle overlapping detections.
Parallel Processing: Utilizes multi-threading to perform inference on slices concurrently, enhancing speed.
Custom Inference Callback: Allows you to define their own inference function for flexibility in integrating various detection models.

SAHI

SAHI can be viewed as a framework designed to tackle the challenge of small object detection. The InferenceSlicer class from the supervision library provides an implementation of SAHI, allowing you to easily use it as follows:

import cv2
import supervision as sv
from ultralytics import YOLO

image = cv2.imread(SOURCE_IMAGE_PATH)
model = YOLO(...)

def callback(image_slice: np.ndarray) -> sv.Detections:
    result = model(image_slice)[0]
    return sv.Detections.from_ultralytics(result)

slicer = sv.InferenceSlicer(
    # A function that performs inference on a given image slice and returns detections.
    callback=callback,
    # Strategy for filtering or merging overlapping detections in slices.
    overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
    # Dimensions of each slice measured in pixels. The tuple should be in the format (width, height).
    slice_wh=(100, 100)
)

detections = slicer(image)

Check the documentation of sv.InferenceSlicer here.

Slicing our image with `supervision`¶

Let’s begin by visualizing how these tiles would appear on our image. Let's start with a small set of 2x2 tiles, with a zero overlap both vertically (height) and horizontally (width) between the tiles. The final values of these parameters will ultimately depend on your use case, so trial and error is encouraged!

Some of the methods below are for visualizing the tiles and overlapping. You'll only need the calculate_tile_size method in your application to calculate the size of the tiles.

Utility functions for visualizing tiles¶

In [10]:

Copied!





def tile_image(image_shape: tuple[int, int], slice_wh: tuple[int, int], overlap_wh: tuple[float, float])-> np.ndarray:
    """
    Computes the coordinates and dimensions of tiles for an image with specified slicing and overlap parameters.
    """
    offsets = sv.InferenceSlicer._generate_offset(
        resolution_wh=image_shape,
        slice_wh=slice_wh,
        overlap_ratio_wh=None,
        overlap_wh=overlap_wh
    )

    offsets = np.ceil(offsets).astype(int)

    return offsets

def draw_transparent_tiles(scene: np.ndarray, x: int, y: int, w:int, h:int) -> np.ndarray:
    """
    Draws a transparent tile with an optional index label on the given scene.
    """
    alpha=0.15

    # Generate a mask for the tile
    rectangle = np.zeros((h, w, 3), dtype=np.uint8)
    rectangle.fill(255)

    rect = sv.Rect(x=x, y=y, width=w, height=h)
    overlay_image = sv.draw_image(scene=scene.copy(), image=rectangle, opacity=alpha, rect=rect)

    # Draw a border around the edge of the mask
    border_color = sv.Color.BLACK
    border_thickness=2
    overlay_image = sv.draw_rectangle(
        scene=overlay_image,
        rect=sv.Rect(x=x, y=y, width=w, height=h),
        color=border_color,
        thickness=border_thickness
    )

    return overlay_image

def draw_tiles(scene: np.ndarray, offsets):
    """
    Draws transparent tiles on a scene based on the given offsets.
    """

    tiled_image = scene.copy()

    for index, offset in enumerate(offsets):
        x = offset[0]
        y = offset[1]
        width = offset[2] - x
        height = offset[3] - y

        tiled_image = draw_transparent_tiles(scene=tiled_image, x=x, y=y, w=width, h=height)

    return tiled_image

def print_offsets(offsets):
    for index, (x1, y1, x2, y2) in enumerate(offsets, 1):
        w, h = x2 - x1, y2 - y1
        print(f"Tile {index + 1}")
        print(f"  w={w}, h={h}, x1={x1}, y1={y1}, x2={x2}, y2={y2}, area={w*h}")
def tile_image(image_shape: tuple[int, int], slice_wh: tuple[int, int], overlap_wh: tuple[float, float])-> np.ndarray:
    """
    Computes the coordinates and dimensions of tiles for an image with specified slicing and overlap parameters.
    """
    offsets = sv.InferenceSlicer._generate_offset(
        resolution_wh=image_shape,
        slice_wh=slice_wh,
        overlap_ratio_wh=None,
        overlap_wh=overlap_wh
    )

    offsets = np.ceil(offsets).astype(int)

    return offsets

def draw_transparent_tiles(scene: np.ndarray, x: int, y: int, w:int, h:int) -> np.ndarray:
    """
    Draws a transparent tile with an optional index label on the given scene.
    """
    alpha=0.15

    # Generate a mask for the tile
    rectangle = np.zeros((h, w, 3), dtype=np.uint8)
    rectangle.fill(255)

    rect = sv.Rect(x=x, y=y, width=w, height=h)
    overlay_image = sv.draw_image(scene=scene.copy(), image=rectangle, opacity=alpha, rect=rect)

    # Draw a border around the edge of the mask
    border_color = sv.Color.BLACK
    border_thickness=2
    overlay_image = sv.draw_rectangle(
        scene=overlay_image,
        rect=sv.Rect(x=x, y=y, width=w, height=h),
        color=border_color,
        thickness=border_thickness
    )

    return overlay_image

def draw_tiles(scene: np.ndarray, offsets):
    """
    Draws transparent tiles on a scene based on the given offsets.
    """

    tiled_image = scene.copy()

    for index, offset in enumerate(offsets):
        x = offset[0]
        y = offset[1]
        width = offset[2] - x
        height = offset[3] - y

        tiled_image = draw_transparent_tiles(scene=tiled_image, x=x, y=y, w=width, h=height)

    return tiled_image

def print_offsets(offsets):
    for index, (x1, y1, x2, y2) in enumerate(offsets, 1):
        w, h = x2 - x1, y2 - y1
        print(f"Tile {index + 1}")
        print(f"  w={w}, h={h}, x1={x1}, y1={y1}, x2={x2}, y2={y2}, area={w*h}")

Calculate Tile Size¶

Important: As of supervision==0.23.0 you need to provide the tile size manually. You can the function below to calculate it.

The calculate_tile_size function determines the desired dimentions of the tiles when dividing an image into a grid by considering the following parameters:

Image dimensions: The width and height of the image, specified as (width, height), e.g., (1024, 768).
Grid layout: The number of tiles, specified as (rows, columns), e.g., (2, 2).
Overlap: The percentage of overlap between adjacent tiles, specified separately for horizontal and vertical overlap, e.g., (0.1, 0.1).

It returns a tuple containing:

Tile size: A tuple representing the width and height of each tile, including the overlap (overlap_wh) between adjacent tiles.
Overlap size: A tuple representing the overlap between tiles in pixels (overlap_wh). If the overlap ratio is set to (0.0, 0.0), this value will be (0, 0), indicating no overlap.

For example:

>>> image_shape = (1024, 768)
>>> tiles = (4, 4)
>>> overlap_ratio_wh = (0.15, 0.15)
>>> calculate_tile_size(image_shape, tiles, overlap_ratio_wh)
((295, 221), (39, 29))

In [11]:

Copied!





def calculate_tile_size(image_shape: tuple[int, int], tiles: tuple[int, int], overlap_ratio_wh: tuple[float, float] = (0.0, 0.0)):
    """
    Calculate the size of the tiles based on the image shape, the number of tiles, and the overlap ratio.

    Parameters:
    ----------
    image_shape : tuple[int, int]
        The dimensions of the image as (width, height).

    tiles : tuple[int, int]
        The tiling strategy defined as (rows, columns), specifying the number of tiles along the height and width of the image.

    overlap_ratio_wh : tuple[float, float], optional
        The overlap ratio for width and height as (overlap_ratio_w, overlap_ratio_h). This defines the fraction of overlap between adjacent tiles. Default is (0.0, 0.0), meaning no overlap.

    Returns:
    -------
    tuple[tuple[int, int], tuple[int, int]]
        A tuple containing:
        - The size of each tile as (tile_width, tile_height), accounting for overlap.
        - The overlap dimensions as (overlap_width, overlap_height).

    Example:
    -------
    >>> image_shape = (1024, 768)
    >>> tiles = (4, 4)
    >>> overlap_ratio_wh = (0.15, 0.15)
    >>> calculate_tile_size(image_shape, tiles, overlap_ratio_wh)
    ((295, 221), (39, 29))
    """

    w, h = image_shape
    rows, columns = tiles

    tile_width = (w / columns)
    tile_height = (h / rows)
    overlap_w, overlap_h = overlap_ratio_wh

    tile_width = math.ceil(w / columns * (1 + overlap_w))
    tile_height = math.ceil(h / rows * (1 + overlap_h))
    overlap_wh = (math.ceil(tile_width * overlap_w), math.ceil(tile_height * overlap_h))

    return (tile_width, tile_height), overlap_wh
def calculate_tile_size(image_shape: tuple[int, int], tiles: tuple[int, int], overlap_ratio_wh: tuple[float, float] = (0.0, 0.0)):
    """
    Calculate the size of the tiles based on the image shape, the number of tiles, and the overlap ratio.

    Parameters:
    ----------
    image_shape : tuple[int, int]
        The dimensions of the image as (width, height).

    tiles : tuple[int, int]
        The tiling strategy defined as (rows, columns), specifying the number of tiles along the height and width of the image.

    overlap_ratio_wh : tuple[float, float], optional
        The overlap ratio for width and height as (overlap_ratio_w, overlap_ratio_h). This defines the fraction of overlap between adjacent tiles. Default is (0.0, 0.0), meaning no overlap.

    Returns:
    -------
    tuple[tuple[int, int], tuple[int, int]]
        A tuple containing:
        - The size of each tile as (tile_width, tile_height), accounting for overlap.
        - The overlap dimensions as (overlap_width, overlap_height).

    Example:
    -------
    >>> image_shape = (1024, 768)
    >>> tiles = (4, 4)
    >>> overlap_ratio_wh = (0.15, 0.15)
    >>> calculate_tile_size(image_shape, tiles, overlap_ratio_wh)
    ((295, 221), (39, 29))
    """

    w, h = image_shape
    rows, columns = tiles

    tile_width = (w / columns)
    tile_height = (h / rows)
    overlap_w, overlap_h = overlap_ratio_wh

    tile_width = math.ceil(w / columns * (1 + overlap_w))
    tile_height = math.ceil(h / rows * (1 + overlap_h))
    overlap_wh = (math.ceil(tile_width * overlap_w), math.ceil(tile_height * overlap_h))

    return (tile_width, tile_height), overlap_wh

Visualizing the Image Tiles¶

In [12]:

Copied!





tiles = (2,2)
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Generated {len(offsets)} tiles. These are the calculated dimensions")
print_offsets(offsets)

tiled_image = draw_tiles(scene=image.copy(), offsets=offsets)

sv.plot_image(tiled_image)
tiles = (2,2)
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Generated {len(offsets)} tiles. These are the calculated dimensions")
print_offsets(offsets)

tiled_image = draw_tiles(scene=image.copy(), offsets=offsets)

sv.plot_image(tiled_image)

Image shape: 2560w x 1696h
Tiles: (2, 2)
Tile size: 1280w x 1696
Generated 4 tiles. These are the calculated dimensions
Tile 2
  w=1280, h=848, x1=0, y1=0, x2=1280, y2=848, area=1085440
Tile 3
  w=1280, h=848, x1=1280, y1=0, x2=2560, y2=848, area=1085440
Tile 4
  w=1280, h=848, x1=0, y1=848, x2=1280, y2=1696, area=1085440
Tile 5
  w=1280, h=848, x1=1280, y1=848, x2=2560, y2=1696, area=1085440

$No description has been provided for this image$

You can see that the image has been sliced into four different tiles. Next, each tile will be independently processed by the model, and supervision will merge all the predictions into a coherent set of detections. Notice that we're not using overlapping in at this time (more on that later).

Run Inference on a Sliced Image With `supervision`¶

Running inference on slices of your image is easy with the class InferenceSlicer from Supervision. This API from Roboflow divides a larger image into smaller slices, performs inference on each slice, and then merges the detections into a single detections object.

In [13]:

Copied!





def callback(image_slice: np.ndarray) -> sv.Detections:
  result = get_model(model_id=MODEL_ID, api_key=API_KEY).infer(image_slice )[0]
  return sv.Detections.from_inference(result)

tiles = (2,2) # The number of tiles you want
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)

detections = slicer(image)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Found {len(detections)} people")

tiled_image_2x2 = draw_tiles(scene=image.copy(), offsets=offsets)
tiled_image_2x2 = bbox_annotator.annotate(scene=tiled_image_2x2, detections=detections)

sv.plot_image(image=tiled_image_2x2, size=(20, 20))
def callback(image_slice: np.ndarray) -> sv.Detections:
  result = get_model(model_id=MODEL_ID, api_key=API_KEY).infer(image_slice )[0]
  return sv.Detections.from_inference(result)

tiles = (2,2) # The number of tiles you want
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)

detections = slicer(image)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Found {len(detections)} people")

tiled_image_2x2 = draw_tiles(scene=image.copy(), offsets=offsets)
tiled_image_2x2 = bbox_annotator.annotate(scene=tiled_image_2x2, detections=detections)

sv.plot_image(image=tiled_image_2x2, size=(20, 20))

Image shape: 2560w x 1696h
Tiles: (2, 2)
Tile size: 1280w x 1696
Overlap: 0w x 0h. Ratio (0.0, 0.0)
Found 726 people

$No description has been provided for this image$

Great! We’ve detected 726 people, up from the 185 we initially detected without image slicing. The model is still detecting people from different angles, but it continues to struggle with detecting people located in the farther parts of the plaza. It’s time to increase the number of tiles—in other words, zoom in so the model can capture more details of the small heads of people.

Missing detections

Increasing Tile Density: Moving to a 5x5 Grid¶

Now that we’ve seen improvements with a 2x2 grid, it’s time to push the model further. By increasing the number of tiles to a 5x5 grid, we effectively zoom in on the image, allowing the model to capture finer details, such as smaller and more distant features that might have been missed before. This approach will help us understand how well the model performs with even more zoomed-in images. Let’s explore how this change affects our detection accuracy and overall performance.

In [14]:

Copied!





def callback(image_slice: np.ndarray) -> sv.Detections:
  result = get_model(model_id=MODEL_ID, api_key=API_KEY).infer(image_slice )[0]
  return sv.Detections.from_inference(result)

tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  slice_wh=slice_wh,
  overlap_wh=overlap_wh,
  overlap_ratio_wh=None,
  thread_workers=4
)

detections = slicer(image)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap filter: {sv.OverlapFilter.NON_MAX_SUPPRESSION}")
print(f"Found {len(detections)} people")

tiled_image_5x5 = draw_tiles(scene=image.copy(), offsets=offsets)
tiled_image_5x5 = bbox_annotator.annotate(scene=tiled_image_5x5, detections=detections)

sv.plot_image(image=tiled_image_5x5, size=(20, 20),)

def callback(image_slice: np.ndarray) -> sv.Detections:
  result = get_model(model_id=MODEL_ID, api_key=API_KEY).infer(image_slice )[0]
  return sv.Detections.from_inference(result)

tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  slice_wh=slice_wh,
  overlap_wh=overlap_wh,
  overlap_ratio_wh=None,
  thread_workers=4
)

detections = slicer(image)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap filter: {sv.OverlapFilter.NON_MAX_SUPPRESSION}")
print(f"Found {len(detections)} people")

tiled_image_5x5 = draw_tiles(scene=image.copy(), offsets=offsets)
tiled_image_5x5 = bbox_annotator.annotate(scene=tiled_image_5x5, detections=detections)

sv.plot_image(image=tiled_image_5x5, size=(20, 20),)

Image shape: 2560w x 1696h
Tiles: (5, 5)
Tile size: 512w x 1696
Overlap: 0w x 0h. Ratio (0.0, 0.0)
Overlap filter: OverlapFilter.NON_MAX_SUPPRESSION
Found 1494 people

We’ve just detected 1,494 people using a 25-tile grid (5 rows x 5 columns), a significant increase from the 726 people detected with the 4-tile (2x2) grid. However, as we increase the number of tiles, a new challenge arises: duplicate detections or missed detections along the edges of the tiles. This issue becomes evident in these examples, where overlapping or gaps between tiles lead to inaccuracies in our model’s detection.

Example	Observations
	False Negative, Incomplete bbox
	Double detection, Incomplete bbox
	Incomplete bounding box

Improving Object Detection Near Boundaries with Overlapping¶

When objects, like people, appear at the edges of tiles, they might be detected twice or missed entirely if they span across two tiles. This can lead to inaccurate detection results. To solve this, we use overlapping tiles, allowing the model to see parts of adjacent tiles simultaneously. This overlap helps ensure that objects near the boundaries are fully captured, reducing duplicates and improving accuracy.

We’ll set the overlap ratio to (0.2, 0.2) on the tile’s width and height. This overlap helps ensure that objects near the boundaries are fully captured, reducing duplicates and improving accuracy.

In [15]:

Copied!





tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.15, 0.15) # Ratio of overlapping, width/height

slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
  iou_threshold=0.1,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)

detections = slicer(image)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap Filter: {sv.OverlapFilter.NON_MAX_SUPPRESSION}")
print(f"Found {len(detections)} people")

tiled_image_5x5_nms = draw_tiles(scene=image.copy(), offsets=offsets)
tiled_image_5x5_nms = bbox_annotator.annotate(scene=tiled_image_5x5_nms, detections=detections)

sv.plot_image(image=tiled_image_5x5_nms, size=(20, 20))
tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.15, 0.15) # Ratio of overlapping, width/height

slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
  iou_threshold=0.1,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)

detections = slicer(image)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap Filter: {sv.OverlapFilter.NON_MAX_SUPPRESSION}")
print(f"Found {len(detections)} people")

tiled_image_5x5_nms = draw_tiles(scene=image.copy(), offsets=offsets)
tiled_image_5x5_nms = bbox_annotator.annotate(scene=tiled_image_5x5_nms, detections=detections)

sv.plot_image(image=tiled_image_5x5_nms, size=(20, 20))

Image shape: 2560w x 1696h
Tiles: (5, 5)
Tile size: 589w x 1696
Overlap: 89w x 59h. Ratio (0.15, 0.15)
Overlap Filter: OverlapFilter.NON_MAX_SUPPRESSION
Found 1396 people

$No description has been provided for this image$

Non-Max Supression vs Non-Max Merge¶

When dealing with overlapping detections, it’s essential to determine which detections represent the same object and which are unique. Non-Maximum Suppression (NMS) and Non-Maximum Merging (NMM) are two techniques commonly used to address this challenge. NMS works by eliminating redundant detections based on confidence scores, while NMM combines overlapping detections to enhance the representation of objects spanning multiple tiles. Understanding the difference between these methods helps optimize object detection, particularly near tile boundaries.

In supervision, the overlap_filter parameter allows us to specify the strategy for handling overlapping detections in slices. This parameter can take on two values:

sv.OverlapFilter.NON_MAX_SUPRESSION (default): Eliminates redundant detections by keeping the one with the highest confidence score.
sv.OverlapFilter.NON_MAX_MERGE: Combines overlapping detections to create a more comprehensive representation of objects spanning multiple tiles.

It’s important to note that this method is not perfect and may require further testing and fine-tuning to achieve optimal results in various use cases. You should validate the outputs and adjust parameters as needed to handle specific scenarios effectively.

In [16]:

Copied!





tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.15, 0.15) # The overlap ratio: 20% width, 20% height

slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  overlap_filter=sv.OverlapFilter.NON_MAX_MERGE,
  #iou_threshold=0.1,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)

detections = slicer(image)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap Filter: {sv.OverlapFilter.NON_MAX_MERGE}")
print(f"Found {len(detections)} people")

tiled_image_5x5_nmm = draw_tiles(scene=image.copy(), offsets=offsets)
tiled_image_5x5_nmm = bbox_annotator.annotate(scene=tiled_image_5x5_nmm, detections=detections)

sv.plot_image(image=tiled_image_5x5_nmm, size=(20, 20))
tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.15, 0.15) # The overlap ratio: 20% width, 20% height

slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  overlap_filter=sv.OverlapFilter.NON_MAX_MERGE,
  #iou_threshold=0.1,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)

detections = slicer(image)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tile size: {slice_wh[0]}w x {image_wh[1]}")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap Filter: {sv.OverlapFilter.NON_MAX_MERGE}")
print(f"Found {len(detections)} people")

tiled_image_5x5_nmm = draw_tiles(scene=image.copy(), offsets=offsets)
tiled_image_5x5_nmm = bbox_annotator.annotate(scene=tiled_image_5x5_nmm, detections=detections)

sv.plot_image(image=tiled_image_5x5_nmm, size=(20, 20))

Image shape: 2560w x 1696h
Tile size: 589w x 1696
Overlap: 89w x 59h. Ratio (0.15, 0.15)
Overlap Filter: OverlapFilter.NON_MAX_MERGE
Found 1516 people

Compare Image with Sahi Image¶

In [17]:

Copied!





resize_image = (600, 400)

bgr_image = cv2.cvtColor(image.copy(), cv2.COLOR_RGB2BGR)
tiled_image = bbox_annotator.annotate(scene=image.copy(), detections=detections)
bgr_tiled_image = cv2.cvtColor(tiled_image, cv2.COLOR_RGB2BGR)

# Resize the images for better comparison
tiled_image = cv2.resize(bgr_image, resize_image)
bgr_tiled_image = cv2.resize(bgr_tiled_image, resize_image)

compare(tiled_image, bgr_tiled_image, start_mode="horizontal", start_slider_pos=0.5)
resize_image = (600, 400)

bgr_image = cv2.cvtColor(image.copy(), cv2.COLOR_RGB2BGR)
tiled_image = bbox_annotator.annotate(scene=image.copy(), detections=detections)
bgr_tiled_image = cv2.cvtColor(tiled_image, cv2.COLOR_RGB2BGR)

# Resize the images for better comparison
tiled_image = cv2.resize(bgr_image, resize_image)
bgr_tiled_image = cv2.resize(bgr_tiled_image, resize_image)

compare(tiled_image, bgr_tiled_image, start_mode="horizontal", start_slider_pos=0.5)

Out[17]:

Conclusion¶

In this cookbook, we’ve explored the advantages of using the SAHI technique for enhancing small object detection and the importance of experimenting with various tiling strategies to effectively zoom into images. By combining these approaches, we can improve the accuracy and reliability of object detection models, particularly in challenging scenarios where objects are small or located near the boundaries of tiles. These methods offer practical solutions to common challenges in computer vision, empowering developers to build more robust and precise detection systems.

"Crowd Detection"

More resources¶

InferenceSlicer: https://supervision.roboflow.com/detection/tools/inference_slicer/
Detect Small Objects https://supervision.roboflow.com/latest/how_to/detect_small_objects/
What is Non-Max Merging?: https://blog.roboflow.com/non-max-merging/
How to Detect Small Objects: A Guide https://blog.roboflow.com/detect-small-objects/
How to Use SAHI to Detect Small Objects: https://blog.roboflow.com/how-to-use-sahi-to-detect-small-objects/
SAHI paper: https://arxiv.org/abs/2202.06934
C4W3L07 Nonmax Suppression, Andrew Ng: https://www.youtube.com/watch?v=VAo84c1hQX8

Detect Small Objects with supervision.InferenceSlicer¶

Before you start¶

Install required packages¶

Crowd counting with Computer Vision¶

Using an Open-Source Public Model for People Detection¶

Imports¶

Download Image¶

Let's try our model's performance¶

Compare source image with no SAHI detected Image¶

Using sv.InferenceSlicer for small object detection¶

Key Features:¶

Slicing our image with supervision¶

Utility functions for visualizing tiles¶

Calculate Tile Size¶

Visualizing the Image Tiles¶

Run Inference on a Sliced Image With supervision¶

Increasing Tile Density: Moving to a 5x5 Grid¶

Improving Object Detection Near Boundaries with Overlapping¶

Non-Max Supression vs Non-Max Merge¶

Compare Image with Sahi Image¶

Conclusion¶

More resources¶

Detect Small Objects with `supervision.InferenceSlicer`¶

Using `sv.InferenceSlicer` for small object detection¶

Slicing our image with `supervision`¶

Run Inference on a Sliced Image With `supervision`¶