Count in Zone
With supervision, you can count the number of objects in a zone in an image or video. In this guide, we will show how to count the number of cars in a traffic video.
View the notebook that accompanies this tutorial.
To make it easier to follow this tutorial, download the video we will use as an example. You can do this with the supervision.assets module:
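A minimal download sketch (this assumes supervision is installed, e.g. via `pip install "supervision[assets]"`; `download_assets` saves the file into the working directory):

```python
from supervision.assets import VideoAssets, download_assets

# Fetch the example traffic video used throughout this guide
download_assets(VideoAssets.VEHICLES_2)
```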
Initialize a Model and Load Video
First, we need to initialize a model. Let's use a YOLOv8 model with the default COCO checkpoint. We also need to load a video on which to run inference.
Create a YOLO model instance and read the source video's metadata with supervision's VideoInfo helper, which extracts the resolution and frame rate needed by the polygon zone annotator. A shared color palette keeps zone colors consistent throughout the output video.
```python
import cv2
import numpy as np
import supervision as sv
from supervision.assets import VideoAssets
from ultralytics import YOLO

model = YOLO("yolov8s.pt")

# The enum value is the local filename of the downloaded asset
VIDEO = VideoAssets.VEHICLES_2.value

colors = sv.ColorPalette.default()
video_info = sv.VideoInfo.from_video_path(VIDEO)
```
Calculate Coordinates
To count objects in a zone, you need to know the coordinates where you want to draw the zone.
You can calculate coordinates using the PolygonZone web utility.
To use the PolygonZone website, you will need to upload an image or frame from a video. You can retrieve a frame using this code:
```python
# Grab the first frame of the video and save it to disk
generator = sv.get_video_frames_generator(VIDEO)
frame = next(iter(generator))
cv2.imwrite("first_frame.png", frame)
```
PolygonZone will give you NumPy arrays that you can use with supervision to count objects in zones.
Save the coordinates in an array:
```python
polygons = [
    np.array([[718, 595], [927, 592], [851, 1062], [42, 1059]]),
    np.array([[987, 595], [1199, 595], [1893, 1056], [1015, 1062]]),
]
```
Define Zones
With the coordinates of the zones to draw ready, we can set up our zones:
Instantiate a PolygonZone for each polygon array, pairing it with a PolygonZoneAnnotator that draws the zone overlay and a BoxAnnotator that draws detection boxes. In the inference callback, each zone's trigger method determines which detections fall inside its boundary, enabling per-zone counting.
```python
zones = [
    sv.PolygonZone(polygon=polygon, frame_resolution_wh=video_info.resolution_wh)
    for polygon in polygons
]
zone_annotators = [
    sv.PolygonZoneAnnotator(
        zone=zone,
        color=colors.by_idx(index),
        thickness=4,
        text_thickness=8,
        text_scale=4,
    )
    for index, zone in enumerate(zones)
]
box_annotators = [
    sv.BoxAnnotator(
        color=colors.by_idx(index),
        thickness=4,
        text_thickness=4,
        text_scale=2,
    )
    for index in range(len(polygons))
]
```
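Conceptually, triggering a zone is a point-in-polygon test against each detection's anchor point (bottom center by default). The sketch below illustrates that idea with a standalone ray-casting check — it is not supervision's actual implementation, and the anchor coordinates are hypothetical:

```python
import numpy as np

def point_in_polygon(point, polygon):
    """Ray-casting test: a point is inside if a ray cast to the right
    crosses the polygon's edges an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

zone = np.array([[718, 595], [927, 592], [851, 1062], [42, 1059]])
# Bottom-center anchors of two hypothetical detection boxes
anchors = [(600, 800), (1500, 800)]
mask = [point_in_polygon(p, zone) for p in anchors]
print(mask)  # → [True, False]
```

Counting detections in a zone is then just summing the boolean mask, which is what the `trigger` call in the inference callback below provides.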
Run Inference
We can run inference on a video using the sv.process_video function. This function accepts a callback that runs inference on each frame and compiles the results into a video.
Below, we call our YOLOv8 model, annotate predictions and zones, and save the result to a file called result.mp4.
```python
def process_frame(frame: np.ndarray, index: int) -> np.ndarray:
    results = model(frame, imgsz=1280, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(results)
    for zone, zone_annotator, box_annotator in zip(
        zones, zone_annotators, box_annotators
    ):
        # Boolean mask of detections whose anchor falls inside this zone
        mask = zone.trigger(detections=detections)
        detections_filtered = detections[mask]
        frame = box_annotator.annotate(
            scene=frame, detections=detections_filtered, skip_label=True
        )
        frame = zone_annotator.annotate(scene=frame)
    return frame

sv.process_video(source_path=VIDEO, target_path="result.mp4", callback=process_frame)
```
Here is an example of inference run on the video: