
Detection Utils

Compute Intersection over Union (IoU) of two sets of bounding boxes - boxes_true and boxes_detection. Both sets of boxes are expected to be in (x_min, y_min, x_max, y_max) format.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| boxes_true | ndarray | 2D np.ndarray representing ground-truth boxes. shape = (N, 4) where N is number of true objects. | required |
| boxes_detection | ndarray | 2D np.ndarray representing detection boxes. shape = (M, 4) where M is number of detected objects. | required |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | Pairwise IoU of boxes from boxes_true and boxes_detection. shape = (N, M) where N is number of true objects and M is number of detected objects. |

Source code in supervision/detection/
def box_iou_batch(boxes_true: np.ndarray, boxes_detection: np.ndarray) -> np.ndarray:
    """
    Compute Intersection over Union (IoU) of two sets of bounding boxes -
        `boxes_true` and `boxes_detection`. Both sets
        of boxes are expected to be in `(x_min, y_min, x_max, y_max)` format.

    Args:
        boxes_true (np.ndarray): 2D `np.ndarray` representing ground-truth boxes.
            `shape = (N, 4)` where `N` is number of true objects.
        boxes_detection (np.ndarray): 2D `np.ndarray` representing detection boxes.
            `shape = (M, 4)` where `M` is number of detected objects.

    Returns:
        np.ndarray: Pairwise IoU of boxes from `boxes_true` and `boxes_detection`.
            `shape = (N, M)` where `N` is number of true objects and
            `M` is number of detected objects.
    """

    def box_area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    area_true = box_area(boxes_true.T)
    area_detection = box_area(boxes_detection.T)

    top_left = np.maximum(boxes_true[:, None, :2], boxes_detection[:, :2])
    bottom_right = np.minimum(boxes_true[:, None, 2:], boxes_detection[:, 2:])

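    # Clip negative widths/heights to zero so disjoint boxes get zero overlap.
    area_inter = np.prod(np.clip(bottom_right - top_left, a_min=0, a_max=None), 2)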
    return area_inter / (area_true[:, None] + area_detection - area_inter)
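
As a worked check of the formula above, the following dependency-free sketch computes the same pairwise IoU with plain Python lists. The helper name `pairwise_iou` is hypothetical and not part of the library; it only mirrors the intersection and union arithmetic for small inputs.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x_min, y_min, x_max, y_max)


def pairwise_iou(boxes_true: List[Box], boxes_detection: List[Box]) -> List[List[float]]:
    """Hypothetical pure-Python mirror of the pairwise IoU computation."""
    result = []
    for t in boxes_true:
        area_t = (t[2] - t[0]) * (t[3] - t[1])
        row = []
        for d in boxes_detection:
            area_d = (d[2] - d[0]) * (d[3] - d[1])
            # Overlap width and height, clipped at zero for disjoint boxes.
            inter_w = max(0.0, min(t[2], d[2]) - max(t[0], d[0]))
            inter_h = max(0.0, min(t[3], d[3]) - max(t[1], d[1]))
            inter = inter_w * inter_h
            row.append(inter / (area_t + area_d - inter))
        result.append(row)
    return result


ious = pairwise_iou(
    [[0, 0, 10, 10]],
    [[0, 0, 10, 10], [5, 0, 15, 10], [20, 20, 30, 30]],
)
print(ious)  # one row (N = 1) with three columns (M = 3)
```

The identical box scores 1, the half-overlapping box scores 50 / 150, and the disjoint box scores 0.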

Compute Intersection over Union (IoU) of two sets of masks - masks_true and masks_detection.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| masks_true | ndarray | 3D np.ndarray representing ground-truth masks. | required |
| masks_detection | ndarray | 3D np.ndarray representing detection masks. | required |
| memory_limit | int | Memory limit in MB, default is 1024 * 5 MB (5GB). | 1024 * 5 |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | Pairwise IoU of masks from masks_true and masks_detection. |

Source code in supervision/detection/
def mask_iou_batch(
    masks_true: np.ndarray,
    masks_detection: np.ndarray,
    memory_limit: int = 1024 * 5,
) -> np.ndarray:
    """
    Compute Intersection over Union (IoU) of two sets of masks -
        `masks_true` and `masks_detection`.

    Args:
        masks_true (np.ndarray): 3D `np.ndarray` representing ground-truth masks.
        masks_detection (np.ndarray): 3D `np.ndarray` representing detection masks.
        memory_limit (int, optional): memory limit in MB, default is 1024 * 5 MB (5GB).

    Returns:
        np.ndarray: Pairwise IoU of masks from `masks_true` and `masks_detection`.
    """
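    # Estimated size, in MB, of the (N, M, H, W) pairwise intermediate.
    memory = (
        masks_true.shape[0]
        * masks_true.shape[1]
        * masks_true.shape[2]
        * masks_detection.shape[0]
        / 1024
        / 1024
    )
    if memory <= memory_limit:
        return _mask_iou_batch_split(masks_true, masks_detection)

    # Otherwise process the ground-truth masks in chunks of `step` masks.
    ious = []
    step = max(
        memory_limit
        * 1024
        * 1024
        // (
            masks_detection.shape[0]
            * masks_detection.shape[1]
            * masks_detection.shape[2]
        ),
        1,
    )
    for i in range(0, masks_true.shape[0], step):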
        ious.append(_mask_iou_batch_split(masks_true[i : i + step], masks_detection))

    return np.vstack(ious)
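
For intuition, mask IoU counts the foreground pixels two masks share and divides by the number of pixels set in either mask. The sketch below does this for two tiny masks stored as nested lists. `mask_iou` is a hypothetical helper for illustration only; it is not the library's `_mask_iou_batch_split`, whose implementation is not shown on this page.

```python
from typing import List

Mask = List[List[int]]  # H x W grid of 0/1 values


def mask_iou(mask_a: Mask, mask_b: Mask) -> float:
    """Hypothetical single-pair mask IoU on nested lists."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks have no union; returning 0.0 here is an assumption.
    return intersection / union if union else 0.0


mask_a = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
mask_b = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
iou = mask_iou(mask_a, mask_b)  # 2 shared pixels out of 6 covered pixels
```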

Perform Non-Maximum Suppression (NMS) on object detection predictions.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| predictions | ndarray | An array of object detection predictions in the format of (x_min, y_min, x_max, y_max, score) or (x_min, y_min, x_max, y_max, score, class). | required |
| iou_threshold | float | The intersection-over-union threshold to use for non-maximum suppression. | 0.5 |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | A boolean array indicating which predictions to keep after non-maximum suppression. |

Raises:

| Type | Description |
| --- | --- |
| AssertionError | If iou_threshold is not within the closed range from 0 to 1. |

Source code in supervision/detection/
def box_non_max_suppression(
    predictions: np.ndarray, iou_threshold: float = 0.5
) -> np.ndarray:
    """
    Perform Non-Maximum Suppression (NMS) on object detection predictions.

    Args:
        predictions (np.ndarray): An array of object detection predictions in
            the format of `(x_min, y_min, x_max, y_max, score)`
            or `(x_min, y_min, x_max, y_max, score, class)`.
        iou_threshold (float, optional): The intersection-over-union threshold
            to use for non-maximum suppression.

    Returns:
        np.ndarray: A boolean array indicating which predictions to keep after
            non-maximum suppression.

    Raises:
        AssertionError: If `iou_threshold` is not within the
            closed range from `0` to `1`.
    """
    assert 0 <= iou_threshold <= 1, (
        "Value of `iou_threshold` must be in the closed range from 0 to 1, "
        f"{iou_threshold} given."
    )
    rows, columns = predictions.shape

    # add column #5 - category filled with zeros for agnostic nms
    if columns == 5:
        predictions = np.c_[predictions, np.zeros(rows)]

    # sort predictions column #4 - score
    sort_index = np.flip(predictions[:, 4].argsort())
    predictions = predictions[sort_index]

    boxes = predictions[:, :4]
    categories = predictions[:, 5]
    ious = box_iou_batch(boxes, boxes)
    ious = ious - np.eye(rows)

    keep = np.ones(rows, dtype=bool)

    for index, (iou, category) in enumerate(zip(ious, categories)):
        if not keep[index]:
            continue

        # drop detections with iou > iou_threshold and
        # same category as current detections
        condition = (iou > iou_threshold) & (categories == category)
        keep = keep & ~condition

    return keep[sort_index.argsort()]
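
The same greedy, class-aware idea can be sketched without NumPy. This is an illustrative re-implementation under simplifying assumptions, not the library code: `iou` and `greedy_nms` are hypothetical names, and every prediction row is assumed to carry a class in its sixth position.

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def greedy_nms(predictions: List[Sequence[float]], iou_threshold: float = 0.5) -> List[bool]:
    """Hypothetical list-based NMS; rows are (x_min, y_min, x_max, y_max, score, class)."""
    # Visit predictions from highest to lowest score.
    order = sorted(range(len(predictions)), key=lambda i: predictions[i][4], reverse=True)
    keep = [True] * len(predictions)
    for rank, i in enumerate(order):
        if not keep[i]:
            continue  # suppressed boxes do not suppress others
        for j in order[rank + 1:]:
            same_class = predictions[i][5] == predictions[j][5]
            if same_class and iou(predictions[i][:4], predictions[j][:4]) > iou_threshold:
                keep[j] = False
    return keep


predictions = [
    [0, 0, 10, 10, 0.6, 0],    # overlaps the next box, lower score
    [1, 1, 11, 11, 0.9, 0],    # best box of class 0
    [1, 1, 11, 11, 0.8, 1],    # same place, different class
    [50, 50, 60, 60, 0.7, 0],  # far away
]
keep = greedy_nms(predictions, iou_threshold=0.5)
```

Only the first row is dropped: it overlaps a higher-scoring box of the same class with an IoU of 81 / 119. The third row survives because suppression only applies within a class.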

Perform Non-Maximum Suppression (NMS) on segmentation predictions.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| predictions | ndarray | A 2D array of object detection predictions in the format of (x_min, y_min, x_max, y_max, score) or (x_min, y_min, x_max, y_max, score, class). Shape: (N, 5) or (N, 6), where N is the number of predictions. | required |
| masks | ndarray | A 3D array of binary masks corresponding to the predictions. Shape: (N, H, W), where N is the number of predictions, and H, W are the dimensions of each mask. | required |
| iou_threshold | float | The intersection-over-union threshold to use for non-maximum suppression. | 0.5 |
| mask_dimension | int | The dimension to which the masks should be resized before computing IoU values. Defaults to 640. | 640 |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | A boolean array indicating which predictions to keep after non-maximum suppression. |

Raises:

| Type | Description |
| --- | --- |
| AssertionError | If iou_threshold is not within the closed range from 0 to 1. |

Source code in supervision/detection/
def mask_non_max_suppression(
    predictions: np.ndarray,
    masks: np.ndarray,
    iou_threshold: float = 0.5,
    mask_dimension: int = 640,
) -> np.ndarray:
    """
    Perform Non-Maximum Suppression (NMS) on segmentation predictions.

    Args:
        predictions (np.ndarray): A 2D array of object detection predictions in
            the format of `(x_min, y_min, x_max, y_max, score)`
            or `(x_min, y_min, x_max, y_max, score, class)`. Shape: `(N, 5)` or
            `(N, 6)`, where N is the number of predictions.
        masks (np.ndarray): A 3D array of binary masks corresponding to the predictions.
            Shape: `(N, H, W)`, where N is the number of predictions, and H, W are the
            dimensions of each mask.
        iou_threshold (float, optional): The intersection-over-union threshold
            to use for non-maximum suppression.
        mask_dimension (int, optional): The dimension to which the masks should be
            resized before computing IOU values. Defaults to 640.

    Returns:
        np.ndarray: A boolean array indicating which predictions to keep after
            non-maximum suppression.

    Raises:
        AssertionError: If `iou_threshold` is not within the closed
        range from `0` to `1`.
    """
    assert 0 <= iou_threshold <= 1, (
        "Value of `iou_threshold` must be in the closed range from 0 to 1, "
        f"{iou_threshold} given."
    )
    rows, columns = predictions.shape

    if columns == 5:
        predictions = np.c_[predictions, np.zeros(rows)]

    sort_index = predictions[:, 4].argsort()[::-1]
    predictions = predictions[sort_index]
    masks = masks[sort_index]
    masks_resized = resize_masks(masks, mask_dimension)
    ious = mask_iou_batch(masks_resized, masks_resized)
    categories = predictions[:, 5]

    keep = np.ones(rows, dtype=bool)
    for i in range(rows):
        if keep[i]:
            condition = (ious[i] > iou_threshold) & (categories[i] == categories)
            keep[i + 1 :] = np.where(condition[i + 1 :], False, keep[i + 1 :])

    return keep[sort_index.argsort()]
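
Both NMS functions compute `keep` in score-sorted order and then return `keep[sort_index.argsort()]`. The sketch below, using plain lists and made-up scores, illustrates why that works: the argsort of a permutation is its inverse, so it maps the mask back to the input order.

```python
scores = [0.6, 0.9, 0.8, 0.7]

# Indices that sort the scores in descending order (the role of sort_index).
sort_index = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Suppose NMS, working in sorted order, drops the lowest-scoring prediction.
keep_sorted = [True, True, True, False]

# Argsort of a permutation yields its inverse: inverse[original] = sorted position.
inverse = sorted(range(len(sort_index)), key=lambda k: sort_index[k])

# Reindex the mask so that entry i refers to the i-th input prediction again.
keep_original = [keep_sorted[position] for position in inverse]
```

Here the dropped prediction is the one with score 0.6, which was the first input row, so the first entry of `keep_original` is the one set to `False`.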

Generate a mask from a polygon.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| polygon | ndarray | The polygon for which the mask should be generated, given as a list of vertices. | required |
| resolution_wh | Tuple[int, int] | The width and height of the desired resolution. | required |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | The generated 2D mask, where the polygon is marked with 1's and the rest is filled with 0's. |

Source code in supervision/detection/
def polygon_to_mask(polygon: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    """Generate a mask from a polygon.

    Args:
        polygon (np.ndarray): The polygon for which the mask should be generated,
            given as a list of vertices.
        resolution_wh (Tuple[int, int]): The width and height of the desired resolution.

    Returns:
        np.ndarray: The generated 2D mask, where the polygon is marked with
            `1`'s and the rest is filled with `0`'s.
    """
    width, height = resolution_wh
    mask = np.zeros((height, width))

    cv2.fillPoly(mask, [polygon], color=1)
    return mask

Converts a 3D np.array of 2D bool masks into a 2D np.array of bounding boxes.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| masks | ndarray | A 3D np.array of shape (N, H, W) containing 2D bool masks. | required |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | A 2D np.array of shape (N, 4) containing the bounding boxes (x_min, y_min, x_max, y_max) for each mask. |

Source code in supervision/detection/
def mask_to_xyxy(masks: np.ndarray) -> np.ndarray:
    """
    Converts a 3D `np.array` of 2D bool masks into a 2D `np.array` of bounding boxes.

    Args:
        masks (np.ndarray): A 3D `np.array` of shape `(N, H, W)`
            containing 2D bool masks

    Returns:
        np.ndarray: A 2D `np.array` of shape `(N, 4)` containing the bounding boxes
            `(x_min, y_min, x_max, y_max)` for each mask
    """
    n = masks.shape[0]
    bboxes = np.zeros((n, 4), dtype=int)

    for i, mask in enumerate(masks):
        rows, cols = np.where(mask)

        if len(rows) > 0 and len(cols) > 0:
            x_min, x_max = np.min(cols), np.max(cols)
            y_min, y_max = np.min(rows), np.max(rows)
            bboxes[i, :] = [x_min, y_min, x_max, y_max]

    return bboxes
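
To see which coordinates end up in each box, here is a small list-based sketch of the same row and column scan for a single mask. `mask_bbox` is a hypothetical helper and is not part of the library.

```python
from typing import List, Tuple


def mask_bbox(mask: List[List[bool]]) -> Tuple[int, int, int, int]:
    """Hypothetical list-based bounding box of one 2D mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return (0, 0, 0, 0)  # empty mask: the box stays all zeros, as in the loop above
    return (min(cols), min(rows), max(cols), max(rows))


mask = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, True,  False, False],
]
bbox = mask_bbox(mask)  # (x_min, y_min, x_max, y_max)
```

Note that `x_max` and `y_max` are the indices of the last set pixel, so a single-pixel mask yields a box with zero width and height.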

Converts a binary mask to a list of polygons.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mask | ndarray | A binary mask represented as a 2D NumPy array of shape (H, W), where H and W are the height and width of the mask, respectively. | required |

Returns:

| Type | Description |
| --- | --- |
| List[np.ndarray] | A list of polygons, where each polygon is represented by a NumPy array of shape (N, 2), containing the x, y coordinates of the points. Polygons with fewer points than MIN_POLYGON_POINT_COUNT = 3 are excluded from the output. |

Source code in supervision/detection/
def mask_to_polygons(mask: np.ndarray) -> List[np.ndarray]:
    """
    Converts a binary mask to a list of polygons.

    Args:
        mask (np.ndarray): A binary mask represented as a 2D NumPy array of
            shape `(H, W)`, where H and W are the height and width of
            the mask, respectively.

    Returns:
        List[np.ndarray]: A list of polygons, where each polygon is represented by a
            NumPy array of shape `(N, 2)`, containing the `x`, `y` coordinates
            of the points. Polygons with fewer points than `MIN_POLYGON_POINT_COUNT = 3`
            are excluded from the output.
    """

    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
    )
    return [
        np.squeeze(contour, axis=1)
        for contour in contours
        if contour.shape[0] >= MIN_POLYGON_POINT_COUNT
    ]

Converts a polygon represented by a NumPy array into a bounding box.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| polygon | ndarray | A polygon represented by a NumPy array of shape (N, 2), containing the x, y coordinates of the points. | required |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | A 1D NumPy array containing the bounding box (x_min, y_min, x_max, y_max) of the input polygon. |

Source code in supervision/detection/
def polygon_to_xyxy(polygon: np.ndarray) -> np.ndarray:
    """
    Converts a polygon represented by a NumPy array into a bounding box.

    Args:
        polygon (np.ndarray): A polygon represented by a NumPy array of shape `(N, 2)`,
            containing the `x`, `y` coordinates of the points.

    Returns:
        np.ndarray: A 1D NumPy array containing the bounding box
            `(x_min, y_min, x_max, y_max)` of the input polygon.
    """
    x_min, y_min = np.min(polygon, axis=0)
    x_max, y_max = np.max(polygon, axis=0)
    return np.array([x_min, y_min, x_max, y_max])
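
A quick worked example of the min/max reduction, written with plain tuples so it runs without NumPy. `polygon_bbox` is a hypothetical stand-in used only for illustration.

```python
from typing import List, Tuple


def polygon_bbox(polygon: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Hypothetical list-based bounding box of a polygon's vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


triangle = [(2, 1), (8, 3), (4, 9)]
bbox = polygon_bbox(triangle)  # (x_min, y_min, x_max, y_max)
```

The minimum and maximum are taken per coordinate, so the corners of the box generally do not coincide with any single vertex.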

Filters a list of polygons based on their area.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| polygons | List[ndarray] | A list of polygons, where each polygon is represented by a NumPy array of shape (N, 2), containing the x, y coordinates of the points. | required |
| min_area | Optional[float] | The minimum area threshold. Only polygons with an area greater than or equal to this value will be included in the output. If set to None, no minimum area constraint will be applied. | None |
| max_area | Optional[float] | The maximum area threshold. Only polygons with an area less than or equal to this value will be included in the output. If set to None, no maximum area constraint will be applied. | None |

Returns:

| Type | Description |
| --- | --- |
| List[np.ndarray] | A new list of polygons containing only those with areas within the specified thresholds. |

Source code in supervision/detection/
def filter_polygons_by_area(
    polygons: List[np.ndarray],
    min_area: Optional[float] = None,
    max_area: Optional[float] = None,
) -> List[np.ndarray]:
    """
    Filters a list of polygons based on their area.

    Args:
        polygons (List[np.ndarray]): A list of polygons, where each polygon is
            represented by a NumPy array of shape `(N, 2)`,
            containing the `x`, `y` coordinates of the points.
        min_area (Optional[float]): The minimum area threshold.
            Only polygons with an area greater than or equal to this value
            will be included in the output. If set to None,
            no minimum area constraint will be applied.
        max_area (Optional[float]): The maximum area threshold.
            Only polygons with an area less than or equal to this value
            will be included in the output. If set to None,
            no maximum area constraint will be applied.

    Returns:
        List[np.ndarray]: A new list of polygons containing only those with
            areas within the specified thresholds.
    """
    if min_area is None and max_area is None:
        return polygons
    areas = [cv2.contourArea(polygon) for polygon in polygons]
    return [
        polygon
        for polygon, area in zip(polygons, areas)
        if (min_area is None or area >= min_area)
        and (max_area is None or area <= max_area)
    ]
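
The filtering logic can be tried without OpenCV by substituting the shoelace formula for `cv2.contourArea`. That substitution is an assumption made for this sketch: the two should agree for simple, non-self-intersecting polygons, but this is not the library's code, and `shoelace_area` and `filter_by_area` are hypothetical names.

```python
from typing import List, Optional, Tuple

Polygon = List[Tuple[float, float]]


def shoelace_area(polygon: Polygon) -> float:
    """Absolute polygon area from the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def filter_by_area(
    polygons: List[Polygon],
    min_area: Optional[float] = None,
    max_area: Optional[float] = None,
) -> List[Polygon]:
    """Keep polygons whose area lies within the inclusive thresholds."""
    return [
        p
        for p in polygons
        if (min_area is None or shoelace_area(p) >= min_area)
        and (max_area is None or shoelace_area(p) <= max_area)
    ]


small = [(0, 0), (2, 0), (2, 2), (0, 2)]      # area 4
large = [(0, 0), (10, 0), (10, 10), (0, 10)]  # area 100
kept = filter_by_area([small, large], min_area=10)
```

Both thresholds are inclusive, so `min_area=4` would keep the small square as well.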


Move bounding boxes by a given offset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xyxy | ndarray | An array of shape (n, 4) containing the bounding box coordinates in format [x1, y1, x2, y2]. | required |
| offset | array | An array of shape (2,) containing offset values in format [dx, dy]. | required |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | Repositioned bounding boxes. |

Example:

import numpy as np
import supervision as sv

boxes = np.array([[10, 10, 20, 20], [30, 30, 40, 40]])
offset = np.array([5, 5])
moved_box = sv.move_boxes(boxes, offset)
# np.array([
#    [15, 15, 25, 25],
#    [35, 35, 45, 45]
# ])

Source code in supervision/detection/
def move_boxes(xyxy: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """
    Move bounding boxes by a given offset.

    Args:
        xyxy (np.ndarray): An array of shape `(n, 4)` containing the bounding boxes
            coordinates in format `[x1, y1, x2, y2]`
        offset (np.array): An array of shape `(2,)` containing offset values in
            format `[dx, dy]`.

    Returns:
        np.ndarray: Repositioned bounding boxes.

    Example:
        import numpy as np
        import supervision as sv

        boxes = np.array([[10, 10, 20, 20], [30, 30, 40, 40]])
        offset = np.array([5, 5])
        moved_box = sv.move_boxes(boxes, offset)
        # np.array([
        #    [15, 15, 25, 25],
        #    [35, 35, 45, 45]
        # ])
    """
    # Repeat the offset so it applies to both corners: [dx, dy, dx, dy].
    return xyxy + np.hstack([offset, offset])

Scale the dimensions of bounding boxes.


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xyxy | ndarray | An array of shape (n, 4) containing the bounding box coordinates in format [x1, y1, x2, y2]. | required |
| factor | float | A float value representing the factor by which the box dimensions are scaled. A factor greater than 1 enlarges the boxes, while a factor less than 1 shrinks them. | required |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | Scaled bounding boxes. |

Example:

import numpy as np
import supervision as sv

boxes = np.array([[10, 10, 20, 20], [30, 30, 40, 40]])
factor = 1.5
scaled_bb = sv.scale_boxes(boxes, factor)
# np.array([
#    [ 7.5,  7.5, 22.5, 22.5],
#    [27.5, 27.5, 42.5, 42.5]
# ])

Source code in supervision/detection/
def scale_boxes(xyxy: np.ndarray, factor: float) -> np.ndarray:
    """
    Scale the dimensions of bounding boxes.

    Args:
        xyxy (np.ndarray): An array of shape `(n, 4)` containing the bounding boxes
            coordinates in format `[x1, y1, x2, y2]`
        factor (float): A float value representing the factor by which the box
            dimensions are scaled. A factor greater than 1 enlarges the boxes, while a
            factor less than 1 shrinks them.

    Returns:
        np.ndarray: Scaled bounding boxes.

    Example:
        import numpy as np
        import supervision as sv

        boxes = np.array([[10, 10, 20, 20], [30, 30, 40, 40]])
        factor = 1.5
        scaled_bb = sv.scale_boxes(boxes, factor)
        # np.array([
        #    [ 7.5,  7.5, 22.5, 22.5],
        #    [27.5, 27.5, 42.5, 42.5]
        # ])
    """
    centers = (xyxy[:, :2] + xyxy[:, 2:]) / 2
    new_sizes = (xyxy[:, 2:] - xyxy[:, :2]) * factor
    return np.concatenate((centers - new_sizes / 2, centers + new_sizes / 2), axis=1)
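
The arithmetic can be checked by hand for a single box. The sketch below is a hypothetical scalar version, `scale_box`, written without NumPy; it shows that the center stays fixed while the width and height are multiplied by `factor`.

```python
from typing import List, Sequence


def scale_box(box: Sequence[float], factor: float) -> List[float]:
    """Hypothetical single-box version: scale width and height about the center."""
    x_min, y_min, x_max, y_max = box
    center_x, center_y = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * factor / 2
    half_h = (y_max - y_min) * factor / 2
    return [center_x - half_w, center_y - half_h, center_x + half_w, center_y + half_h]


enlarged = scale_box([10, 10, 20, 20], 1.5)  # 10 x 10 box grows to 15 x 15
shrunk = scale_box([10, 10, 20, 20], 0.5)    # 10 x 10 box shrinks to 5 x 5
```

Because the scaling is about the center, enlarging a box near the image border can produce coordinates outside the image; clipping them afterwards is left to the caller.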
