
Detection Utils

Compute Intersection over Union (IoU) of two sets of bounding boxes - boxes_true and boxes_detection. Both sets of boxes are expected to be in (x_min, y_min, x_max, y_max) format.

Parameters:

Name Type Description Default
boxes_true ndarray

2D np.ndarray representing ground-truth boxes. shape = (N, 4) where N is number of true objects.

required
boxes_detection ndarray

2D np.ndarray representing detection boxes. shape = (M, 4) where M is number of detected objects.

required

Returns:

Type Description
ndarray

np.ndarray: Pairwise IoU of boxes from boxes_true and boxes_detection. shape = (N, M) where N is number of true objects and M is number of detected objects.

Source code in supervision/detection/utils.py
def box_iou_batch(boxes_true: np.ndarray, boxes_detection: np.ndarray) -> np.ndarray:
    """
    Compute Intersection over Union (IoU) of two sets of bounding boxes -
        `boxes_true` and `boxes_detection`. Both sets
        of boxes are expected to be in `(x_min, y_min, x_max, y_max)` format.

    Args:
        boxes_true (np.ndarray): 2D `np.ndarray` representing ground-truth boxes.
            `shape = (N, 4)` where `N` is number of true objects.
        boxes_detection (np.ndarray): 2D `np.ndarray` representing detection boxes.
            `shape = (M, 4)` where `M` is number of detected objects.

    Returns:
        np.ndarray: Pairwise IoU of boxes from `boxes_true` and `boxes_detection`.
            `shape = (N, M)` where `N` is number of true objects and
            `M` is number of detected objects.
    """

    def box_area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    area_true = box_area(boxes_true.T)
    area_detection = box_area(boxes_detection.T)

    top_left = np.maximum(boxes_true[:, None, :2], boxes_detection[:, :2])
    bottom_right = np.minimum(boxes_true[:, None, 2:], boxes_detection[:, 2:])

    area_inter = np.prod(np.clip(bottom_right - top_left, a_min=0, a_max=None), 2)
    ious = area_inter / (area_true[:, None] + area_detection - area_inter)
    ious = np.nan_to_num(ious)
    return ious
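As a quick check of the broadcasting used above, the following sketch restates the same computation in a compact helper and applies it to two ground-truth boxes and three detections. The name `pairwise_iou` is made up for this illustration and is not part of the library; the sketch also leaves out the final `np.nan_to_num` step, since none of the example boxes has zero area.

```python
import numpy as np


def pairwise_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Illustrative restatement of the broadcasting in box_iou_batch."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])

    # (N, 1, 2) against (M, 2) broadcasts to (N, M, 2): one corner per box pair.
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[:, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[:, 2:])

    # A negative width or height means no overlap, so clip it to zero.
    inter = np.prod(np.clip(bottom_right - top_left, 0, None), axis=2)
    return inter / (area_a[:, None] + area_b - inter)


boxes_true = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
boxes_detection = np.array(
    [[0, 0, 10, 10], [5, 5, 15, 15], [40, 40, 50, 50]], dtype=float
)

ious = pairwise_iou(boxes_true, boxes_detection)
print(ious.shape)  # (2, 3)
```

The first row of `ious` holds 1.0 for the identical box, 25 / 175 for the detection that overlaps in a 5 x 5 region (union 100 + 100 - 25), and 0.0 for the disjoint box. The second ground-truth box overlaps none of the detections, so its row is all zeros.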

Compute Intersection over Union (IoU) of two sets of masks - masks_true and masks_detection.

Parameters:

Name Type Description Default
masks_true ndarray

3D np.ndarray representing ground-truth masks.

required
masks_detection ndarray

3D np.ndarray representing detection masks.

required
memory_limit int

memory limit in MB, default is 1024 * 5 MB (5GB).

1024 * 5

Returns:

Type Description
ndarray

np.ndarray: Pairwise IoU of masks from masks_true and masks_detection.

Source code in supervision/detection/utils.py
def mask_iou_batch(
    masks_true: np.ndarray,
    masks_detection: np.ndarray,
    memory_limit: int = 1024 * 5,
) -> np.ndarray:
    """
    Compute Intersection over Union (IoU) of two sets of masks -
        `masks_true` and `masks_detection`.

    Args:
        masks_true (np.ndarray): 3D `np.ndarray` representing ground-truth masks.
        masks_detection (np.ndarray): 3D `np.ndarray` representing detection masks.
        memory_limit (int): memory limit in MB, default is 1024 * 5 MB (5GB).

    Returns:
        np.ndarray: Pairwise IoU of masks from `masks_true` and `masks_detection`.
    """
    memory = (
        masks_true.shape[0]
        * masks_true.shape[1]
        * masks_true.shape[2]
        * masks_detection.shape[0]
        / 1024
        / 1024
    )
    if memory <= memory_limit:
        return _mask_iou_batch_split(masks_true, masks_detection)

    ious = []
    step = max(
        memory_limit
        * 1024
        * 1024
        // (
            masks_detection.shape[0]
            * masks_detection.shape[1]
            * masks_detection.shape[2]
        ),
        1,
    )
    for i in range(0, masks_true.shape[0], step):
        ious.append(_mask_iou_batch_split(masks_true[i : i + step], masks_detection))

    return np.vstack(ious)
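The chunking arithmetic above can be traced by hand. The sketch below mirrors the size estimate and the `step` computation for given array shapes without allocating any masks. `plan_mask_iou_chunks` is a hypothetical helper written for this illustration, and it assumes, as the listing does, that the element count divided by 1024 * 1024 is the size in MB.

```python
def plan_mask_iou_chunks(n_true, n_detection, height, width, memory_limit=1024 * 5):
    """Hypothetical helper: returns (estimated MB, masks per chunk, chunk count)."""
    # Same estimate as in mask_iou_batch: N * H * W * M elements, expressed in MB.
    memory = n_true * height * width * n_detection / 1024 / 1024
    if memory <= memory_limit:
        return memory, n_true, 1  # everything is handled in a single call

    # Number of ground-truth masks processed per chunk (never fewer than one).
    step = max(memory_limit * 1024 * 1024 // (n_detection * height * width), 1)
    n_chunks = -(-n_true // step)  # ceiling division
    return memory, step, n_chunks


# 100 ground-truth masks and 200 detection masks at 1080 x 1920.
memory, step, n_chunks = plan_mask_iou_chunks(100, 200, 1080, 1920)
print(memory)    # 39550.78125
print(step)      # 12
print(n_chunks)  # 9
```

With the default limit of 5120 MB, this workload is split into nine calls that each handle at most 12 ground-truth masks against all 200 detection masks.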

Compute Intersection over Union (IoU) of two sets of oriented bounding boxes - boxes_true and boxes_detection. Both sets of boxes are expected to be in ((x1, y1), (x2, y2), (x3, y3), (x4, y4)) format.

Parameters:

Name Type Description Default
boxes_true ndarray

a np.ndarray representing ground-truth boxes. shape = (N, 4, 2) where N is number of true objects.

required
boxes_detection ndarray

a np.ndarray representing detection boxes. shape = (M, 4, 2) where M is number of detected objects.

required

Returns:

Type Description
ndarray

np.ndarray: Pairwise IoU of boxes from boxes_true and boxes_detection. shape = (N, M) where N is number of true objects and M is number of detected objects.

Source code in supervision/detection/utils.py
def oriented_box_iou_batch(
    boxes_true: np.ndarray, boxes_detection: np.ndarray
) -> np.ndarray:
    """
    Compute Intersection over Union (IoU) of two sets of oriented bounding boxes -
    `boxes_true` and `boxes_detection`. Both sets of boxes are expected to be in
    `((x1, y1), (x2, y2), (x3, y3), (x4, y4))` format.

    Args:
        boxes_true (np.ndarray): a `np.ndarray` representing ground-truth boxes.
            `shape = (N, 4, 2)` where `N` is number of true objects.
        boxes_detection (np.ndarray): a `np.ndarray` representing detection boxes.
            `shape = (M, 4, 2)` where `M` is number of detected objects.

    Returns:
        np.ndarray: Pairwise IoU of boxes from `boxes_true` and `boxes_detection`.
            `shape = (N, M)` where `N` is number of true objects and
            `M` is number of detected objects.
    """

    boxes_true = boxes_true.reshape(-1, 4, 2)
    boxes_detection = boxes_detection.reshape(-1, 4, 2)

    # x coordinates (index 0) bound the width, y coordinates (index 1) the height
    max_width = int(max(boxes_true[:, :, 0].max(), boxes_detection[:, :, 0].max()) + 1)
    # adding 1 because we are 0-indexed
    max_height = int(max(boxes_true[:, :, 1].max(), boxes_detection[:, :, 1].max()) + 1)

    mask_true = np.zeros((boxes_true.shape[0], max_height, max_width))
    for i, box_true in enumerate(boxes_true):
        mask_true[i] = polygon_to_mask(box_true, (max_width, max_height))

    mask_detection = np.zeros((boxes_detection.shape[0], max_height, max_width))
    for i, box_detection in enumerate(boxes_detection):
        mask_detection[i] = polygon_to_mask(box_detection, (max_width, max_height))

    ious = mask_iou_batch(mask_true, mask_detection)
    return ious

Generate a mask from a polygon.

Parameters:

Name Type Description Default
polygon ndarray

The polygon for which the mask should be generated, given as a list of vertices.

required
resolution_wh Tuple[int, int]

The width and height of the desired resolution.

required

Returns:

Type Description
ndarray

np.ndarray: The generated 2D mask, where the polygon is marked with 1's and the rest is filled with 0's.

Source code in supervision/detection/utils.py
def polygon_to_mask(polygon: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    """Generate a mask from a polygon.

    Args:
        polygon (np.ndarray): The polygon for which the mask should be generated,
            given as a list of vertices.
        resolution_wh (Tuple[int, int]): The width and height of the desired resolution.

    Returns:
        np.ndarray: The generated 2D mask, where the polygon is marked with
            `1`'s and the rest is filled with `0`'s.
    """
    width, height = map(int, resolution_wh)
    mask = np.zeros((height, width), dtype=np.uint8)
    cv2.fillPoly(mask, [polygon.astype(np.int32)], color=1)
    return mask

Converts a 3D np.array of 2D bool masks into a 2D np.array of bounding boxes.

Parameters:

Name Type Description Default
masks ndarray

A 3D np.array of shape (N, H, W) containing 2D bool masks

required

Returns:

Type Description
ndarray

np.ndarray: A 2D np.array of shape (N, 4) containing the bounding boxes (x_min, y_min, x_max, y_max) for each mask

Source code in supervision/detection/utils.py
def mask_to_xyxy(masks: np.ndarray) -> np.ndarray:
    """
    Converts a 3D `np.array` of 2D bool masks into a 2D `np.array` of bounding boxes.

    Parameters:
        masks (np.ndarray): A 3D `np.array` of shape `(N, H, W)`
            containing 2D bool masks

    Returns:
        np.ndarray: A 2D `np.array` of shape `(N, 4)` containing the bounding boxes
            `(x_min, y_min, x_max, y_max)` for each mask
    """
    n = masks.shape[0]
    xyxy = np.zeros((n, 4), dtype=int)

    for i, mask in enumerate(masks):
        rows, cols = np.where(mask)

        if len(rows) > 0 and len(cols) > 0:
            x_min, x_max = np.min(cols), np.max(cols)
            y_min, y_max = np.min(rows), np.max(rows)
            xyxy[i, :] = [x_min, y_min, x_max, y_max]

    return xyxy
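To make the indexing concrete, here is a minimal sketch that applies the same loop to two small masks, one with a filled rectangle and one empty. It relies only on NumPy, and the array contents are invented for the illustration.

```python
import numpy as np

masks = np.zeros((2, 4, 5), dtype=bool)  # two masks, each 4 rows by 5 columns
masks[0, 1:3, 1:4] = True                # rows 1-2, columns 1-3; mask 1 stays empty

xyxy = np.zeros((masks.shape[0], 4), dtype=int)
for i, mask in enumerate(masks):
    rows, cols = np.where(mask)  # rows are y indices, columns are x indices
    if len(rows) > 0:
        xyxy[i] = [cols.min(), rows.min(), cols.max(), rows.max()]

print(xyxy)
# [[1 1 3 2]
#  [0 0 0 0]]
```

Two details follow from the code: `x_max` and `y_max` are the indices of the last foreground pixel (inclusive), and an empty mask keeps the initial all-zero box.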

Converts a binary mask to a list of polygons.

Parameters:

Name Type Description Default
mask ndarray

A binary mask represented as a 2D NumPy array of shape (H, W), where H and W are the height and width of the mask, respectively.

required

Returns:

Type Description
List[ndarray]

List[np.ndarray]: A list of polygons, where each polygon is represented by a NumPy array of shape (N, 2), containing the x, y coordinates of the points. Polygons with fewer points than MIN_POLYGON_POINT_COUNT = 3 are excluded from the output.

Source code in supervision/detection/utils.py
def mask_to_polygons(mask: np.ndarray) -> List[np.ndarray]:
    """
    Converts a binary mask to a list of polygons.

    Parameters:
        mask (np.ndarray): A binary mask represented as a 2D NumPy array of
            shape `(H, W)`, where H and W are the height and width of
            the mask, respectively.

    Returns:
        List[np.ndarray]: A list of polygons, where each polygon is represented by a
            NumPy array of shape `(N, 2)`, containing the `x`, `y` coordinates
            of the points. Polygons with fewer points than `MIN_POLYGON_POINT_COUNT = 3`
            are excluded from the output.
    """

    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
    )
    return [
        np.squeeze(contour, axis=1)
        for contour in contours
        if contour.shape[0] >= MIN_POLYGON_POINT_COUNT
    ]

Converts a polygon represented by a NumPy array into a bounding box.

Parameters:

Name Type Description Default
polygon ndarray

A polygon represented by a NumPy array of shape (N, 2), containing the x, y coordinates of the points.

required

Returns:

Type Description
ndarray

np.ndarray: A 1D NumPy array containing the bounding box (x_min, y_min, x_max, y_max) of the input polygon.

Source code in supervision/detection/utils.py
def polygon_to_xyxy(polygon: np.ndarray) -> np.ndarray:
    """
    Converts a polygon represented by a NumPy array into a bounding box.

    Parameters:
        polygon (np.ndarray): A polygon represented by a NumPy array of shape `(N, 2)`,
            containing the `x`, `y` coordinates of the points.

    Returns:
        np.ndarray: A 1D NumPy array containing the bounding box
            `(x_min, y_min, x_max, y_max)` of the input polygon.
    """
    x_min, y_min = np.min(polygon, axis=0)
    x_max, y_max = np.max(polygon, axis=0)
    return np.array([x_min, y_min, x_max, y_max])
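A short worked example, using a made-up triangle, shows how the column-wise minima and maxima become the box corners:

```python
import numpy as np

polygon = np.array([[2, 3], [10, 1], [6, 8]])  # triangle given as (x, y) vertices

# Reducing along axis 0 yields the extreme x and y values over all vertices.
x_min, y_min = np.min(polygon, axis=0)
x_max, y_max = np.max(polygon, axis=0)
xyxy = np.array([x_min, y_min, x_max, y_max])

print(xyxy)  # [ 2  1 10  8]
```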

Filters a list of polygons based on their area.

Parameters:

Name Type Description Default
polygons List[ndarray]

A list of polygons, where each polygon is represented by a NumPy array of shape (N, 2), containing the x, y coordinates of the points.

required
min_area Optional[float]

The minimum area threshold. Only polygons with an area greater than or equal to this value will be included in the output. If set to None, no minimum area constraint will be applied.

None
max_area Optional[float]

The maximum area threshold. Only polygons with an area less than or equal to this value will be included in the output. If set to None, no maximum area constraint will be applied.

None

Returns:

Type Description
List[ndarray]

List[np.ndarray]: A new list of polygons containing only those with areas within the specified thresholds.

Source code in supervision/detection/utils.py
def filter_polygons_by_area(
    polygons: List[np.ndarray],
    min_area: Optional[float] = None,
    max_area: Optional[float] = None,
) -> List[np.ndarray]:
    """
    Filters a list of polygons based on their area.

    Parameters:
        polygons (List[np.ndarray]): A list of polygons, where each polygon is
            represented by a NumPy array of shape `(N, 2)`,
            containing the `x`, `y` coordinates of the points.
        min_area (Optional[float]): The minimum area threshold.
            Only polygons with an area greater than or equal to this value
            will be included in the output. If set to None,
            no minimum area constraint will be applied.
        max_area (Optional[float]): The maximum area threshold.
            Only polygons with an area less than or equal to this value
            will be included in the output. If set to None,
            no maximum area constraint will be applied.

    Returns:
        List[np.ndarray]: A new list of polygons containing only those with
            areas within the specified thresholds.
    """
    if min_area is None and max_area is None:
        return polygons
    areas = [cv2.contourArea(polygon) for polygon in polygons]
    return [
        polygon
        for polygon, area in zip(polygons, areas)
        if (min_area is None or area >= min_area)
        and (max_area is None or area <= max_area)
    ]
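The thresholds are inclusive on both ends, which the sketch below illustrates on three hand-made polygons. To stay free of OpenCV, it swaps `cv2.contourArea` for the shoelace formula; `shoelace_area` is a stand-in written for this example, and it should agree with the library's areas only for simple, non-self-intersecting polygons.

```python
import numpy as np


def shoelace_area(polygon: np.ndarray) -> float:
    """Area of a simple polygon given as (N, 2) vertices (illustrative stand-in)."""
    x, y = polygon[:, 0], polygon[:, 1]
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))


polygons = [
    np.array([[0, 0], [10, 0], [10, 10], [0, 10]]),  # square, area 100
    np.array([[0, 0], [4, 0], [0, 3]]),              # triangle, area 6
    np.array([[0, 0], [20, 0], [20, 30], [0, 30]]),  # rectangle, area 600
]
min_area, max_area = 50, 500

areas = [shoelace_area(polygon) for polygon in polygons]
kept = [
    polygon
    for polygon, area in zip(polygons, areas)
    if min_area <= area <= max_area
]

print(areas)      # [100.0, 6.0, 600.0]
print(len(kept))  # 1
```

Only the square survives: the triangle falls below `min_area` and the rectangle exceeds `max_area`.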

Offset bounding boxes by the specified (dx, dy) amount.

Parameters:

Name Type Description Default
xyxy NDArray[float64]

An array of shape (n, 4) containing the bounding boxes coordinates in format [x1, y1, x2, y2]

required
offset array

An array of shape (2,) containing the offset values in the format [dx, dy].

required

Returns:

Type Description
NDArray[float64]

npt.NDArray[np.float64]: Repositioned bounding boxes.

Examples:

import numpy as np
import supervision as sv

xyxy = np.array([
    [10, 10, 20, 20],
    [30, 30, 40, 40]
])
offset = np.array([5, 5])

sv.move_boxes(xyxy=xyxy, offset=offset)
# array([
#    [15, 15, 25, 25],
#    [35, 35, 45, 45]
# ])
Source code in supervision/detection/utils.py
def move_boxes(
    xyxy: npt.NDArray[np.float64], offset: npt.NDArray[np.int32]
) -> npt.NDArray[np.float64]:
    """
    Offset bounding boxes by the specified `(dx, dy)` amount.

    Parameters:
        xyxy (npt.NDArray[np.float64]): An array of shape `(n, 4)` containing the
            bounding box coordinates in the format `[x1, y1, x2, y2]`.
        offset (np.array): An array of shape `(2,)` containing the offset values
            in the format `[dx, dy]`.

    Returns:
        npt.NDArray[np.float64]: Repositioned bounding boxes.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xyxy = np.array([
            [10, 10, 20, 20],
            [30, 30, 40, 40]
        ])
        offset = np.array([5, 5])

        sv.move_boxes(xyxy=xyxy, offset=offset)
        # array([
        #    [15, 15, 25, 25],
        #    [35, 35, 45, 45]
        # ])
        ```
    """
    return xyxy + np.hstack([offset, offset])

Offset the masks in an array by the specified (x, y) amount.

Parameters:

Name Type Description Default
masks NDArray[bool_]

A 3D array of binary masks corresponding to the predictions. Shape: (N, H, W), where N is the number of predictions, and H, W are the dimensions of each mask.

required
offset NDArray[int32]

An array of shape (2,) containing int values [dx, dy]. Supports both positive and negative values for bidirectional movement.

required
resolution_wh Tuple[int, int]

The width and height of the desired mask resolution.

required

Returns:

Type Description
NDArray[bool_]

(npt.NDArray[np.bool_]) repositioned masks, optionally padded to the specified shape.

Examples:

import numpy as np
import supervision as sv

mask = np.array([[[False, False, False, False],
                 [False, True,  True,  False],
                 [False, True,  True,  False],
                 [False, False, False, False]]], dtype=bool)

offset = np.array([1, 1])
sv.move_masks(mask, offset, resolution_wh=(4, 4))
# array([[[False, False, False, False],
#         [False, False, False, False],
#         [False, False,  True,  True],
#         [False, False,  True,  True]]], dtype=bool)

offset = np.array([-2, 2])
sv.move_masks(mask, offset, resolution_wh=(4, 4))
# array([[[False, False, False, False],
#         [False, False, False, False],
#         [False, False, False, False],
#         [True,  False, False, False]]], dtype=bool)
Source code in supervision/detection/utils.py
def move_masks(
    masks: npt.NDArray[np.bool_],
    offset: npt.NDArray[np.int32],
    resolution_wh: Tuple[int, int],
) -> npt.NDArray[np.bool_]:
    """
    Offset the masks in an array by the specified (x, y) amount.

    Args:
        masks (npt.NDArray[np.bool_]): A 3D array of binary masks corresponding to the
            predictions. Shape: `(N, H, W)`, where N is the number of predictions, and
            H, W are the dimensions of each mask.
        offset (npt.NDArray[np.int32]): An array of shape `(2,)` containing int values
            `[dx, dy]`. Supports both positive and negative values for bidirectional
            movement.
        resolution_wh (Tuple[int, int]): The width and height of the desired mask
            resolution.

    Returns:
        (npt.NDArray[np.bool_]) repositioned masks, optionally padded to the specified
            shape.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        mask = np.array([[[False, False, False, False],
                         [False, True,  True,  False],
                         [False, True,  True,  False],
                         [False, False, False, False]]], dtype=bool)

        offset = np.array([1, 1])
        sv.move_masks(mask, offset, resolution_wh=(4, 4))
        # array([[[False, False, False, False],
        #         [False, False, False, False],
        #         [False, False,  True,  True],
        #         [False, False,  True,  True]]], dtype=bool)

        offset = np.array([-2, 2])
        sv.move_masks(mask, offset, resolution_wh=(4, 4))
        # array([[[False, False, False, False],
        #         [False, False, False, False],
        #         [False, False, False, False],
        #         [True,  False, False, False]]], dtype=bool)
        ```
    """
    mask_array = np.full((masks.shape[0], resolution_wh[1], resolution_wh[0]), False)

    if offset[0] < 0:
        source_x_start = -offset[0]
        source_x_end = min(masks.shape[2], resolution_wh[0] - offset[0])
        destination_x_start = 0
        destination_x_end = min(resolution_wh[0], masks.shape[2] + offset[0])
    else:
        source_x_start = 0
        source_x_end = min(masks.shape[2], resolution_wh[0] - offset[0])
        destination_x_start = offset[0]
        destination_x_end = offset[0] + source_x_end - source_x_start

    if offset[1] < 0:
        source_y_start = -offset[1]
        source_y_end = min(masks.shape[1], resolution_wh[1] - offset[1])
        destination_y_start = 0
        destination_y_end = min(resolution_wh[1], masks.shape[1] + offset[1])
    else:
        source_y_start = 0
        source_y_end = min(masks.shape[1], resolution_wh[1] - offset[1])
        destination_y_start = offset[1]
        destination_y_end = offset[1] + source_y_end - source_y_start

    if source_x_end > source_x_start and source_y_end > source_y_start:
        mask_array[
            :,
            destination_y_start:destination_y_end,
            destination_x_start:destination_x_end,
        ] = masks[:, source_y_start:source_y_end, source_x_start:source_x_end]

    return mask_array

Scale the dimensions of bounding boxes.

Parameters:

Name Type Description Default
xyxy NDArray[float64]

An array of shape (n, 4) containing the bounding boxes coordinates in format [x1, y1, x2, y2]

required
factor float

A float value representing the factor by which the box dimensions are scaled. A factor greater than 1 enlarges the boxes, while a factor less than 1 shrinks them.

required

Returns:

Type Description
NDArray[float64]

npt.NDArray[np.float64]: Scaled bounding boxes.

Examples:

import numpy as np
import supervision as sv

xyxy = np.array([
    [10, 10, 20, 20],
    [30, 30, 40, 40]
])

sv.scale_boxes(xyxy=xyxy, factor=1.5)
# array([
#    [ 7.5,  7.5, 22.5, 22.5],
#    [27.5, 27.5, 42.5, 42.5]
# ])
Source code in supervision/detection/utils.py
def scale_boxes(
    xyxy: npt.NDArray[np.float64], factor: float
) -> npt.NDArray[np.float64]:
    """
    Scale the dimensions of bounding boxes.

    Parameters:
        xyxy (npt.NDArray[np.float64]): An array of shape `(n, 4)` containing the
            bounding boxes coordinates in format `[x1, y1, x2, y2]`
        factor (float): A float value representing the factor by which the box
            dimensions are scaled. A factor greater than 1 enlarges the boxes, while a
            factor less than 1 shrinks them.

    Returns:
        npt.NDArray[np.float64]: Scaled bounding boxes.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xyxy = np.array([
            [10, 10, 20, 20],
            [30, 30, 40, 40]
        ])

        sv.scale_boxes(xyxy=xyxy, factor=1.5)
        # array([
        #    [ 7.5,  7.5, 22.5, 22.5],
        #    [27.5, 27.5, 42.5, 42.5]
        # ])
        ```
    """
    centers = (xyxy[:, :2] + xyxy[:, 2:]) / 2
    new_sizes = (xyxy[:, 2:] - xyxy[:, :2]) * factor
    return np.concatenate((centers - new_sizes / 2, centers + new_sizes / 2), axis=1)

Clips bounding boxes coordinates to fit within the frame resolution.

Parameters:

Name Type Description Default
xyxy ndarray

A numpy array of shape (N, 4) where each row corresponds to a bounding box in the format (x_min, y_min, x_max, y_max).

required
resolution_wh Tuple[int, int]

A tuple of the form (width, height) representing the resolution of the frame.

required

Returns:

Type Description
ndarray

np.ndarray: A numpy array of shape (N, 4) where each row corresponds to a bounding box with coordinates clipped to fit within the frame resolution.

Examples:

import numpy as np
import supervision as sv

xyxy = np.array([
    [10, 20, 300, 200],
    [15, 25, 350, 450],
    [-10, -20, 30, 40]
])

sv.clip_boxes(xyxy=xyxy, resolution_wh=(320, 240))
# array([
#     [ 10,  20, 300, 200],
#     [ 15,  25, 320, 240],
#     [  0,   0,  30,  40]
# ])
Source code in supervision/detection/utils.py
def clip_boxes(xyxy: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    """
    Clips bounding boxes coordinates to fit within the frame resolution.

    Args:
        xyxy (np.ndarray): A numpy array of shape `(N, 4)` where each
            row corresponds to a bounding box in
            the format `(x_min, y_min, x_max, y_max)`.
        resolution_wh (Tuple[int, int]): A tuple of the form `(width, height)`
            representing the resolution of the frame.

    Returns:
        np.ndarray: A numpy array of shape `(N, 4)` where each row
            corresponds to a bounding box with coordinates clipped to fit
            within the frame resolution.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xyxy = np.array([
            [10, 20, 300, 200],
            [15, 25, 350, 450],
            [-10, -20, 30, 40]
        ])

        sv.clip_boxes(xyxy=xyxy, resolution_wh=(320, 240))
        # array([
        #     [ 10,  20, 300, 200],
        #     [ 15,  25, 320, 240],
        #     [  0,   0,  30,  40]
        # ])
        ```
    """
    result = np.copy(xyxy)
    width, height = resolution_wh
    result[:, [0, 2]] = result[:, [0, 2]].clip(0, width)
    result[:, [1, 3]] = result[:, [1, 3]].clip(0, height)
    return result

Pads bounding boxes coordinates with a constant padding.

Parameters:

Name Type Description Default
xyxy ndarray

A numpy array of shape (N, 4) where each row corresponds to a bounding box in the format (x_min, y_min, x_max, y_max).

required
px int

The padding value to be added to both the left and right sides of each bounding box.

required
py Optional[int]

The padding value to be added to both the top and bottom sides of each bounding box. If not provided, px will be used for both dimensions.

None

Returns:

Type Description
ndarray

np.ndarray: A numpy array of shape (N, 4) where each row corresponds to a bounding box with coordinates padded according to the provided padding values.

Examples:

import numpy as np
import supervision as sv

xyxy = np.array([
    [10, 20, 30, 40],
    [15, 25, 35, 45]
])

sv.pad_boxes(xyxy=xyxy, px=5, py=10)
# array([
#     [ 5, 10, 35, 50],
#     [10, 15, 40, 55]
# ])
Source code in supervision/detection/utils.py
def pad_boxes(xyxy: np.ndarray, px: int, py: Optional[int] = None) -> np.ndarray:
    """
    Pads bounding boxes coordinates with a constant padding.

    Args:
        xyxy (np.ndarray): A numpy array of shape `(N, 4)` where each
            row corresponds to a bounding box in the format
            `(x_min, y_min, x_max, y_max)`.
        px (int): The padding value to be added to both the left and right sides of
            each bounding box.
        py (Optional[int]): The padding value to be added to both the top and bottom
            sides of each bounding box. If not provided, `px` will be used for both
            dimensions.

    Returns:
        np.ndarray: A numpy array of shape `(N, 4)` where each row corresponds to a
            bounding box with coordinates padded according to the provided padding
            values.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xyxy = np.array([
            [10, 20, 30, 40],
            [15, 25, 35, 45]
        ])

        sv.pad_boxes(xyxy=xyxy, px=5, py=10)
        # array([
        #     [ 5, 10, 35, 50],
        #     [10, 15, 40, 55]
        # ])
        ```
    """
    if py is None:
        py = px

    result = xyxy.copy()
    result[:, [0, 1]] -= [px, py]
    result[:, [2, 3]] += [px, py]

    return result
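Padding can push boxes past the frame edges, so it is often followed by clipping. The sketch below chains the two steps shown in the `pad_boxes` and `clip_boxes` listings in plain NumPy. `pad_then_clip` is a hypothetical helper for this illustration, not a library function.

```python
import numpy as np


def pad_then_clip(xyxy, px, py, resolution_wh):
    """Hypothetical helper: pad each box, then clamp it to the frame."""
    result = xyxy.copy()
    result[:, [0, 1]] -= [px, py]  # move the top-left corner outwards
    result[:, [2, 3]] += [px, py]  # move the bottom-right corner outwards

    width, height = resolution_wh
    result[:, [0, 2]] = result[:, [0, 2]].clip(0, width)
    result[:, [1, 3]] = result[:, [1, 3]].clip(0, height)
    return result


xyxy = np.array([
    [2, 4, 30, 40],
    [300, 200, 318, 236]
])

padded = pad_then_clip(xyxy, px=5, py=10, resolution_wh=(320, 240))
print(padded)
# [[  0   0  35  50]
#  [295 190 320 240]]
```

Without the clipping step, the first box would start at (-3, -6) and the second would end at (323, 246), both outside a 320 x 240 frame.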

Converts bounding box coordinates from (x, y, width, height) format to (x_min, y_min, x_max, y_max) format.

Parameters:

Name Type Description Default
xywh ndarray

A numpy array of shape (N, 4) where each row corresponds to a bounding box in the format (x, y, width, height).

required

Returns:

Type Description
ndarray

np.ndarray: A numpy array of shape (N, 4) where each row corresponds to a bounding box in the format (x_min, y_min, x_max, y_max).

Examples:

import numpy as np
import supervision as sv

xywh = np.array([
    [10, 20, 30, 40],
    [15, 25, 35, 45]
])

sv.xywh_to_xyxy(xywh=xywh)
# array([
#     [10, 20, 40, 60],
#     [15, 25, 50, 70]
# ])
Source code in supervision/detection/utils.py
def xywh_to_xyxy(xywh: np.ndarray) -> np.ndarray:
    """
    Converts bounding box coordinates from `(x, y, width, height)`
    format to `(x_min, y_min, x_max, y_max)` format.

    Args:
        xywh (np.ndarray): A numpy array of shape `(N, 4)` where each row
            corresponds to a bounding box in the format `(x, y, width, height)`.

    Returns:
        np.ndarray: A numpy array of shape `(N, 4)` where each row corresponds
            to a bounding box in the format `(x_min, y_min, x_max, y_max)`.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xywh = np.array([
            [10, 20, 30, 40],
            [15, 25, 35, 45]
        ])

        sv.xywh_to_xyxy(xywh=xywh)
        # array([
        #     [10, 20, 40, 60],
        #     [15, 25, 50, 70]
        # ])
        ```
    """
    xyxy = xywh.copy()
    xyxy[:, 2] = xywh[:, 0] + xywh[:, 2]
    xyxy[:, 3] = xywh[:, 1] + xywh[:, 3]
    return xyxy

Converts bounding box coordinates from (center_x, center_y, width, height) format to (x_min, y_min, x_max, y_max) format.

Parameters:

Name Type Description Default
xcycwh ndarray

A numpy array of shape (N, 4) where each row corresponds to a bounding box in the format (center_x, center_y, width, height).

required

Returns:

Type Description
ndarray

np.ndarray: A numpy array of shape (N, 4) where each row corresponds to a bounding box in the format (x_min, y_min, x_max, y_max).

Examples:

import numpy as np
import supervision as sv

xcycwh = np.array([
    [50, 50, 20, 30],
    [30, 40, 10, 15]
], dtype=float)

sv.xcycwh_to_xyxy(xcycwh=xcycwh)
# array([
#     [40. , 35. , 60. , 65. ],
#     [25. , 32.5, 35. , 47.5]
# ])
Source code in supervision/detection/utils.py
def xcycwh_to_xyxy(xcycwh: np.ndarray) -> np.ndarray:
    """
    Converts bounding box coordinates from `(center_x, center_y, width, height)`
    format to `(x_min, y_min, x_max, y_max)` format.

    Args:
        xcycwh (np.ndarray): A numpy array of shape `(N, 4)` where each row
            corresponds to a bounding box in the format `(center_x, center_y, width,
            height)`.

    Returns:
        np.ndarray: A numpy array of shape `(N, 4)` where each row corresponds
            to a bounding box in the format `(x_min, y_min, x_max, y_max)`.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xcycwh = np.array([
            [50, 50, 20, 30],
            [30, 40, 10, 15]
        ], dtype=float)

        sv.xcycwh_to_xyxy(xcycwh=xcycwh)
        # array([
        #     [40. , 35. , 60. , 65. ],
        #     [25. , 32.5, 35. , 47.5]
        # ])
        ```
    """
    xyxy = xcycwh.copy()
    xyxy[:, 0] = xcycwh[:, 0] - xcycwh[:, 2] / 2
    xyxy[:, 1] = xcycwh[:, 1] - xcycwh[:, 3] / 2
    xyxy[:, 2] = xcycwh[:, 0] + xcycwh[:, 2] / 2
    xyxy[:, 3] = xcycwh[:, 1] + xcycwh[:, 3] / 2
    return xyxy
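Because the result starts as a copy of the input, it keeps the input dtype. With an integer array, half-pixel corners are therefore truncated when they are written back. The sketch below repeats the function body so that it runs on its own and compares integer and float input; the box values are chosen for illustration.

```python
import numpy as np


def xcycwh_to_xyxy(xcycwh: np.ndarray) -> np.ndarray:
    """Same steps as the listing above, repeated so the sketch is self-contained."""
    xyxy = xcycwh.copy()  # the copy keeps the dtype of the input
    xyxy[:, 0] = xcycwh[:, 0] - xcycwh[:, 2] / 2
    xyxy[:, 1] = xcycwh[:, 1] - xcycwh[:, 3] / 2
    xyxy[:, 2] = xcycwh[:, 0] + xcycwh[:, 2] / 2
    xyxy[:, 3] = xcycwh[:, 1] + xcycwh[:, 3] / 2
    return xyxy


boxes_int = np.array([[30, 40, 10, 15]])
boxes_float = boxes_int.astype(float)

from_int = xcycwh_to_xyxy(boxes_int)
from_float = xcycwh_to_xyxy(boxes_float)

print(from_int.tolist())    # [[25, 32, 35, 47]]
print(from_float.tolist())  # [[25.0, 32.5, 35.0, 47.5]]
```

Converting the input with `astype(float)` before the call avoids the silent truncation whenever widths or heights can be odd.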

Checks if the binary mask contains holes (background pixels fully enclosed by foreground pixels).

Parameters:

Name Type Description Default
mask NDArray[bool_]

2D binary mask where True indicates foreground object and False indicates background.

required

Returns:

Type Description
bool

True if holes are detected, False otherwise.

Examples:

import numpy as np
import supervision as sv

mask = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
]).astype(bool)

sv.contains_holes(mask=mask)
# True

mask = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
]).astype(bool)

sv.contains_holes(mask=mask)
# False


Source code in supervision/detection/utils.py
def contains_holes(mask: npt.NDArray[np.bool_]) -> bool:
    """
    Checks if the binary mask contains holes (background pixels fully enclosed by
    foreground pixels).

    Args:
        mask (npt.NDArray[np.bool_]): 2D binary mask where `True` indicates foreground
            object and `False` indicates background.

    Returns:
        True if holes are detected, False otherwise.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        mask = np.array([
            [0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]
        ]).astype(bool)

        sv.contains_holes(mask=mask)
        # True

        mask = np.array([
            [0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]
        ]).astype(bool)

        sv.contains_holes(mask=mask)
        # False
        ```

    ![contains_holes](https://media.roboflow.com/supervision-docs/contains-holes.png){ align=center width="800" }
    """  # noqa E501 // docs
    mask_uint8 = mask.astype(np.uint8)
    _, hierarchy = cv2.findContours(mask_uint8, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)

    if hierarchy is not None:
        parent_contour_index = 3
        for h in hierarchy[0]:
            if h[parent_contour_index] != -1:
                return True
    return False

Checks if the binary mask contains multiple unconnected foreground segments.

Parameters:

Name Type Description Default
mask NDArray[bool_]

2D binary mask where True indicates foreground object and False indicates background.

required
connectivity int

Pixel connectivity used to group foreground pixels into segments. With 4 (the default), pixels belong to the same segment only if their edges touch. With 8, pixels whose edges or corners touch belong to the same segment.

4

Returns:

Type Description
bool

True when the mask contains multiple unconnected foreground components, False otherwise.

Raises:

Type Description
ValueError

If connectivity is not 4 or 8.

Examples:

import numpy as np
import supervision as sv

mask = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0]
]).astype(bool)

sv.contains_multiple_segments(mask=mask, connectivity=4)
# True

mask = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0]
]).astype(bool)

sv.contains_multiple_segments(mask=mask, connectivity=4)
# False


Source code in supervision/detection/utils.py
def contains_multiple_segments(
    mask: npt.NDArray[np.bool_], connectivity: int = 4
) -> bool:
    """
    Checks if the binary mask contains multiple unconnected foreground segments.

    Args:
        mask (npt.NDArray[np.bool_]): 2D binary mask where `True` indicates foreground
            object and `False` indicates background.
        connectivity (int): Pixel connectivity used to group foreground pixels
            into segments. With 4 (the default), pixels belong to the same
            segment only if their edges touch. With 8, pixels whose edges or
            corners touch belong to the same segment.

    Returns:
        True when the mask contains multiple unconnected foreground components,
            False otherwise.

    Raises:
        ValueError: If `connectivity` is not 4 or 8.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        mask = np.array([
            [0, 0, 0, 0, 0, 0],
            [0, 1, 1, 0, 1, 1],
            [0, 1, 1, 0, 1, 1],
            [0, 0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0, 0],
            [0, 1, 1, 1, 0, 0]
        ]).astype(bool)

        sv.contains_multiple_segments(mask=mask, connectivity=4)
        # True

        mask = np.array([
            [0, 0, 0, 0, 0, 0],
            [0, 1, 1, 1, 1, 1],
            [0, 1, 1, 1, 1, 1],
            [0, 1, 1, 1, 1, 1],
            [0, 1, 1, 1, 1, 1],
            [0, 0, 0, 0, 0, 0]
        ]).astype(bool)

        sv.contains_multiple_segments(mask=mask, connectivity=4)
        # False
        ```

    ![contains_multiple_segments](https://media.roboflow.com/supervision-docs/contains-multiple-segments.png){ align=center width="800" }
    """  # noqa E501 // docs
    if connectivity != 4 and connectivity != 8:
        raise ValueError(
            "Incorrect connectivity value. Possible connectivity values: 4 or 8."
        )
    mask_uint8 = mask.astype(np.uint8)
    labels = np.zeros_like(mask_uint8, dtype=np.int32)
    number_of_labels, _ = cv2.connectedComponents(
        mask_uint8, labels, connectivity=connectivity
    )
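    # connectedComponents counts the background as a label, so more than two
    # labels means at least two separate foreground segments.
    return number_of_labels > 2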
