With Supervision, you can load and manipulate classification, object detection, and
segmentation datasets. This tutorial will walk you through how to load, split, merge,
visualize, and augment datasets in Supervision.
In this tutorial, we will use a dataset from Roboflow Universe, a public repository of
thousands of computer vision datasets. If you already have your dataset in COCO, YOLO,
or Pascal VOC format, you can skip this section.
First, install the roboflow package:

    pip install roboflow
Next, log into your Roboflow account and download the dataset of your choice in the
COCO, YOLO, or Pascal VOC format. You can customize the following code snippet with
your workspace ID, project ID, and version number.
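A minimal sketch of such a snippet, assuming the roboflow package's login, Roboflow, workspace, project, version, and download calls. The download_dataset helper and the EXPORT_FORMATS mapping are illustrative names, not part of the package, and the format strings are assumptions to check against the Roboflow documentation.

```python
# Assumed export-format strings for the three formats named in this tutorial.
EXPORT_FORMATS = {"COCO": "coco", "YOLO": "yolov8", "Pascal VOC": "voc"}


def download_dataset(workspace_id, project_id, version, export_format="coco"):
    """Download one version of a Roboflow project and return the dataset handle.

    `workspace_id`, `project_id`, and `version` are your own values; this helper
    is a hypothetical wrapper around the roboflow package.
    """
    # Imported lazily so the helper can be defined without the package installed.
    import roboflow

    # Prompts for authentication if no credentials are stored locally.
    roboflow.login()
    rf = roboflow.Roboflow()
    project = rf.workspace(workspace_id).project(project_id)
    return project.version(version).download(export_format)
```

Calling download_dataset("<WORKSPACE_ID>", "<PROJECT_ID>", <PROJECT_VERSION>) with your own values returns an object whose location attribute points to the downloaded dataset folder.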
The Supervision library provides convenient functions to load datasets in various
formats. If your dataset is already split into train, test, and valid subsets, you can
load each of those as separate sv.DetectionDataset instances.
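As a sketch of loading pre-split data, the example below loads each subset of a COCO export with sv.DetectionDataset.from_coco. It assumes a Roboflow-style folder layout in which every split folder holds its images next to an _annotations.coco.json file; coco_split_paths and load_coco_splits are hypothetical helper names.

```python
from pathlib import Path
from typing import Tuple


def coco_split_paths(dataset_root: str, split: str) -> Tuple[str, str]:
    """Return (images directory, annotations file) for one split.

    Assumes the layout <root>/<split>/_annotations.coco.json.
    """
    split_dir = Path(dataset_root) / split
    return str(split_dir), str(split_dir / "_annotations.coco.json")


def load_coco_splits(dataset_root: str, splits=("train", "valid", "test")) -> dict:
    """Load each split as its own sv.DetectionDataset, keyed by split name."""
    # Imported lazily; requires the supervision package to be installed.
    import supervision as sv

    datasets = {}
    for split in splits:
        images_dir, annotations_file = coco_split_paths(dataset_root, split)
        datasets[split] = sv.DetectionDataset.from_coco(
            images_directory_path=images_dir,
            annotations_path=annotations_file,
        )
    return datasets
```

For the other two formats, sv.DetectionDataset.from_yolo and sv.DetectionDataset.from_pascal_voc play the same role, each with its own path arguments.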
If your dataset is not already split into train, test, and valid subsets, you can
easily do so using the sv.DetectionDataset.split method. We can split it as follows,
ensuring a random shuffle of the data.
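A short sketch of a shuffled split, assuming ds is an already loaded sv.DetectionDataset. The split_dataset wrapper and its default ratio and seed are illustrative choices; the underlying call is the split method with its split_ratio, random_state, and shuffle arguments.

```python
def split_dataset(ds, train_ratio: float = 0.7, seed: int = 42):
    """Split one dataset into (train, test) using a seeded random shuffle.

    `ds` is expected to be an sv.DetectionDataset; the 0.7 ratio and the seed
    are example values, not library defaults.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    # A fixed random_state makes the shuffle reproducible across runs.
    return ds.split(split_ratio=train_ratio, random_state=seed, shuffle=True)
```

To obtain a validation subset as well, the same helper can be applied again to one of the two returned datasets.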
There are two ways to loop over an sv.DetectionDataset: iterate over the instance
directly with a for loop, or load its entries by index.

    import supervision as sv

    ds = sv.DetectionDataset(...)

    # Option 1
    for image_path, image, annotations in ds:
        ...  # Process each image and its annotations

    # Option 2
    for idx in range(len(ds)):
        image_path, image, annotations = ds[idx]
        ...  # Process the image and annotations at index `idx`

The Supervision library provides tools for easily visualizing your detection dataset.
You can create a grid of annotated images to quickly inspect your data and labels.
First, initialize the sv.BoxAnnotator and sv.LabelAnnotator.
Then, iterate through a subset of the dataset (e.g., the first 25 images), drawing
bounding boxes and class labels on each image. Finally, combine the annotated images
into a grid for display.
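The steps above can be sketched as follows, assuming ds is a loaded sv.DetectionDataset. The helper names class_labels and annotate_dataset_grid are hypothetical, and the use of sv.create_tiles with a 5x5 grid and 400x400 tiles is one assumed way to build the grid.

```python
def class_labels(classes, class_ids):
    """Map integer class IDs to class names; images without annotations give []."""
    if class_ids is None:
        return []
    return [classes[int(class_id)] for class_id in class_ids]


def annotate_dataset_grid(ds, max_images: int = 25, grid_size=(5, 5)):
    """Draw boxes and labels on the first `max_images` images and tile them."""
    # Imported lazily; requires the supervision package to be installed.
    import supervision as sv

    box_annotator = sv.BoxAnnotator()
    label_annotator = sv.LabelAnnotator()

    annotated_images = []
    # min() guards against datasets with fewer than `max_images` entries.
    for idx in range(min(max_images, len(ds))):
        _, image, annotations = ds[idx]
        labels = class_labels(ds.classes, annotations.class_id)
        # Annotate a copy so the image held by the dataset stays untouched.
        annotated = box_annotator.annotate(scene=image.copy(), detections=annotations)
        annotated = label_annotator.annotate(
            scene=annotated, detections=annotations, labels=labels
        )
        annotated_images.append(annotated)

    # Assumed tiling call: combines the annotated images into a single grid image.
    return sv.create_tiles(
        annotated_images, grid_size=grid_size, single_tile_size=(400, 400)
    )
```

The returned grid is a single image that can be displayed or saved like any other image array.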
In this section, we'll explore using Supervision in combination with Albumentations to
augment our dataset. Data augmentation is a common technique in computer vision to
increase the size and diversity of training datasets, leading to improved model
performance and generalization.

    pip install albumentations

Albumentations provides a flexible and powerful API for image augmentation. The core of
the library is the Compose class, which allows you to chain multiple image
transformations together. Each transformation is defined using a dedicated class, such
as HorizontalFlip, RandomBrightnessContrast, or Perspective.
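As an illustration, the sketch below chains the three transformations named above with Compose and applies the pipeline to one image and its boxes. The probabilities, the category label field, and the helper names build_augmentation and augment_sample are assumptions made for this example. The pascal_voc box format (absolute x_min, y_min, x_max, y_max) is chosen because it matches the xyxy boxes stored in sv.Detections.

```python
def build_augmentation(p_perspective=0.1, p_flip=0.5, p_brightness=0.5):
    """Build a Compose pipeline that also transforms bounding boxes."""
    # Imported lazily; requires the albumentations package to be installed.
    import albumentations as A

    return A.Compose(
        [
            A.Perspective(p=p_perspective),
            A.HorizontalFlip(p=p_flip),
            A.RandomBrightnessContrast(p=p_brightness),
        ],
        # Boxes are given as absolute [x_min, y_min, x_max, y_max];
        # "category" names the keyword argument that carries the class IDs.
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["category"]),
    )


def augment_sample(augmentation, image, xyxy, class_ids):
    """Apply the pipeline to one sample; returns (image, boxes, class IDs)."""
    result = augmentation(image=image, bboxes=xyxy, category=class_ids)
    return result["image"], result["bboxes"], result["category"]
```

With a dataset entry (image, annotations), the call would be augment_sample(build_augmentation(), image, annotations.xyxy, annotations.class_id). Boxes that a transformation pushes out of the frame may be dropped, so the returned boxes and class IDs should be used together.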