Serialise Detections to a CSV File¶

This cookbook introduces sv.CSVSink, a tool designed to write object detection data captured from video streams or files to a CSV file.

Click the Open in Colab button to run the cookbook on Google Colab.

In [ ]:

!pip install -q inference requests tqdm supervision==0.21.0

In [5]:

import csv
from typing import List
from collections import defaultdict

import numpy as np
import pandas as pd

import supervision as sv
from supervision.assets import download_assets, VideoAssets
from inference import InferencePipeline
from inference.core.interfaces.camera.entities import VideoFrame

The parameters defined below are:

SOURCE_VIDEO_PATH - the path to the input video
CONFIDENCE_THRESHOLD - do not include detections below this confidence level
IOU_THRESHOLD - discard detections that overlap with others by more than this IOU ratio
FILE_NAME - write the CSV output to this file
INFERENCE_MODEL - model id. This cookbook uses a model alias, but it can also be a fine-tuned model or a model from the Universe.
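
To make the IOU_THRESHOLD parameter concrete, here is a minimal sketch of how the IoU (intersection over union) of two boxes in (x_min, y_min, x_max, y_max) format can be computed. The helper name box_iou and the example boxes are illustrative assumptions; this is not the code that the inference pipeline runs internally.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes (hypothetical helper)."""
    # Width and height of the overlapping region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


IOU_THRESHOLD = 0.7
box_a = (0.0, 0.0, 10.0, 10.0)
box_b = (0.0, 0.0, 10.0, 8.0)     # overlaps box_a heavily
box_c = (20.0, 20.0, 30.0, 30.0)  # does not touch box_a

print(box_iou(box_a, box_b))  # 0.8
print(box_iou(box_a, box_c))  # 0.0
```

With IOU_THRESHOLD = 0.7, box_a and box_b overlap by more than the threshold, so only one of the two would be expected to survive, while box_c is unaffected.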

In [6]:

SOURCE_VIDEO_PATH = download_assets(VideoAssets.PEOPLE_WALKING)
CONFIDENCE_THRESHOLD = 0.3
IOU_THRESHOLD = 0.7
FILE_NAME = "detections.csv"
INFERENCE_MODEL = "yolov8n-640"

people-walking.mp4 asset download complete.

Executing download_assets(VideoAssets.PEOPLE_WALKING) above downloads a video file and stores its path in SOURCE_VIDEO_PATH. Keep in mind that the video preview below works only in the web version of the cookbooks and not in Google Colab.

Read single frame from video¶

The get_video_frames_generator enables us to easily iterate over video frames. Let's create a video generator for our sample input file and display its first frame on the screen.

In [7]:

generator = sv.get_video_frames_generator(SOURCE_VIDEO_PATH)
frame = next(generator)

sv.plot_image(frame, (12, 12))

Initialize ByteTrack¶

ByteTrack is a multi-object tracking algorithm used by Supervision to track and link detected objects across multiple frames, providing consistent IDs for each object. Initialize the tracker object below.

In [8]:

byte_track = sv.ByteTrack(minimum_consecutive_frames=3)
byte_track.reset()

Initialize CSVSink¶

To save detections to a CSV file, open the sv.CSVSink and then pass it the sv.Detections object produced by inference.

Note that empty detections will be skipped.
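
The cookbook calls open() and close() on the sink explicitly. As a rough illustration of what such a sink does, here is a minimal standard-library sketch that writes a header once, appends one row per detection together with custom data, and writes nothing for an empty frame. MiniCSVSink is a hypothetical toy class, not supervision's implementation, and its with-statement usage is shown only as one way to guarantee the file gets closed; check the documentation of your installed supervision version before relying on the same pattern with sv.CSVSink.

```python
import csv
import os
import tempfile
from typing import Dict, List


class MiniCSVSink:
    """Toy stand-in for a CSV sink (hypothetical, not supervision's implementation)."""

    def __init__(self, file_name: str, fieldnames: List[str]) -> None:
        self.file_name = file_name
        self.fieldnames = fieldnames
        self._file = None
        self._writer = None

    def __enter__(self) -> "MiniCSVSink":
        # newline="" is what the csv module recommends for files it writes.
        self._file = open(self.file_name, "w", newline="")
        self._writer = csv.DictWriter(self._file, fieldnames=self.fieldnames)
        self._writer.writeheader()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Runs even if the processing loop raises, so the file is always closed.
        self._file.close()

    def append(self, rows: List[Dict], custom_data: Dict) -> None:
        # An empty list of rows writes nothing, mirroring "empty detections are skipped".
        for row in rows:
            self._writer.writerow({**row, **custom_data})


path = os.path.join(tempfile.mkdtemp(), "detections.csv")
with MiniCSVSink(path, ["x_min", "y_min", "x_max", "y_max", "frame_number"]) as sink:
    sink.append([{"x_min": 1, "y_min": 2, "x_max": 3, "y_max": 4}], {"frame_number": 1})
    sink.append([], {"frame_number": 2})  # empty frame: no row is written

with open(path, newline="") as f:
    written = list(csv.DictReader(f))
print(written)
```

Only the row for frame 1 ends up in the file, which is why frames without detections never appear in the resulting CSV.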

In [32]:

csv_sink = sv.CSVSink(FILE_NAME)
csv_sink.open()

Process video and save detections to csv file¶

The InferencePipeline interface is made for streaming and is likely the best route to go for real time use cases. It is an asynchronous interface that can consume many different video sources including local devices (like webcams), RTSP video streams, video files, etc. With this interface, you define the source of a video stream and sinks.

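All the operations we plan to perform for each frame of our video (filtering the detections, tracking, and writing to CSV) are encapsulated in a function named callback. Detection itself is performed by the pipeline, which passes the predictions for each frame to callback.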

In [33]:

def callback(predictions: dict, frame: VideoFrame) -> None:
    detections = sv.Detections.from_inference(predictions)

    # Only keep person detections
    detections = detections[detections.class_id == 0]
    detections.data["class_name"] = np.array(["person" for _ in range(len(detections))])

    detections = byte_track.update_with_detections(detections)
    csv_sink.append(detections, custom_data={'frame_number': frame.frame_id})

In [34]:

pipeline = InferencePipeline.init(
    model_id=INFERENCE_MODEL,
    video_reference=SOURCE_VIDEO_PATH,
    on_prediction=callback,
    iou_threshold=IOU_THRESHOLD,
    confidence=CONFIDENCE_THRESHOLD,
)

UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
UserWarning: Specified provider 'OpenVINOExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'

In [35]:

pipeline.start()
pipeline.join()

In [36]:

csv_sink.close()

Visualize the detections CSV data with Pandas¶

Let's take a look at the resulting data using Pandas.

The same data is also saved in your current directory as detections.csv.

In [37]:

df = pd.read_csv(FILE_NAME)
df

Out[37]:

	x_min	y_min	x_max	y_max	class_id	confidence	tracker_id	class_name	frame_number
0	1142.0	950.0	1245.0	1080.0	0	0.767850	185	person	1
1	750.0	450.0	823.0	620.0	0	0.748268	168	person	1
2	1419.0	702.0	1526.0	887.0	0	0.550286	179	person	1
3	1674.0	12.0	1727.0	141.0	0	0.546864	174	person	1
4	1616.0	18.0	1655.0	127.0	0	0.520566	181	person	1
...	...	...	...	...	...	...	...	...	...
8757	394.0	471.0	460.0	633.0	0	0.466400	244	person	341
8758	115.0	256.0	199.0	399.0	0	0.431374	223	person	341
8759	77.0	700.0	169.0	892.0	0	0.416318	240	person	341
8760	1010.0	59.0	1055.0	174.0	0	0.403422	246	person	341
8761	1758.0	0.0	1809.0	96.0	0	0.313793	236	person	341

8762 rows × 9 columns
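
Once the detections are in a CSV file, simple summaries are easy to compute. The sketch below uses only the standard library and a few rows copied from the table above, written out comma-separated under the assumption that the file uses the same column names as the DataFrame. It counts detections per frame, counts distinct tracker IDs, and filters by confidence. In pandas, expressions such as df.groupby("frame_number").size() and df["tracker_id"].nunique() should give the equivalent results on the full file.

```python
import csv
import io
from collections import Counter

# A few rows copied from the DataFrame output above (comma-separated for illustration).
SAMPLE_CSV = """x_min,y_min,x_max,y_max,class_id,confidence,tracker_id,class_name,frame_number
1142.0,950.0,1245.0,1080.0,0,0.767850,185,person,1
750.0,450.0,823.0,620.0,0,0.748268,168,person,1
1419.0,702.0,1526.0,887.0,0,0.550286,179,person,1
394.0,471.0,460.0,633.0,0,0.466400,244,person,341
115.0,256.0,199.0,399.0,0,0.431374,223,person,341
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Number of detections recorded for each frame.
detections_per_frame = Counter(int(row["frame_number"]) for row in rows)

# Distinct tracker IDs give a rough count of tracked people in the sample.
unique_tracks = {int(row["tracker_id"]) for row in rows}

# Keep only confident detections; CSV values are strings, so cast before comparing.
confident = [row for row in rows if float(row["confidence"]) >= 0.5]

print(dict(detections_per_frame))  # {1: 3, 341: 2}
print(len(unique_tracks))          # 5
print(len(confident))              # 3
```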

Convert CSV data to sv.Detections¶

In [38]:

def csv_to_detections(csv_file: str) -> List[sv.Detections]:
    rows_by_frame_number = defaultdict(list)
    with open(csv_file, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            frame_number = int(row["frame_number"])
            rows_by_frame_number[frame_number].append(row)

    detections_list = []
    for frame_number, rows in rows_by_frame_number.items():
        xyxy = []
        class_id = []
        confidence = []
        tracker_id = []
        custom_data = defaultdict(list)

        for row in rows:
            xyxy.append([row[key] for key in ["x_min", "y_min", "x_max", "y_max"]])
            class_id.append(row["class_id"])
            confidence.append(row["confidence"])
            tracker_id.append(row["tracker_id"])

            for custom_key in row.keys():
                if custom_key in ["x_min", "y_min", "x_max", "y_max", "class_id", "confidence", "tracker_id"]:
                    continue
                custom_data[custom_key].append(row[custom_key])

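        # A column that is empty in every row was not set when the CSV was written,
        # so pass None instead of trying to cast empty strings to numbers.
        if all([val == "" for val in class_id]):
            class_id = None
        if all([val == "" for val in confidence]):
            confidence = None
        if all([val == "" for val in tracker_id]):
            tracker_id = None

        detections_list.append(
            sv.Detections(
                xyxy=np.array(xyxy, dtype=np.float32),
                class_id=None if class_id is None else np.array(class_id, dtype=int),
                confidence=None if confidence is None else np.array(confidence, dtype=np.float32),
                tracker_id=None if tracker_id is None else np.array(tracker_id, dtype=int),
                data=dict(custom_data)
            )
        )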
    
    return detections_list

In [39]:

detections_list = csv_to_detections(FILE_NAME)
detections_list

print(f"Detections: {len(detections_list)}")
print(detections_list[0])

Detections: 341
Detections(xyxy=array([[1142.,  950., 1245., 1080.],
       [ 750.,  450.,  823.,  620.],
       [1419.,  702., 1526.,  887.],
       [1674.,   12., 1727.,  141.],
       [1616.,   18., 1655.,  127.],
       [1388.,    2., 1437.,  184.]], dtype=float32), mask=None, confidence=array([0.7678498 , 0.7482683 , 0.5502863 , 0.5468636 , 0.5205659 ,
       0.31684005], dtype=float32), class_id=array([0, 0, 0, 0, 0, 0]), tracker_id=array([185, 168, 179, 174, 181,  28]), data={'class_name': ['person', 'person', 'person', 'person', 'person', 'person'], 'frame_number': ['1', '1', '1', '1', '1', '1']})
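
Note that the data dictionary in the output above holds frame_number values as strings ('1'), while class_id and tracker_id are integers. This is because csv.DictReader returns every cell as a string, and only the fields that are explicitly cast get converted. The following standard-library sketch shows the round trip and one way to cast values back; to_optional_int is a hypothetical helper and the rows are made up for illustration.

```python
import csv
import io
from typing import Optional


def to_optional_int(value: str) -> Optional[int]:
    """Convert a CSV cell to int, treating an empty cell as missing (hypothetical helper)."""
    return int(value) if value != "" else None


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["tracker_id", "confidence", "frame_number"])
writer.writeheader()
writer.writerow({"tracker_id": 185, "confidence": 0.76785, "frame_number": 1})
writer.writerow({"tracker_id": "", "confidence": 0.5, "frame_number": 2})  # no tracker ID

buffer.seek(0)
rows = list(csv.DictReader(buffer))

print(rows[0])  # every value is a str after the round trip
tracker_ids = [to_optional_int(row["tracker_id"]) for row in rows]
frame_numbers = [int(row["frame_number"]) for row in rows]
print(tracker_ids)    # [185, None]
print(frame_numbers)  # [1, 2]
```

If numeric custom fields are needed downstream, casting them while the rows are being read avoids surprises such as comparing '10' with '9' as strings.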

Annotate a Frame¶

Visualize a video frame alongside the detections recovered by parsing the CSV data into sv.Detections objects. The annotated image shows the original video frame with a bounding box and a label drawn for each detection parsed for that frame, giving a visual check of the round trip through the CSV file.

Get back `sv.Detections`¶

In [40]:

FRAME_NUMBER = 100

detections = detections_list[FRAME_NUMBER]
frame_number = detections.data["frame_number"][0]

generator = sv.get_video_frames_generator(SOURCE_VIDEO_PATH, start=FRAME_NUMBER)
frame = next(generator)
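
Keep in mind that detections_list[FRAME_NUMBER] is a positional lookup into a list. Because the sink skips empty detections, and the frame_number column in the output above starts at 1, the list index and the frame number do not have to coincide. A minimal sketch of keying the parsed rows by frame number instead, using made-up rows and a hypothetical group_by_frame helper:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_frame(rows: List[dict]) -> Dict[int, List[dict]]:
    """Group CSV-like rows by their integer frame number (hypothetical helper)."""
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[int(row["frame_number"])].append(row)
    return dict(grouped)


# Made-up rows: frame 2 had no detections, so it never appears in the CSV.
rows = [
    {"frame_number": "1", "tracker_id": "7"},
    {"frame_number": "1", "tracker_id": "8"},
    {"frame_number": "3", "tracker_id": "7"},
]

by_frame = group_by_frame(rows)
as_list = list(by_frame.values())

# Positional index 1 holds frame 3, not frame 2.
print(as_list[1][0]["frame_number"])  # 3
# A keyed lookup makes a missing frame explicit instead of silently shifting.
print(by_frame.get(2, []))            # []
print(len(by_frame[1]))               # 2
```

Comparing the frame_number stored in detections.data with the frame actually read from the video is a cheap way to confirm that boxes and image belong together.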

Annotate Image with Detections¶

Finally, we can annotate the image with the predictions. Since we are working with an object detection model, we will use the sv.BoundingBoxAnnotator and sv.LabelAnnotator classes.

In [41]:

bounding_box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()

annotated_frame = frame.copy()
annotated_frame = bounding_box_annotator.annotate(scene=annotated_frame, detections=detections)
annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections)
sv.plot_image(annotated_frame, (12, 12))
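
Since the CSV also stores tracker IDs and confidences, the labels can carry more than the class name. The sketch below only builds the label strings, using values copied from the first frame's output above; build_labels is a hypothetical helper. To the best of our knowledge, sv.LabelAnnotator.annotate accepts an optional labels argument with one string per detection, but verify this against the version you have installed.

```python
from typing import List

# Values copied from the first frame's output above.
tracker_ids = [185, 168, 179]
class_names = ["person", "person", "person"]
confidences = [0.7678498, 0.7482683, 0.5502863]


def build_labels(
    tracker_ids: List[int], class_names: List[str], confidences: List[float]
) -> List[str]:
    """Format one label string per detection (hypothetical helper)."""
    return [
        f"#{tracker_id} {class_name} {confidence:.2f}"
        for tracker_id, class_name, confidence in zip(tracker_ids, class_names, confidences)
    ]


labels = build_labels(tracker_ids, class_names, confidences)
print(labels)  # ['#185 person 0.77', '#168 person 0.75', '#179 person 0.55']
```

In the notebook, the inputs would come from detections.tracker_id, detections.data["class_name"], and detections.confidence, and the resulting list would be passed as labels=labels to label_annotator.annotate.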

References 📚¶

Supervision: https://supervision.roboflow.com
sv.Detections: https://supervision.roboflow.com/develop/detection/core/#detections
Save Detections to CSV: https://supervision.roboflow.com/develop/how_to/save_detections/#save-detections-as-csv
Custom fields: https://supervision.roboflow.com/develop/how_to/save_detections/#custom-fields
ByteTrack: https://supervision.roboflow.com/trackers/#supervision.tracker.byte_tracker.core.ByteTrack
Inference: https://inference.roboflow.com/
Inference Pipeline: https://inference.roboflow.com/using_inference/inference_pipeline/
Inference Aliases: https://inference.roboflow.com/quickstart/aliases/