Annotate Video with Detections
One of the most common requirements of computer vision applications is detecting objects in images and displaying bounding boxes around those objects. In this cookbook, we'll walk through how to use the open source Roboflow ecosystem to accomplish this task on a video. Let's dive in!
Before you start
Let's make sure that we have access to a GPU. We can use the `nvidia-smi` command to do that. In case of any problems, navigate to `Edit` -> `Notebook settings` -> `Hardware accelerator`, set it to `GPU`, and then click `Save`.
!nvidia-smi
Fri Feb 23 03:15:00 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           Off | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0              24W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Installing Dependencies
In this cookbook we'll be utilizing the open source packages Inference and Supervision to accomplish our goals. Let's get those installed in our notebook with pip.
!pip install -q inference-gpu "supervision[assets]"
Download a Video Asset
First, let's download a video that we can detect objects in. Supervision comes with a great utility called Assets to help us hit the ground running. When we run this script, the video is saved in our local directory and can be accessed with the variable `path_to_video`.
from supervision.assets import download_assets, VideoAssets
# Download a supervision video asset
path_to_video = download_assets(VideoAssets.PEOPLE_WALKING)
As a result, we've downloaded a video. Let's take a look at the video below. Keep in mind that the video preview below works only in the web version of the cookbooks and not in Google Colab.
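If you can't see the preview (in Colab, for example), one alternative is to inspect the downloaded video's metadata with Supervision's `VideoInfo` utility. Here's a minimal sketch; the printed values depend on the downloaded asset:

```python
import supervision as sv

# Read resolution, FPS, and frame count from the downloaded video.
video_info = sv.VideoInfo.from_video_path(path_to_video)
print(video_info)
```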
Detecting Objects
For this example, the objects in the video that we'd like to detect are people. In order to display bounding boxes around the people in the video, we first need a way to detect them. We'll be using the open source Inference package for this task. Inference allows us to quickly use thousands of models, including fine-tuned models from Roboflow Universe, with a few lines of code. We'll also utilize a few utilities for working with our video data from the Supervision package.
import supervision as sv
from supervision.assets import download_assets, VideoAssets
from inference.models.utils import get_roboflow_model
# Load a yolov8 model from roboflow.
model = get_roboflow_model("yolov8s-640")
# Create a frame generator and video info object from supervision utilities.
frame_generator = sv.get_video_frames_generator(path_to_video)
# Yield a single frame from the generator.
frame = next(frame_generator)
# Run inference on our frame
result = model.infer(frame)[0]
# Parse result into detections data model.
detections = sv.Detections.from_inference(result)
# Pretty Print the resulting detections.
from pprint import pprint
pprint(detections)
Detections(xyxy=array([[1140.,  951., 1245., 1079.],
       [ 666.,  648.,  745.,  854.],
       [  34.,  794.,  142.,  990.],
       [1140.,  505., 1211.,  657.],
       [ 260.,  438.,  332.,  612.],
       [1413.,  702., 1523.,  887.],
       [1462.,  472., 1543.,  643.],
       [1446.,  318., 1516.,  483.],
       [ 753.,  451.,  821.,  623.],
       [ 924.,  172.,  983.,  307.],
       [1791.,  144., 1852.,  275.],
       [  93.,  132.,  146.,  251.],
       [ 708.,  240.,  765.,  388.],
       [ 200.,   44.,  267.,  161.],
       [1204.,  131., 1255.,  266.],
       [ 569.,  267.,  628.,  408.],
       [1163.,  150., 1210.,  280.],
       [ 799.,   78.,  847.,  204.],
       [1690.,  152., 1751.,  283.],
       [ 344.,  495.,  396.,  641.],
       [1722.,   77., 1782.,  178.]]),
           mask=None,
           confidence=array([0.83215541, 0.80572134, 0.7919845 , 0.7912274 , 0.77121079,
       0.7599591 , 0.75711554, 0.75494027, 0.73076195, 0.71452248,
       0.69572842, 0.65269446, 0.63952065, 0.62914598, 0.61361706,
       0.5968492 , 0.55311316, 0.5470854 , 0.54070991, 0.52209878,
       0.41217673]),
           class_id=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
           tracker_id=None,
           data={'class_name': array(['person', 'person', 'person', 'person', 'person', 'person',
       'person', 'person', 'person', 'person', 'person', 'person',
       'person', 'person', 'person', 'person', 'person', 'person',
       'person', 'person', 'person'], dtype='<U6')})
First, we load our model using the method `get_roboflow_model()`. Notice how we pass in a `model_id`? We're using an alias here. This is where we can pass in other models from Roboflow Universe, like this rock, paper, scissors model, utilizing our Roboflow API key.
model = get_roboflow_model(
model_id="rock-paper-scissors-sxsw/11",
api_key="roboflow_private_api_key"
)
If you don't have an API key, you can create a free Roboflow account. This model wouldn't be much help with detecting people, but it's a nice exercise to see how our code becomes model agnostic!
We then create a `frame_generator` object and yield a single frame for inference using `next()`. We pass our frame to `model.infer()` to run inference. After that, we pass the result into a little helper function called `sv.Detections.from_inference()` to parse it. Lastly, we print our detections to show we are in fact detecting a few people in the frame!
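Because `sv.Detections` stores its fields as parallel NumPy arrays, we can also slice it with boolean masks. As a quick illustration (the 0.7 threshold here is an arbitrary value chosen for this sketch, not something the cookbook prescribes):

```python
# Keep only detections with confidence above 0.7 (arbitrary threshold for illustration).
high_confidence = detections[detections.confidence > 0.7]
print(f"{len(high_confidence)} of {len(detections)} detections remain")
```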
Annotating the Frame with Bounding Boxes
Now that we're detecting objects, let's get to the fun part: annotating the frame and displaying the bounding boxes on it.
# Create a bounding box annotator object.
bounding_box = sv.BoundingBoxAnnotator()
# Annotate our frame with detections.
annotated_frame = bounding_box.annotate(scene=frame.copy(), detections=detections)
# Display the frame.
sv.plot_image(annotated_frame)
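To go from a single frame to the whole clip, Supervision provides `sv.process_video()`, which reads the source video, applies a callback to every frame, and writes the result to a new file. The sketch below reuses the `model` and `bounding_box` objects from above; the output filename is just an example:

```python
def annotate_frame(frame, index: int):
    # Detect people in the current frame.
    result = model.infer(frame)[0]
    detections = sv.Detections.from_inference(result)
    # Draw bounding boxes on a copy so the original frame stays untouched.
    return bounding_box.annotate(scene=frame.copy(), detections=detections)

# Annotate every frame and save the result to a new video file.
sv.process_video(
    source_path=path_to_video,
    target_path="people-walking-annotated.mp4",  # example output path
    callback=annotate_frame,
)
```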