Object Tracking¶
In some cases, it's important for us to track objects across multiple frames of a video. For example, we may need to figure out the direction a vehicle is moving, or count objects in a video. Some Supervision Annotators and Tools, like LineZone, require tracking to be set up. In this cookbook, we'll cover how to get a tracker up and running for use in your computer vision applications.
What is a Tracker?¶
A tracker is a piece of code that identifies objects across frames and assigns each one a unique tracker_id. Popular trackers at the time of writing include ByteTrack and BoT-SORT. Supervision makes using trackers a breeze and comes with ByteTrack built in.
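To build intuition for what a tracker does before we use the real thing, here is a deliberately simplified sketch: it matches each new box to an existing track by IoU overlap and reuses that track's ID, or starts a new one. Real trackers like ByteTrack also use confidence scores and motion models (e.g. Kalman filters), so treat this as an illustration, not an implementation of ByteTrack.

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

class ToyTracker:
    """A toy IoU-based tracker for illustration only."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.next_id = 1
        self.tracks = {}  # tracker_id -> last known box

    def update(self, boxes):
        # For each box, reuse the ID of the best-overlapping existing track,
        # or assign a fresh ID. (A real tracker also handles ties, lost
        # tracks, and one-to-one assignment; we skip that here.)
        ids = []
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for tid, prev in self.tracks.items():
                score = iou(box, prev)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = self.next_id
                self.next_id += 1
            self.tracks[best_id] = box
            ids.append(best_id)
        return ids

tracker = ToyTracker()
print(tracker.update([(0, 0, 10, 10)]))    # frame 1: new object -> [1]
print(tracker.update([(1, 1, 11, 11)]))    # frame 2: same object moved -> [1]
print(tracker.update([(50, 50, 60, 60)]))  # frame 3: different object -> [2]
```

Notice that the slightly shifted box in frame 2 keeps ID 1, which is exactly the behavior we'll get (far more robustly) from ByteTrack below.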
Before you start¶
Let's make sure that we have access to a GPU. We can use the nvidia-smi command to do that. If you run into any problems, navigate to Edit -> Notebook settings -> Hardware accelerator, set it to GPU, and then click Save.
!nvidia-smi
Fri Feb 23 03:18:02 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:04.0 Off | 0 |
| N/A 33C P0 24W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Install Dependencies¶
!pip install -q inference-gpu "supervision"
Download a Video Asset¶
Now that we have our environment set up, let's download a video that we can detect objects in. Supervision comes with a great utility to help us hit the ground running. The snippet below saves a video asset to our local directory. The file path can also be accessed with the variable path_to_video for additional application logic.
from supervision.assets import download_assets, VideoAssets
# Download a supervision video asset
path_to_video = download_assets(VideoAssets.PEOPLE_WALKING)
Tracking Objects in a Frame¶
Now that we have our video downloaded, let's get to work on tracking objects. We'll first pull in a model from Roboflow Inference to detect people in our video. Then we'll create a byte_tracker object that we'll pass our detections to, which will give each detection a tracker_id. We'll then use that tracker_id with a label_annotator to label our detections.
import supervision as sv
from inference.models.utils import get_roboflow_model
# Load a pre-trained YOLOv8 nano model from Roboflow Inference.
model = get_roboflow_model('yolov8n-640')
# Create a video info object from the video path.
video_info = sv.VideoInfo.from_video_path(path_to_video)
# Create a label annotator for labeling detections with our tracker_id.
label = sv.LabelAnnotator()
# Create a ByteTrack object to track detections.
byte_tracker = sv.ByteTrack(frame_rate=video_info.fps)
# Create a frame generator from video path for iteration of frames.
frame_generator = sv.get_video_frames_generator(path_to_video)
# Grab a frame from the frame_generator.
frame = next(frame_generator)
# Run inference on the frame by passing it to our model.
result = model.infer(frame)[0]
# Convert model results to a supervision detection object.
detections = sv.Detections.from_inference(result)
# Update detections with tracker ids from byte_tracker.
tracked_detections = byte_tracker.update_with_detections(detections)
# Create labels with tracker_id for label annotator.
labels = [ f"{tracker_id}" for tracker_id in tracked_detections.tracker_id ]
# Apply label annotator to frame.
annotated_frame = label.annotate(scene=frame.copy(), detections=tracked_detections, labels=labels)
# Display the frame.
sv.plot_image(annotated_frame)
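Since counting objects was one of the motivating use cases, it's worth noting how tracker IDs make it easy: collect the unique IDs seen across frames. Here is a small sketch using hard-coded sample data standing in for the tracked_detections.tracker_id values you would gather inside the video loop (the IDs shown are hypothetical):

```python
# Hypothetical tracker_id values collected from three consecutive frames,
# standing in for tracked_detections.tracker_id in the real pipeline.
per_frame_ids = [
    [1, 2],     # frame 1: two people detected
    [1, 2, 3],  # frame 2: a third person enters
    [2, 3],     # frame 3: person 1 has left the frame
]

# Accumulate every tracker_id we have ever seen.
seen_ids = set()
for ids in per_frame_ids:
    seen_ids.update(ids)

print(len(seen_ids))  # 3 unique people across the video
```

Without a tracker, naively summing detections per frame would count the same person multiple times; the persistent tracker_id is what makes the count meaningful.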
Tracking Objects in a Video¶
Finally, we'll use a utility called VideoSink to save the annotated frames to a video. Let's dive into the code.
from tqdm import tqdm
# Load a pre-trained YOLOv8 nano model from Roboflow Inference.
model = get_roboflow_model('yolov8n-640')
# Create a video info object from the video path.
video_info = sv.VideoInfo.from_video_path(path_to_video)
# Create a label annotator for labeling detections with our tracker_id.
label = sv.LabelAnnotator()
# Create a ByteTrack object to track detections.
byte_tracker = sv.ByteTrack(frame_rate=video_info.fps)
# Create a frame generator from video path for iteration of frames.
frame_generator = sv.get_video_frames_generator(path_to_video)
# Create a video sink context manager to save resulting video.
with sv.VideoSink(target_path="output.mp4", video_info=video_info) as sink:
# Iterate through frames yielded from the frame_generator.
for frame in tqdm(frame_generator, total=video_info.total_frames):
# Run inference on the frame by passing it to our model.
result = model.infer(frame)[0]
# Convert model results to a supervision detection object.
detections = sv.Detections.from_inference(result)
# Update detections with tracker ids from byte_tracker.
tracked_detections = byte_tracker.update_with_detections(detections)
# Create labels with tracker_id for label annotator.
labels = [ f"{tracker_id}" for tracker_id in tracked_detections.tracker_id ]
# Apply label annotator to frame.
annotated_frame = label.annotate(scene=frame.copy(), detections=tracked_detections, labels=labels)
# Save the annotated frame to an output video.
sink.write_frame(frame=annotated_frame)
Let's take a look at our resulting video. It will be created in your current directory with the name output.mp4. Notice how, even with a little flicker, we can see the tracker_id on the people walking in the video. With trackers under your belt, there's a wide variety of use cases you can now solve. Happy building!