4.1. Dataset and Metrics
Dataset. To assess the effectiveness of our proposed OMCTrack method in UAV multi-object tracking scenarios, we conducted a series of experiments using the VisDrone2019 and UAVDT datasets.
The VisDrone2019 dataset was gathered by various UAVs and encompasses a range of scenes, weather, and lighting conditions, with the goal of advancing research on object detection and tracking from the UAV perspective. It covers ten target categories, including pedestrians, cars, vans, buses, trucks, motorcycles, bicycles, and tricycles, as well as other types of transportation. Specifically, the VisDrone2019-MOT subset is designed for multi-object tracking tasks and is divided into three parts: a training set with 56 video sequences, a validation set with seven video sequences, and a test set with 33 video sequences (16 for challenge testing and 17 for development testing). Each target is labeled with a unique tracking identifier, a category label, and a precise bounding box, and key attributes such as scene visibility and occlusion are also recorded. For multi-object tracking evaluation, this study followed the official guidelines and focused on five primary target categories: pedestrians, cars, vans, buses, and trucks. The dataset features numerous challenging scenarios, including camera motion, motion blur, dense target groups, and occlusion, providing a robust testbed for UAV multi-object tracking tasks.
The UAVDT dataset is a challenging large-scale benchmark designed for vehicle detection and tracking from the UAV perspective. It encompasses three core visual tasks: object detection, single-target tracking, and multi-target tracking. The dataset specifically targets vehicles and is categorized into three subtypes: cars, trucks, and buses. For multi-target tracking, the UAVDT dataset is divided into a training set with 30 video sequences and a test set with 20 video sequences. All videos are recorded at 30 frames per second and have a uniform resolution of 1080 × 540 pixels. The dataset includes a diverse range of everyday environments, such as squares, major roads, toll booths, highways, intersections, and T-junctions.
Metrics. To thoroughly evaluate the performance of OMCTrack and compare it with other state-of-the-art models, this study adheres to the standardized evaluation protocol for Multi-Object Tracking (MOT) tasks. We use a range of performance indicators, including IDF1, Multi-Object Tracking Accuracy (MOTA), Higher-Order Tracking Accuracy (HOTA), Association Accuracy (AssA), Detection Accuracy (DetA), False Positives (FP), Missed Detections (FN), Identity Switches (IDs), and Frames Per Second (FPS), to provide a comprehensive assessment of the algorithm’s tracking performance.
The universal metric MOTA is defined as
\[
\mathrm{MOTA} = 1 - \frac{FN + FP + ID_{sw}}{GT},
\]
where $FN$ is the number of missed detections, $FP$ is the number of false detections, $ID_{sw}$ is the number of identity switches, and $GT$ is the number of ground-truth objects.
The HOTA metric, an extension of MOTA, represents higher-order tracking accuracy. It integrates detection, association, and localization into a single comprehensive measure with a threshold parameter $\alpha$:
\[
\mathrm{HOTA}_{\alpha} = \sqrt{\frac{\sum_{c \in TP} \mathcal{A}(c)}{|TP| + |FN| + |FP|}},
\]
where the association score $\mathcal{A}(c)$ is
\[
\mathcal{A}(c) = \frac{|TPA(c)|}{|TPA(c)| + |FNA(c)| + |FPA(c)|}.
\]
$TPA(c)$ indicates that the predicted ID and the ground-truth ID for the same object are both $c$; $FNA(c)$ refers to cases where the predicted ID is not $c$ but the ground-truth ID is $c$; conversely, $FPA(c)$ denotes situations where the predicted ID is $c$ while the ground-truth ID is not $c$.
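The MOTA and HOTA definitions above can be sketched directly in code. This is a minimal illustration; the function and variable names (`mota`, `hota_alpha`, `assoc`) are our own, not taken from any tracking toolkit.

```python
# Illustrative sketch of the MOTA and single-threshold HOTA formulas.
from math import sqrt

def mota(fn: int, fp: int, id_sw: int, gt: int) -> float:
    """MOTA = 1 - (FN + FP + ID_sw) / GT."""
    return 1.0 - (fn + fp + id_sw) / gt

def hota_alpha(assoc, num_tp: int, num_fn: int, num_fp: int) -> float:
    """HOTA at one threshold alpha: square root of the mean association
    score A(c) over all matches, with FN and FP contributing zero.
    `assoc` holds one (tpa, fna, fpa) triple per true-positive match c."""
    a_sum = sum(tpa / (tpa + fna + fpa) for tpa, fna, fpa in assoc)
    return sqrt(a_sum / (num_tp + num_fn + num_fp))

# Toy numbers: 1000 ground-truth boxes, 120 misses, 80 false alarms, 10 switches.
print(round(mota(fn=120, fp=80, id_sw=10, gt=1000), 3))  # → 0.79
```
In practice these quantities are accumulated over whole sequences, and HOTA is additionally averaged over a range of localization thresholds $\alpha$.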
In this paper, we use MOTA, HOTA, and IDF1 as the primary evaluation metrics. MOTA primarily assesses the performance of the detector, while HOTA evaluates the overall effectiveness of the tracker. IDF1, which represents the ratio of correctly identified detections to the average of true and predicted detections, measures the model’s ability to maintain consistent tracking over extended periods.
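The IDF1 definition just described, correctly identified detections (IDTP) over the average of ground-truth and predicted detections, reduces to a one-line formula. A minimal sketch with purely illustrative counts:

```python
# Illustrative sketch of IDF1: since ground-truth detections = IDTP + IDFN and
# predicted detections = IDTP + IDFP, the ratio of IDTP to their average is
# 2*IDTP / (2*IDTP + IDFP + IDFN).
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    return 2 * idtp / (2 * idtp + idfp + idfn)

# Toy counts, not measured values from any experiment in this paper.
print(round(idf1(idtp=900, idfp=100, idfn=200), 3))  # → 0.857
```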
In addition, we evaluated the processing speed of each tracker in frames per second (FPS), i.e., the overall frame rate of the system. Because FPS depends on the hardware on which the tracker runs, an entirely fair comparison of frame rates across different methods is difficult to guarantee.
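A frame rate of this kind is typically measured as wall-clock time over a whole sequence. The sketch below assumes a hypothetical per-frame tracking callback `track_frame`; it is not an actual OMCTrack API.

```python
# Illustrative FPS measurement: total frames divided by elapsed wall-clock time.
# Ideally, I/O such as video decoding is excluded from the timed region.
import time

def measure_fps(frames, track_frame) -> float:
    start = time.perf_counter()
    for frame in frames:
        track_frame(frame)  # one tracking step (hypothetical callback)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```
A comparison such as `measure_fps(frames, tracker.update)` is only meaningful when every method is timed on the same hardware under identical conditions, which is exactly the caveat noted above.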
4.3. Comparison with State-of-the-Art Methods
VisDrone2019 Dataset. To verify the effectiveness of OMCTrack in UAV object tracking missions, we compared our method with other advanced multi-object trackers on the VisDrone2019 dataset. We used the VisDrone2019 training and test sets to train the detection network, and all multi-object tracking algorithms used the same detector weights. We then used TrackEval’s official VisDroneMOT toolkit to evaluate each object tracker on the VisDrone2019 development test set. As shown in
Table 1, OMCTrack achieves 50.6% IDF1, 34.5% MOTA, and 43.3% HOTA, outperforming most advanced multi-object trackers on the VisDrone2019 dataset and ranking second only to BoTSORT. However, OMCTrack runs at 35.8 FPS, far faster than BoTSORT’s 14.3 FPS, demonstrating that the proposed method achieves better real-time performance while maintaining strong tracking performance.
Figure 6 shows a comparison of the metrics of various methods under different thresholds.
UAVDT Dataset. In addition, we compared OMCTrack with other MOT methods on the UAVDT dataset. We used the training and test sets of the UAVDT dataset to train the detectors, and all the multi-object tracking algorithms used the same detector weights. We evaluated the performance of each object tracker on the UAVDT test set using the same evaluation metrics as those used on the VisDrone2019 dataset. As shown in
Table 1, OMCTrack achieves 88.5% IDF1, 85.5% MOTA, and 74.7% HOTA, outperforming other advanced multi-object trackers on the UAVDT dataset.
Figure 7 shows a comparison of the metrics of various methods under different thresholds.
The results indicate that the performance of each tracker on the UAVDT dataset significantly exceeds that on the VisDrone2019 dataset. This discrepancy primarily arises from differences in detection performance with the same detector weights across the two datasets. As shown in
Table 2, with the same detector weights, detection performance on the VisDrone2019 dataset varies sharply across object categories: 0.799 mAP50 for “cars” but only 0.34 for “people” and 0.149 for “bicycles”. The UAVDT dataset predominantly features vehicle tracking scenarios, whereas VisDrone2019 includes a large number of pedestrian tracking scenarios, which degrades the performance of object trackers on VisDrone2019. Thus, the performance of the object detector is crucial to tracking effectiveness in tracking-by-detection paradigms.
4.5. Case Study
To better showcase the advantages and effectiveness of the OMCTrack method in real UAV object tracking tasks, we analyze specific motion scenarios involving UAVs or objects across three UAV tracking tasks.
Camera Moves Fast. When the drone or its camera shakes or moves rapidly, the position of objects in the image changes dramatically, which easily causes tracking drift or loss. As shown in
Figure 9, the camera pitch angle changes between video frames 81, 83, 90, 93, 98, and 103, producing rapid camera motion that causes problems such as ID switching, tracking loss, and bounding-box drift in both the UAVMOT and ByteTrack algorithms, making accurate tracking difficult under rapid camera movement. However, OMCTrack deals with camera motion effectively, compensating for the distorted images in time and achieving accurate and robust multi-object tracking.
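The compensation idea can be illustrated with a minimal sketch: assuming a 2×3 affine transform between consecutive frames has already been estimated (e.g., from matched feature points), predicted track boxes are warped into the current frame before association. The helper `warp_box` and the transform values below are purely illustrative, not OMCTrack’s actual implementation.

```python
# Illustrative camera-motion compensation: warp a predicted track box with an
# inter-frame affine transform before matching it to new detections.
def warp_box(box, a):
    """Apply affine a = [[a11, a12, tx], [a21, a22, ty]] to box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    warped = [(a[0][0] * x + a[0][1] * y + a[0][2],
               a[1][0] * x + a[1][1] * y + a[1][2]) for x, y in corners]
    xs, ys = zip(*warped)
    return (min(xs), min(ys), max(xs), max(ys))

# A pure 10-pixel horizontal camera pan shifts every predicted box by 10 px.
pan = [[1.0, 0.0, 10.0], [0.0, 1.0, 0.0]]
print(warp_box((100.0, 50.0, 140.0, 90.0), pan))  # → (110.0, 50.0, 150.0, 90.0)
```
Without this warp, a fast pan makes the stale predicted box overlap the wrong detection (or none at all), which is exactly the drift and ID switching observed above.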
Object Motion Occlusion. In multi-object tracking, object occlusion is a common and unavoidable challenge that significantly impacts tracking performance. Frequent occlusion, whether between objects or between objects and the background, can cause the loss of key object features, leading to object drift, where the object’s position and motion trajectory deviate, resulting in tracking loss or misidentification as a new object. As illustrated in
Figure 10, after the truck in the scene is occluded, both UAVMOT and ByteTrack algorithms experience issues like tracking loss and ID switching. However, OMCTrack effectively handles object drift caused by occlusion, thanks to its occlusion perception module, enabling stable and long-term multi-object tracking.
UAV Pulls Up and Hovers. When the drone suddenly ascends or hovers, its shooting angle shifts, causing objects in the video to appear smaller as the drone rises, or to rotate as the drone circles. This disrupts the original motion pattern of the objects, making it difficult to approximate their movement as linear. As shown in
Figure 11, algorithms like UAVMOT and ByteTrack struggle to accurately track small-scale objects due to the irregular motion of the drone. However, OMCTrack can maintain accurate tracking despite these challenges. It is important to note that we used the same object detector and detection weights across all the object tracking algorithms.