Benchmark and Model Zoo
We use distributed training.
All PyTorch-style backbones pretrained on ImageNet are taken from the PyTorch model zoo.
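For a fair comparison with other codebases, we report GPU memory as the maximum value of
torch.cuda.max_memory_allocated() over all 8 GPUs. Note that this value is usually less than the memory usage reported by system-level GPU monitoring tools, since it counts only memory occupied by tensors and excludes memory cached by PyTorch's allocator and the CUDA context.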
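The memory metric above can be sketched as follows. This is a minimal illustration, not code from this repository: the function names peak_memory_mb and peak_memory_mb_across_ranks are hypothetical, and the multi-GPU path assumes PyTorch with CUDA and an already initialised torch.distributed process group (one process per GPU).

```python
# Hypothetical sketch of the reported memory metric: the peak of
# torch.cuda.max_memory_allocated() on each GPU, reduced with MAX over GPUs.

BYTES_PER_MB = 1024 * 1024


def peak_memory_mb(per_gpu_peak_bytes):
    """Return the largest per-GPU peak allocation, in whole MB."""
    if not per_gpu_peak_bytes:
        raise ValueError("need at least one per-GPU measurement")
    return max(per_gpu_peak_bytes) // BYTES_PER_MB


def peak_memory_mb_across_ranks():
    """Measure this rank's peak and take the MAX over all ranks.

    Assumes PyTorch with CUDA is installed and, for multi-GPU runs, that
    torch.distributed has already been initialised (one process per GPU).
    """
    import torch  # imported lazily so peak_memory_mb() works without torch
    import torch.distributed as dist

    device = torch.device("cuda", torch.cuda.current_device())
    # Peak bytes held by tensors on this GPU since the program started
    # (or since the last torch.cuda.reset_peak_memory_stats() call).
    local_mb = torch.cuda.max_memory_allocated(device) // BYTES_PER_MB
    mem = torch.tensor([local_mb], dtype=torch.int64, device=device)
    if dist.is_available() and dist.is_initialized():
        # After the reduction every rank holds the maximum over all GPUs.
        dist.all_reduce(mem, op=dist.ReduceOp.MAX)
    return int(mem.item())
```

Taking the maximum rather than the mean reflects the per-GPU capacity a run actually needs: the busiest GPU is the one that would run out of memory first.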
We report the inference time as the total time of the network forward pass and post-processing, excluding data loading time. Results are obtained with the script
tools/analysis/benchmark.py, which computes the average time over 2000 images.
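The timing protocol above can be sketched as a small harness. This is an illustration under stated assumptions, not the actual tools/analysis/benchmark.py: the function benchmark_inference, its parameters, and the warm-up iterations are hypothetical. It shows the two points the text makes: data loading happens outside the timed region, and the reported number is a per-image average.

```python
import time
from statistics import mean


def benchmark_inference(loader, forward, postprocess, num_images=2000,
                        num_warmup=5, synchronize=None):
    """Average seconds per image for forward + post-processing (hypothetical).

    loader yields one preprocessed sample per image; each sample is fetched
    before the timer starts, so data loading time is excluded.
    num_warmup is an assumed detail: the first iterations (CUDA init,
    autotuning, etc.) are executed but not counted.
    synchronize should be torch.cuda.synchronize on GPU, because CUDA kernels
    run asynchronously and would otherwise be under-counted.
    """
    timings = []
    samples = iter(loader)
    for i in range(num_warmup + num_images):
        try:
            sample = next(samples)  # data loading: outside the timed region
        except StopIteration:
            break  # fewer images than requested: average what was timed
        if synchronize is not None:
            synchronize()
        start = time.perf_counter()
        postprocess(forward(sample))  # timed: forward pass + post-processing
        if synchronize is not None:
            synchronize()
        if i >= num_warmup:
            timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("no images were timed; loader too short for warm-up")
    return mean(timings)
```

On a GPU one might pass synchronize=torch.cuda.synchronize and run the call under torch.no_grad(); without synchronisation the timer would mostly measure kernel launch overhead.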
Speed benchmark environments
Hardware:
- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Baselines of video object detection
DFF (CVPR 2017)
Please refer to DFF for details.
FGFA (ICCV 2017)
Please refer to FGFA for details.
SELSA (ICCV 2019)
Please refer to SELSA for details.
Temporal RoI Align (AAAI 2021)
Please refer to Temporal RoI Align for details.
Baselines of multiple object tracking
SORT/DeepSORT (ICIP 2016/2017)
Please refer to SORT/DeepSORT for details.
Tracktor (ICCV 2019)
Please refer to Tracktor for details.
QDTrack (CVPR 2021)
Please refer to QDTrack for details.
ByteTrack (ECCV 2022)
Please refer to ByteTrack for details.
OC-SORT (arXiv 2022)
Please refer to OC-SORT for details.
Baselines of single object tracking
SiameseRPN++ (CVPR 2019)
Please refer to SiameseRPN++ for details.
STARK (ICCV 2021)
Please refer to STARK for details.
MixFormer (CVPR 2022)
Please refer to MixFormer for details.
Baselines of video instance segmentation
MaskTrack R-CNN (ICCV 2019)
Please refer to MaskTrack R-CNN for details.