Inherits the official COCO class in order to parse the annotations of bbox-related video tasks.

  • annotation_file (str) – location of annotation file. Defaults to None.

  • load_img_as_vid (bool) – If True, convert image data to video data, which means each image is converted to a video. Defaults to False.


Convert image data to video data.
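As a rough illustration of what treating each image as a video can mean, the sketch below gives every image its own single-frame video. The helper name `convert_img_to_vid` and the field names `video_id`, `frame_id`, `instance_id` and `file_name` are assumptions made for this example, not a statement of the library's actual implementation.

```python
import copy


def convert_img_to_vid(dataset):
    """Treat every image as a one-frame video (field names are assumptions)."""
    dataset = copy.deepcopy(dataset)  # leave the caller's data untouched
    videos = []
    for img in dataset.get('images', []):
        videos.append({'id': img['id'], 'name': img['file_name']})
        img['video_id'] = img['id']  # one video per image
        img['frame_id'] = 0          # the only frame of that video
    dataset['videos'] = videos
    for ann in dataset.get('annotations', []):
        ann['video_id'] = ann['image_id']
        ann['instance_id'] = ann['id']  # each annotation is its own track
    return dataset


coco = {
    'images': [{'id': 7, 'file_name': 'a.jpg'}, {'id': 9, 'file_name': 'b.jpg'}],
    'annotations': [{'id': 1, 'image_id': 7}, {'id': 2, 'image_id': 9}],
}
vid = convert_img_to_vid(coco)
```

After the call, `vid['videos']` holds one entry per image, and every annotation carries a `video_id` and an `instance_id`.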


Create index.
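The lookups documented below (image ids by instance or video, instance ids by video) suggest an index of roughly the following shape. This is only a sketch: the function name `create_index` and the annotation keys `video_id`, `instance_id` and `image_id` are assumptions for illustration.

```python
from collections import defaultdict


def create_index(images, annotations):
    """Build video -> images, instance -> images and video -> instances maps."""
    vid_to_imgs = defaultdict(list)
    ins_to_imgs = defaultdict(list)
    vid_to_ins = defaultdict(list)
    for img in images:
        vid_to_imgs[img['video_id']].append(img['id'])
    for ann in annotations:
        ins_id, vid_id = ann['instance_id'], ann['video_id']
        ins_to_imgs[ins_id].append(ann['image_id'])
        if ins_id not in vid_to_ins[vid_id]:  # keep each instance once per video
            vid_to_ins[vid_id].append(ins_id)
    return vid_to_imgs, ins_to_imgs, vid_to_ins


images = [
    {'id': 1, 'video_id': 10},
    {'id': 2, 'video_id': 10},
    {'id': 3, 'video_id': 11},
]
annotations = [
    {'image_id': 1, 'video_id': 10, 'instance_id': 5},
    {'image_id': 2, 'video_id': 10, 'instance_id': 5},
    {'image_id': 3, 'video_id': 11, 'instance_id': 6},
]
vid_to_imgs, ins_to_imgs, vid_to_ins = create_index(images, annotations)
```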


Get image ids from given instance id.


insId (int) – The given instance id.


Image ids of given instance id.

Return type



Get image ids from given video id.


vidId (int) – The given video id.


Image ids of given video id.

Return type



Get instance ids from given video id.


vidId (int) – The given video id.


Instance ids of given video id.

Return type



Get video ids that satisfy given filter conditions.

By default, returns all video ids.


vidIds (list[int]) – The given video ids. Defaults to [].


Video ids.

Return type



Get video information of given video ids.

By default, returns the information of all videos.


ids (list[int]) – The given video ids. Defaults to [].


List of video information.

Return type



class mmtrack.datasets.samplers.EntireVideoBatchSampler(sampler: Sampler, batch_size: int = 1, drop_last: bool = False)[source]

A sampler wrapper for grouping images from one video into the same batch.

  • sampler (Sampler) – Base sampler.

  • batch_size (int) – Size of mini-batch. Here, we take a video as a batch. Defaults to 1.

  • drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size. Defaults to False.
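The idea of taking a whole video as one batch can be sketched without any framework code. The generator below is a hypothetical helper, not the class itself; it assumes the base sampler yields indices ordered video by video and that `video_of[idx]` gives the video of each sample.

```python
def entire_video_batches(sample_indices, video_of):
    """Yield one batch per video from indices ordered video by video."""
    batch, current = [], None
    for idx in sample_indices:
        vid = video_of[idx]
        if batch and vid != current:
            yield batch  # the previous video is complete
            batch = []
        batch.append(idx)
        current = vid
    if batch:
        yield batch  # flush the last video


video_of = ['a', 'a', 'a', 'b', 'b', 'c']
batches = list(entire_video_batches(range(len(video_of)), video_of))
```

Here `batches` contains three lists, one per video, whatever their lengths.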

class mmtrack.datasets.samplers.QuotaSampler(dataset: Sized, samples_per_epoch: int, replacement: bool = False, seed: int = 0)[source]

Sampler that draws a fixed number of samples per epoch.

It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it.


Dataset is assumed to be of constant size.

  • dataset (Sized) – Dataset used for sampling.

  • samples_per_epoch (int) – The number of samples per epoch.

  • replacement (bool) – If True, samples are drawn with replacement. Defaults to False.

  • seed (int, optional) – random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.
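A fixed per-epoch quota can be illustrated with the standard library alone. The function `quota_indices` below is a hypothetical stand-in that uses Python's `random` module rather than torch. Repeating the permutation when the quota exceeds the dataset size is an assumption about one reasonable behaviour, not a description of the class's internals.

```python
import math
import random


def quota_indices(dataset_len, samples_per_epoch, replacement=False, seed=0):
    """Return exactly `samples_per_epoch` indices into a dataset."""
    rng = random.Random(seed)  # same seed -> same indices on every process
    if replacement:
        return [rng.randrange(dataset_len) for _ in range(samples_per_epoch)]
    # Without replacement: shuffle once, repeat the permutation if the
    # quota is larger than the dataset, then cut to the quota.
    perm = list(range(dataset_len))
    rng.shuffle(perm)
    repeats = math.ceil(samples_per_epoch / dataset_len)
    return (perm * repeats)[:samples_per_epoch]


epoch_indices = quota_indices(dataset_len=10, samples_per_epoch=4, seed=0)
```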

class mmtrack.datasets.samplers.VideoSampler(dataset: Sized, seed: int = 0)[source]

The video data sampler is for both distributed and non-distributed environments. It is only used in testing.


dataset (Sized) – The dataset.

set_epoch(epoch: int) → None[source]

Not supported in iteration-based runner.




class mmtrack.engine.hooks.SiamRPNBackboneUnfreezeHook(backbone_start_train_epoch: int = 10, backbone_train_layers: List = ['layer2', 'layer3', 'layer4'])[source]

Start to train the backbone of SiamRPN++ from a certain epoch.

  • backbone_start_train_epoch (int) – Start to train the backbone at backbone_start_train_epoch-th epoch. Note the epoch in this class counts from 0, while the epoch in the log file counts from 1.

  • backbone_train_layers (list(str)) – List of str denoting the backbone stages to be trained.


If runner.epoch >= self.backbone_start_train_epoch, start to train the backbone.
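Because the hook counts epochs from 0 while the log counts from 1, the off-by-one is easy to misread. The small check below restates the condition above. The function name `backbone_is_trainable` is hypothetical.

```python
def backbone_is_trainable(epoch, backbone_start_train_epoch=10):
    """`epoch` counts from 0, like runner.epoch."""
    return epoch >= backbone_start_train_epoch


# With the default of 10, the first epoch that trains the backbone has
# index 10, which the log file reports as epoch 11.
first_epoch = next(e for e in range(20) if backbone_is_trainable(e))
logged_epoch = first_epoch + 1
```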

class mmtrack.engine.hooks.TrackVisualizationHook(draw: bool = False, interval: int = 30, score_thr: float = 0.3, show: bool = False, wait_time: float = 0.0, test_out_dir: Optional[str] = None, file_client_args: dict = {'backend': 'disk'})[source]

Tracking Visualization Hook. Used to visualize validation and testing process prediction results.

In the testing phase:

  1. If show is True, only the prediction results are visualized without storing data, so vis_backends needs to be excluded.

  2. If test_out_dir is specified, the prediction results need to be saved to test_out_dir. To avoid vis_backends also storing data, vis_backends needs to be excluded.

  3. vis_backends takes effect if the user does not specify show and test_out_dir. You can set vis_backends to WandbVisBackend or TensorboardVisBackend to store the prediction results in Wandb or Tensorboard.

  • draw (bool) – Whether to draw prediction results. If False, no drawing is done. Defaults to False.

  • interval (int) – The interval of visualization. Defaults to 30.

  • score_thr (float) – The threshold to visualize the bboxes and masks. Defaults to 0.3.

  • show (bool) – Whether to display the drawn image. Defaults to False.

  • wait_time (float) – The display interval in seconds. Defaults to 0.

  • test_out_dir (str, optional) – Directory where painted images will be saved during testing.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
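The three testing-phase cases listed above reduce to one condition: vis_backends is used only when neither show nor test_out_dir is set. The helper `uses_vis_backends` below is a hypothetical restatement of that rule, not part of the hook's API.

```python
def uses_vis_backends(show=False, test_out_dir=None):
    """Return True only when neither `show` nor `test_out_dir` is set."""
    return not show and test_out_dir is None


cases = {
    'show only': uses_vis_backends(show=True),
    'save to dir': uses_vis_backends(test_out_dir='vis_results'),  # hypothetical path
    'neither': uses_vis_backends(),
}
```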

after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmtrack.structures.track_data_sample.TrackDataSample]) → None[source]

Run after every testing iteration.

  • runner (Runner) – The runner of the testing process.

  • batch_idx (int) – The index of the current batch in the test loop.

  • data_batch (dict) – Data from dataloader.

  • outputs (Sequence[TrackDataSample]) – Outputs from model.

after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmtrack.structures.track_data_sample.TrackDataSample]) → None[source]

Run after every self.interval validation iterations.

  • runner (Runner) – The runner of the validation process.

  • batch_idx (int) – The index of the current batch in the val loop.

  • data_batch (dict) – Data from dataloader.

  • outputs (Sequence[TrackDataSample]) – Outputs from model.

class mmtrack.engine.hooks.YOLOXModeSwitchHook(num_last_epochs: int = 15, skip_type_keys: Sequence[str] = ('Mosaic', 'RandomAffine', 'MixUp'))[source]

Switch the mode of YOLOX during training.

This hook turns off the mosaic and mixup data augmentation and switches to use L1 loss in bbox_head.

The difference between this class and the class in mmdet is that the class in mmdet uses model.bbox_head.use_l1=True to switch mode, while this class first checks whether there is a detector module in the model, then uses model.detector.bbox_head.use_l1=True or model.bbox_head.use_l1=True to switch mode.


Close mosaic and mixup augmentation and switches to use L1 loss.
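The detector check described above can be sketched with plain objects. The function `switch_to_l1` and the `SimpleNamespace` stand-ins for models are assumptions made for illustration. Only the attribute names `detector`, `bbox_head` and `use_l1` come from the description.

```python
from types import SimpleNamespace


def switch_to_l1(model):
    """Enable L1 loss on the bbox head, unwrapping `model.detector` if present."""
    target = model.detector if hasattr(model, 'detector') else model
    target.bbox_head.use_l1 = True
    return target


# A tracker that wraps a detector, and a bare detector.
tracker = SimpleNamespace(
    detector=SimpleNamespace(bbox_head=SimpleNamespace(use_l1=False)))
plain = SimpleNamespace(bbox_head=SimpleNamespace(use_l1=False))

switch_to_l1(tracker)
switch_to_l1(plain)
```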


class mmtrack.engine.schedulers.SiamRPNExpLR(optimizer, *args, **kwargs)[source]
Decays the parameter value of each parameter group by an exponentially changing small multiplicative factor until the number of epochs reaches a pre-defined milestone: end.

Notice that such decay can happen simultaneously with other changes to the parameter value from outside this scheduler.

\[X_{t} = X_{t-1} \times \left(\frac{end}{begin}\right)^{\frac{1}{epochs}}\]


  • optimizer (Optimizer) – Wrapped optimizer.

  • start_factor (float) – The factor by which the parameter value is multiplied in the first epoch. The multiplication factor changes towards end_factor in the following epochs. Defaults to 0.1.

  • end_factor (float) – The factor by which the parameter value is multiplied at the end of the changing process. Defaults to 1.0.

  • begin (int) – Step at which to start updating the parameters. Defaults to 0.

  • end (int) – Step at which to stop updating the parameters. Defaults to INF.

  • endpoint (bool) – If True, end_factor is included in the end. Otherwise, it is not included. Defaults to True.

  • last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.

  • by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • verbose (bool) – Whether to print the value for each update. Defaults to False.

class mmtrack.engine.schedulers.SiamRPNExpParamScheduler(optimizer: torch.optim.optimizer.Optimizer, param_name: str, start_factor: float = 0.1, end_factor: float = 1.0, begin: int = 0, end: int = 1000000000, endpoint: bool = True, last_step: int = -1, by_epoch: bool = True, verbose: bool = False)[source]
Decays the parameter value of each parameter group by an exponentially changing small multiplicative factor until the number of epochs reaches a pre-defined milestone: end.

Notice that such decay can happen simultaneously with other changes to the parameter value from outside this scheduler.
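A minimal numeric sketch of such an exponential schedule follows. It assumes the per-epoch ratio is derived from the start and end factors and that the first value is the base value times the start factor. The helper `exp_schedule` and its `base_value` argument are hypothetical and do not reproduce the scheduler's code.

```python
def exp_schedule(base_value, start_factor=0.1, end_factor=1.0, epochs=5):
    """Return the value at each epoch under a constant multiplicative ratio."""
    # Constant ratio so that `epochs` multiplications turn the start factor
    # into the end factor.
    ratio = (end_factor / start_factor) ** (1.0 / epochs)
    values = [base_value * start_factor]
    for _ in range(epochs):
        values.append(values[-1] * ratio)
    return values


lrs = exp_schedule(base_value=0.01, start_factor=0.1, end_factor=1.0, epochs=5)
```

With these inputs the value rises from 0.001 to roughly 0.01 over five equal multiplicative steps.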

\[X_{t} = X_{t-1} \times \left(\frac{end}{begin}\right)^{\frac{1}{epochs}}\]


  • optimizer (Optimizer) – Wrapped optimizer.

  • param_name (str) – Name of the parameter to be adjusted, such as lr, momentum.

  • start_factor (float) – The factor by which the parameter value is multiplied in the first epoch. The multiplication factor changes towards end_factor in the following epochs. Defaults to 0.1.

  • end_factor (float) – The factor by which the parameter value is multiplied at the end of the changing process. Defaults to 1.0.

  • begin (int) – Step at which to start updating the parameters. Defaults to 0.

  • end (int) – Step at which to stop updating the parameters. Defaults to INF.

  • endpoint (bool) – If True, end_factor is included in the end. Otherwise, it is not included. Defaults to True.

  • last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.

  • by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • verbose (bool) – Whether to print the value for each update. Defaults to False.

classmethod build_iter_from_epoch(*args, begin: int = 0, end: int = 1000000000, by_epoch: bool = True, epoch_length: Optional[int] = None, **kwargs)[source]

Build an iter-based instance of this scheduler from an epoch-based config.

  • begin (int, optional) – Step at which to start updating the parameters. Defaults to 0.

  • end (int, optional) – Step at which to stop updating the parameters. Defaults to INF.

  • by_epoch (bool, optional) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • epoch_length (Optional[int], optional) – The length of each epoch. Defaults to None.


The instantiated object of SiamRPNExpParamScheduler.

Return type



class mmtrack.structures.ReIDDataSample(*, metainfo: Optional[dict] = None, **kwargs)[source]

A data structure interface for the ReID task.

It is used as an interface between different components.

Meta field:

  • img_shape (Tuple) – The shape of the corresponding input image. Used for visualization.

  • ori_shape (Tuple) – The original shape of the corresponding image. Used for visualization.

  • num_classes (int) – The number of all categories. Used for label format conversion.

Data field:

  • gt_label (LabelData) – The ground truth label.

  • pred_label (LabelData) – The predicted label.

  • scores (torch.Tensor) – The outputs of model.

set_gt_label(value: Union[numpy.ndarray, torch.Tensor, Sequence[numbers.Number], numbers.Number]) → mmtrack.structures.reid_data_sample.ReIDDataSample[source]

Set label of gt_label.

set_gt_score(value: torch.Tensor) → mmtrack.structures.reid_data_sample.ReIDDataSample[source]

Set score of gt_label.

class mmtrack.structures.TrackDataSample(*, metainfo: Optional[dict] = None, **kwargs)[source]

A data structure interface of MMTracking. It is used as an interface between different components.

The attributes in TrackDataSample are divided into several parts:

  • gt_instances (InstanceData) – Ground truth of instance annotations in key frames.

  • ignored_instances (InstanceData) – Instances to be ignored during training/testing in key frames.

  • proposals (InstanceData) – Region proposals used in two-stage detectors in key frames.

  • ref_gt_instances (InstanceData) – Ground truth of instance annotations in reference frames.

  • ref_ignored_instances (InstanceData) – Instances to be ignored during training/testing in reference frames.

  • ref_proposals (InstanceData) – Region proposals used in two-stage detectors in reference frames.

  • pred_det_instances (InstanceData) – Detection instances of model predictions in key frames.

  • pred_track_instances (InstanceData) – Tracking instances of model predictions in key frames.




class mmtrack.utils.DataLoaderBenchmark(cfg: mmengine.config.config.Config, distributed: bool, dataset_type: str, max_iter: int = 2000, log_interval: int = 50, num_warmup: int = 5, logger: Optional[mmengine.logging.logger.MMLogger] = None)[source]

The dataloader benchmark class. It collects statistics on inference FPS and CPU memory usage.

  • cfg (mmengine.Config) – config.

  • distributed (bool) – distributed testing flag.

  • dataset_type (str) – benchmark data type, only supports train, val and test.

  • max_iter (int) – maximum iterations of benchmark. Defaults to 2000.

  • log_interval (int) – interval of logging. Defaults to 50.

  • num_warmup (int) – Number of warm-up iterations. Defaults to 5.

  • logger (MMLogger, optional) – Formatted logger used to record messages.

average_multiple_runs(results: List[dict]) → dict[source]

Average the results of multiple runs.


Executes the benchmark once.

class mmtrack.utils.DatasetBenchmark(cfg: mmengine.config.config.Config, dataset_type: str, max_iter: int = 2000, log_interval: int = 50, num_warmup: int = 5, logger: Optional[mmengine.logging.logger.MMLogger] = None)[source]

The dataset benchmark class. It collects statistics on inference FPS, FPS per transform, and CPU memory usage.

  • cfg (mmengine.Config) – config.

  • dataset_type (str) – benchmark data type, only supports train, val and test.

  • max_iter (int) – maximum iterations of benchmark. Defaults to 2000.

  • log_interval (int) – interval of logging. Defaults to 50.

  • num_warmup (int) – Number of warm-up iterations. Defaults to 5.

  • logger (MMLogger, optional) – Formatted logger used to record messages.

average_multiple_runs(results: List[dict]) → dict[source]

Average the results of multiple runs.


Executes the benchmark once.

class mmtrack.utils.InferenceBenchmark(cfg: mmengine.config.config.Config, checkpoint: str, distributed: bool, is_fuse_conv_bn: bool, max_iter: int = 2000, log_interval: int = 50, num_warmup: int = 5, logger: Optional[mmengine.logging.logger.MMLogger] = None)[source]

The inference benchmark class. It collects statistics on inference FPS, CUDA memory usage, and CPU memory usage.

  • cfg (mmengine.Config) – config.

  • checkpoint (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx.

  • distributed (bool) – distributed testing flag.

  • is_fuse_conv_bn (bool) – Whether to fuse conv and bn layers, which slightly increases the inference speed.

  • max_iter (int) – maximum iterations of benchmark. Defaults to 2000.

  • log_interval (int) – interval of logging. Defaults to 50.

  • num_warmup (int) – Number of warm-up iterations. Defaults to 5.

  • logger (MMLogger, optional) – Formatted logger used to record messages.

average_multiple_runs(results: List[dict]) → dict[source]

Average the results of multiple runs.


Executes the benchmark once.


Collect the information of the running environments.

mmtrack.utils.convert_data_sample_type(data_sample: mmtrack.structures.track_data_sample.TrackDataSample, num_ref_imgs: int = 1) → Tuple[List[mmtrack.structures.track_data_sample.TrackDataSample], List[dict]][source]

Convert the type of data_sample from dict[list] to list[dict].

Note: This function is mainly used to be compatible with the interface of MMDetection. It makes sure that the information of each reference image can be independently packed into data_sample, in which all the keys are without the prefix “ref_”.

  • data_sample (TrackDataSample) – Data sample input.

  • num_ref_imgs (int, optional) – The number of reference images in the data_sample. Defaults to 1.


The first element is the list of TrackDataSample objects. The second element is the list of meta information of reference images.

Return type

Tuple[List[TrackDataSample], List[dict]]
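The dict[list] to list[dict] idea, with the “ref_” prefix stripped, can be shown on plain dictionaries. The helper `split_reference_fields` and the sample keys below are hypothetical. The real function operates on TrackDataSample objects, which this sketch does not model.

```python
def split_reference_fields(sample, num_ref_imgs=1):
    """Turn {'ref_x': [x0, x1, ...]} into [{'x': x0}, {'x': x1}, ...]."""
    per_ref = [{} for _ in range(num_ref_imgs)]
    for key, values in sample.items():
        if not key.startswith('ref_'):
            continue  # key-frame fields are left out of the reference dicts
        for i in range(num_ref_imgs):
            per_ref[i][key[len('ref_'):]] = values[i]
    return per_ref


sample = {'img_id': 0, 'ref_img_id': [3, 8], 'ref_frame_id': [1, 2]}
refs = split_reference_fields(sample, num_ref_imgs=2)
```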

mmtrack.utils.crop_image(image, crop_region, crop_size, padding=(0, 0, 0))[source]

Crop image based on crop_region and crop_size.

  • image (ndarray) – Image of shape (H, W, 3).

  • crop_region (ndarray) – Crop region of shape (4, ) in [x1, y1, x2, y2] format.

  • crop_size (int) – Crop size.

  • padding (tuple | ndarray) – Padding values of shape (3, ).


Cropped image of shape (crop_size, crop_size, 3).

Return type

ndarray

mmtrack.utils.format_video_level_show(video_names: List, eval_results: List[numpy.ndarray], sort_by_first_metric: bool = True, show_indices: Optional[Tuple[int, List]] = None) → List[List][source]

Format the video-level performance for display.

  • video_names (List) – The names of the videos.

  • eval_results (List[np.ndarray]) – The evaluation results.

  • sort_by_first_metric (bool, optional) – Whether to sort the results by the first metric. Defaults to True.

  • show_indices (Optional[Tuple[int, List]], optional) – The video indices to be shown. Defaults to None, i.e., all videos.
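A small sketch of the sorting behaviour follows, using plain lists in place of NumPy arrays. The helper `format_rows` is hypothetical, and descending order is an assumption inferred from the example output in this entry, where the higher first metric is listed first.

```python
def format_rows(video_names, eval_results, sort_by_first_metric=True):
    """Build [name, metric_1, metric_2, ...] rows, optionally sorted."""
    rows = [[name, *metrics] for name, metrics in zip(video_names, eval_results)]
    if sort_by_first_metric:
        # Assumed order: best first metric at the top.
        rows.sort(key=lambda row: row[1], reverse=True)
    return rows


rows = format_rows(
    ['video-1', 'video-2'],
    [[46.2, 48.2, 50.2], [48.2, 49.2, 51.9]],
)
```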


The formatted video-level evaluation results. For example:

[[video-2, 48.2, 49.2, 51.9],
[video-1, 46.2, 48.2, 50.2]]

Return type

List[List]

mmtrack.utils.gauss_blur(image: torch.Tensor, kernel_size: Sequence, sigma: Sequence) → torch.Tensor[source]

The gauss blur transform.

  • image (Tensor) – Image of shape (n, c, h, w).

  • kernel_size (Sequence) – The argument kernel size for gauss blur.

  • sigma (Sequence) – The argument sigma for gauss blur.


The blurred image.

Return type

Tensor

mmtrack.utils.imrenormalize(img: Union[torch.Tensor, numpy.ndarray], img_norm_cfg: dict, new_img_norm_cfg: dict) → Union[torch.Tensor, numpy.ndarray][source]

Re-normalize the image.

  • img (Tensor | ndarray) – Input image. If the input is a Tensor, the shape is (1, C, H, W). If the input is an ndarray, the shape is (H, W, C).

  • img_norm_cfg (dict) – Original configuration for the normalization.

  • new_img_norm_cfg (dict) – New configuration for the normalization.


Output image with the same type and shape as the input.

Return type

Tensor | ndarray
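Re-normalization amounts to undoing the original normalization and applying the new one. The scalar sketch below assumes each config carries `mean` and `std` entries and ignores any channel-order handling. The helper `renormalize` is hypothetical and works on a single value rather than an image.

```python
def renormalize(value, old_cfg, new_cfg):
    """Undo normalization with `old_cfg`, then normalize with `new_cfg`."""
    raw = value * old_cfg['std'] + old_cfg['mean']  # back to pixel scale
    return (raw - new_cfg['mean']) / new_cfg['std']


old_cfg = {'mean': 100.0, 'std': 50.0}
new_cfg = {'mean': 0.0, 'std': 255.0}
out = renormalize(1.0, old_cfg, new_cfg)
```

Passing the same config twice leaves the value unchanged, which is a quick sanity check for the arithmetic.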

mmtrack.utils.imshow_mot_errors(*args, backend: str = 'cv2', **kwargs)[source]

Show the wrong tracks on the input image.


backend (str, optional) – Backend of visualization. Defaults to ‘cv2’.

mmtrack.utils.max_last2d(input: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Computes the value and position of maximum in the last two dimensions.


input (Tensor) – Tensor of shape (…, H, W).


  • max_val (Tensor) – The maximum value.

  • argmax (Tensor) – The position of maximum in [row, col] format.

Return type

Tuple[Tensor, Tensor]

mmtrack.utils.plot_norm_precision_curve(norm_precision: numpy.ndarray, tracker_names: List, plot_opts: Optional[dict] = None, plot_save_path: Optional[str] = None, show: bool = False)[source]

Plot curves of Norm Precision for SOT.

  • norm_precision (np.ndarray) – The content of visualized indicators. It has shape (N, M), where N is the number of trackers and M is the number of Norm Precision values corresponding to the X axis.

  • tracker_names (List) – The names of trackers.

  • plot_opts (Optional[dict], optional) – The options for plot. Defaults to None.

  • plot_save_path (Optional[str], optional) – The saved path of the figure. Defaults to None.

  • show (bool, optional) – Whether to show. Defaults to False.

mmtrack.utils.plot_precision_curve(precision: numpy.ndarray, tracker_names: List, plot_opts: Optional[dict] = None, plot_save_path: Optional[str] = None, show: bool = False)[source]

Plot curves of Precision for SOT.

  • precision (np.ndarray) – The content of visualized indicators. It has shape (N, M), where N is the number of trackers and M is the number of Precision values corresponding to the X axis.

  • tracker_names (List) – The names of trackers.

  • plot_opts (Optional[dict], optional) – The options for plot. Defaults to None.

  • plot_save_path (Optional[str], optional) – The saved path of the figure. Defaults to None.

  • show (bool, optional) – Whether to show. Defaults to False.

mmtrack.utils.plot_success_curve(success: numpy.ndarray, tracker_names: List, plot_opts: Optional[dict] = None, plot_save_path: Optional[str] = None, show: bool = False)[source]

Plot curves of Success for SOT.

  • success (np.ndarray) – The content of visualized indicators. It has shape (N, M), where N is the number of trackers and M is the number of Success values corresponding to the X axis.

  • tracker_names (List) – The names of trackers.

  • plot_opts (Optional[dict], optional) – The options for plot. Defaults to None.

  • plot_save_path (Optional[str], optional) – The saved path of the figure. Defaults to None.

  • show (bool, optional) – Whether to show. Defaults to False.

mmtrack.utils.register_all_modules(init_default_scope: bool = True) → None[source]

Register all modules in mmtrack into the registries.


init_default_scope (bool) – Whether to initialize the mmtrack default scope. When init_default_scope=True, the global default scope will be set to mmtrack, and all registries will build modules from mmtrack’s registry node. To understand more about the registry, please refer to the registry documentation. Defaults to True.

mmtrack.utils.stack_batch(tensors: List[torch.Tensor], pad_size_divisor: int = 0, pad_value: Union[int, float] = 0) → torch.Tensor[source]

Stack multiple tensors to form a batch, padding the images to the max shape among them using the right-bottom padding mode. If pad_size_divisor > 0, add padding to ensure the common height and width are divisible by pad_size_divisor.

  • tensors (List[Tensor]) – The input tensors. Each is a TCHW 4D-tensor, where T denotes the number of key/reference frames.

  • pad_size_divisor (int) – If pad_size_divisor > 0, add padding to ensure the common height and width is divisible by pad_size_divisor. This depends on the model, and many models need a divisibility of 32. Defaults to 0

  • pad_value (int, float) – The padding value. Defaults to 0
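The padded size follows from simple integer arithmetic: take the largest height and width, then round each up to the next multiple of pad_size_divisor. The helpers `padded_hw` and `right_bottom_padding` below are hypothetical and work on (H, W) tuples only, without touching tensors.

```python
def padded_hw(shapes, pad_size_divisor=0):
    """Common (H, W) after padding every shape to the max, rounded up."""
    max_h = max(h for h, _ in shapes)
    max_w = max(w for _, w in shapes)
    if pad_size_divisor > 0:
        d = pad_size_divisor
        max_h = (max_h + d - 1) // d * d  # round up to a multiple of d
        max_w = (max_w + d - 1) // d * d
    return max_h, max_w


def right_bottom_padding(shape, target):
    """Rows added at the bottom and columns added on the right."""
    return target[0] - shape[0], target[1] - shape[1]


shapes = [(375, 500), (400, 480)]
target = padded_hw(shapes, pad_size_divisor=32)
pads = [right_bottom_padding(s, target) for s in shapes]
```

With a divisor of 32, the common size here becomes (416, 512), and each image is padded only on its bottom and right edges.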


The NTCHW 5D-tensor. N denotes the batch size.

Return type

Tensor



