mmtrack.apis¶

mmtrack.datasets¶

datasets¶

api_wrappers¶

class mmtrack.datasets.api_wrappers.CocoVID(*args: Any, **kwargs: Any)[source]¶

Inherits the official COCO class in order to parse the annotations of bbox-related video tasks.

Parameters

annotation_file (str) – location of annotation file. Defaults to None.
load_img_as_vid (bool) – If True, convert image data to video data, which means each image is converted to a video. Defaults to False.

convert_img_to_vid(dataset)[source]¶: Convert image data to video data.

createIndex()[source]¶: Create index.

get_img_ids_from_ins_id(insId)[source]¶

Get image ids from given instance id.

Parameters: insId (int) – The given instance id.
Returns: Image ids of given instance id.
Return type: list[int]

get_img_ids_from_vid(vidId)[source]¶

Get image ids from given video id.

Parameters: vidId (int) – The given video id.
Returns: Image ids of given video id.
Return type: list[int]

get_ins_ids_from_vid(vidId)[source]¶

Get instance ids from given video id.

Parameters: vidId (int) – The given video id.
Returns: Instance ids of given video id.
Return type: list[int]

get_vid_ids(vidIds=[])[source]¶

Get video ids that satisfy given filter conditions.

Default return all video ids.

Parameters: vidIds (list[int]) – The given video ids. Defaults to [].
Returns: Video ids.
Return type: list[int]

load_vids(ids=[])[source]¶

Get video information of given video ids.

Default return all videos information.

Parameters: ids (list[int]) – The given video ids. Defaults to [].
Returns: List of video information.
Return type: list[dict]
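The lookups above can be pictured as a handful of dictionaries built once from the annotation file. The following is a minimal sketch of that idea in plain Python, not the CocoVID implementation. The toy data and the field names video_id and instance_id are assumptions made for illustration.

```python
from collections import defaultdict

# Toy annotation content; the field names `video_id` and `instance_id`
# are assumptions made for this sketch.
dataset = {
    "videos": [{"id": 1, "name": "video-1"}, {"id": 2, "name": "video-2"}],
    "images": [
        {"id": 10, "video_id": 1, "frame_id": 0},
        {"id": 11, "video_id": 1, "frame_id": 1},
        {"id": 20, "video_id": 2, "frame_id": 0},
    ],
    "annotations": [
        {"id": 100, "image_id": 10, "instance_id": 7},
        {"id": 101, "image_id": 11, "instance_id": 7},
        {"id": 102, "image_id": 20, "instance_id": 8},
    ],
}


def build_index(dataset):
    """Build video -> images, instance -> images and video -> instances maps."""
    img_to_vid = {img["id"]: img["video_id"] for img in dataset["images"]}
    vid_to_imgs = defaultdict(list)
    ins_to_imgs = defaultdict(list)
    vid_to_ins = defaultdict(list)
    for img in dataset["images"]:
        vid_to_imgs[img["video_id"]].append(img["id"])
    for ann in dataset["annotations"]:
        ins_id, img_id = ann["instance_id"], ann["image_id"]
        if img_id not in ins_to_imgs[ins_id]:
            ins_to_imgs[ins_id].append(img_id)
        vid_id = img_to_vid[img_id]
        if ins_id not in vid_to_ins[vid_id]:
            vid_to_ins[vid_id].append(ins_id)
    return vid_to_imgs, ins_to_imgs, vid_to_ins


vid_to_imgs, ins_to_imgs, vid_to_ins = build_index(dataset)
```

With such maps in place, each of the get_* methods reduces to a dictionary lookup.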

samplers¶

class mmtrack.datasets.samplers.EntireVideoBatchSampler(sampler: torch.utils.data.sampler.Sampler, batch_size: int = 1, drop_last: bool = False)[source]¶

A sampler wrapper for grouping images from one video into a same batch.

Parameters

sampler (Sampler) – Base sampler.
batch_size (int) – Size of mini-batch. Here, we take a video as a batch. Defaults to 1.
drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size. Defaults to False.
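As an illustration of taking a video as a batch, the sketch below groups consecutive sample indices that belong to the same video. It is a simplified stand-in, not the sampler's code. The helper batches_by_video and the frame_to_video mapping are hypothetical.

```python
from itertools import groupby


def batches_by_video(sample_indices, video_of):
    """Yield one batch per run of consecutive indices sharing a video id."""
    for _, group in groupby(sample_indices, key=video_of):
        yield list(group)


# Hypothetical mapping from frame index to video id.
frame_to_video = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
batches = list(batches_by_video(range(5), frame_to_video.get))
```

Each yielded batch then holds all frames of exactly one video, so batch lengths vary with video length.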

class mmtrack.datasets.samplers.QuotaSampler(dataset: Sized, samples_per_epoch: int, replacement: bool = False, seed: int = 0)[source]¶

Sampler that gets fixed number of samples per epoch.

It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such a case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it.

Note

Dataset is assumed to be of constant size.

Parameters

dataset (Sized) – Dataset used for sampling.
samples_per_epoch (int) – The number of samples per epoch.
replacement (bool) – Samples are drawn with replacement if True. Defaults to False.
seed (int, optional) – random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.
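A minimal sketch of drawing a fixed quota of indices per epoch is shown below. It is written with the standard library only and is not the QuotaSampler code. In particular, how the real sampler behaves when samples_per_epoch exceeds the dataset size is an assumption here (the sketch reshuffles and keeps drawing).

```python
import random


def quota_indices(dataset_len, samples_per_epoch, replacement=False, seed=0):
    """Return exactly `samples_per_epoch` indices into a dataset."""
    rng = random.Random(seed)
    if replacement:
        # Independent draws: the same index may appear several times.
        return [rng.randrange(dataset_len) for _ in range(samples_per_epoch)]
    indices = []
    # Assumption: when the quota exceeds the dataset size, reshuffle and
    # continue drawing until the quota is filled.
    while len(indices) < samples_per_epoch:
        perm = list(range(dataset_len))
        rng.shuffle(perm)
        indices.extend(perm)
    return indices[:samples_per_epoch]


indices = quota_indices(dataset_len=5, samples_per_epoch=8, seed=0)
```

Using the same seed in every process yields the same index sequence, which is why the seed should be identical across the distributed group.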

class mmtrack.datasets.samplers.VideoSampler(dataset: Sized, seed: int = 0)[source]¶

The video data sampler works in both distributed and non-distributed environments. It is only used in testing.

Parameters: dataset (Sized) – The dataset.

set_epoch(epoch: int) → None[source]¶: Not supported in iteration-based runner.

transforms¶

mmtrack.engine¶

hooks¶

class mmtrack.engine.hooks.SiamRPNBackboneUnfreezeHook(backbone_start_train_epoch: int = 10, backbone_train_layers: List = ['layer2', 'layer3', 'layer4'])[source]¶

Start to train the backbone of SiamRPN++ from a certain epoch.

Parameters

backbone_start_train_epoch (int) – Start to train the backbone at backbone_start_train_epoch-th epoch. Note the epoch in this class counts from 0, while the epoch in the log file counts from 1.
backbone_train_layers (list(str)) – List of str denoting the stages of the backbone that need to be trained.

before_train_epoch(runner)[source]¶: If runner.epoch >= self.backbone_start_train_epoch, start to train the backbone.
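A hedged example of how such a hook might be listed in a config file is shown below. The custom_hooks key follows the usual MMEngine config convention and is an assumption about the surrounding config. The helper epoch_shown_in_log is hypothetical and only illustrates the 0-based versus 1-based epoch counting noted above.

```python
# Hook entry in the style of an MMEngine config file (a plain Python dict).
# The argument values are the defaults from the signature above.
custom_hooks = [
    dict(
        type='SiamRPNBackboneUnfreezeHook',
        backbone_start_train_epoch=10,
        backbone_train_layers=['layer2', 'layer3', 'layer4'])
]


def epoch_shown_in_log(epoch: int) -> int:
    """Hypothetical helper: map the hook's 0-based epoch to the log's 1-based one."""
    return epoch + 1


first_logged_epoch = epoch_shown_in_log(
    custom_hooks[0]['backbone_start_train_epoch'])
```

With the default value of 10, the backbone would start training in the epoch that the log file reports as epoch 11.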

class mmtrack.engine.hooks.TrackVisualizationHook(draw: bool = False, interval: int = 30, score_thr: float = 0.3, show: bool = False, wait_time: float = 0.0, test_out_dir: Optional[str] = None, file_client_args: dict = {'backend': 'disk'})[source]¶

Tracking Visualization Hook. Used to visualize validation and testing process prediction results.

In the testing phase:

If show is True, the prediction results are only displayed and not stored, so vis_backends needs to be excluded.
If test_out_dir is specified, the prediction results are saved to test_out_dir. To prevent vis_backends from also storing data, vis_backends needs to be excluded.
vis_backends takes effect if the user specifies neither show nor test_out_dir. You can set vis_backends to WandbVisBackend or TensorboardVisBackend to store the prediction results in Wandb or Tensorboard.

Parameters

draw (bool) – Whether to draw prediction results. If False, no drawing will be done. Defaults to False.
interval (int) – The interval of visualization. Defaults to 30.
score_thr (float) – The threshold to visualize the bboxes and masks. Defaults to 0.3.
show (bool) – Whether to display the drawn image. Defaults to False.
wait_time (float) – The display interval in seconds. Defaults to 0.
test_out_dir (str, optional) – Directory where painted images will be saved in the testing process.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmtrack.structures.track_data_sample.TrackDataSample]) → None[source]¶

Run after every testing iteration.

Parameters

runner (Runner) – The runner of the testing process.
batch_idx (int) – The index of the current batch in the test loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[TrackDataSample]) – Outputs from model.

after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmtrack.structures.track_data_sample.TrackDataSample]) → None[source]¶

Run after every self.interval validation iterations.

Parameters

runner (Runner) – The runner of the validation process.
batch_idx (int) – The index of the current batch in the val loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[TrackDataSample]) – Outputs from model.

class mmtrack.engine.hooks.YOLOXModeSwitchHook(num_last_epochs: int = 15, skip_type_keys: Sequence[str] = ('Mosaic', 'RandomAffine', 'MixUp'))[source]¶

Switch the mode of YOLOX during training.

This hook turns off the mosaic and mixup data augmentation and switches to use L1 loss in bbox_head.

The difference between this class and the class in mmdet is that the class in mmdet uses model.bbox_head.use_l1=True to switch mode, while this class first checks whether there is a detector module in the model, then uses model.detector.bbox_head.use_l1=True or model.bbox_head.use_l1=True to switch mode.

before_train_epoch(runner)[source]¶: Close mosaic and mixup augmentation and switch to L1 loss.
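The detector check described above can be sketched with stand-in objects. The snippet is an illustration under stated assumptions, not the hook's code: switch_to_l1 is a hypothetical helper, and SimpleNamespace objects take the place of real models.

```python
from types import SimpleNamespace


def switch_to_l1(model):
    """Enable L1 loss on the bbox head, looking inside `detector` first."""
    head_owner = model.detector if hasattr(model, "detector") else model
    head_owner.bbox_head.use_l1 = True


# Stand-in for a tracking model that wraps a detector.
tracker = SimpleNamespace(
    detector=SimpleNamespace(bbox_head=SimpleNamespace(use_l1=False)))
# Stand-in for a plain detector without a `detector` attribute.
plain_detector = SimpleNamespace(bbox_head=SimpleNamespace(use_l1=False))

switch_to_l1(tracker)
switch_to_l1(plain_detector)
```

Both layouts end up with use_l1 set on the correct bbox head, which is the behaviour the comparison with mmdet describes.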

schedulers¶

class mmtrack.engine.schedulers.SiamRPNExpLR(optimizer, *args, **kwargs)[source]¶

Decays the parameter value of each parameter group by an exponentially changing small multiplicative factor until the number of epochs reaches a pre-defined milestone: end.

Notice that such decay can happen simultaneously with other changes to the parameter value from outside this scheduler.

\[X_{t} = X_{t-1} \times \left(\frac{end}{begin}\right)^{\frac{1}{epochs}}\]

Parameters

optimizer (Optimizer) – Wrapped optimizer.
start_factor (float) – The number we multiply parameter value in the first epoch. The multiplication factor changes towards end_factor in the following epochs. Defaults to 0.1.
end_factor (float) – The number we multiply parameter value at the end of the changing process. Defaults to 1.0.
begin (int) – Step at which to start updating the parameters. Defaults to 0.
end (int) – Step at which to stop updating the parameters. Defaults to INF.
endpoint (bool) – If true, end_factor is included in the end. Otherwise, it is not included. Defaults to True.
last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.
by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.
verbose (bool) – Whether to print the value for each update. Defaults to False.
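The update rule can be tried out numerically. The sketch below is one reading of the formula, assuming that end and begin in the fraction stand for end_factor and start_factor and that epochs is the number of update steps. The function exp_schedule and its arguments are hypothetical, and the real scheduler may treat the endpoint differently.

```python
def exp_schedule(base_value, start_factor, end_factor, num_steps):
    """Return the parameter value at every step of an exponential ramp."""
    # Constant per-step multiplier: (end_factor / start_factor) ** (1 / num_steps).
    ratio = (end_factor / start_factor) ** (1.0 / num_steps)
    values = [base_value * start_factor]
    for _ in range(num_steps):
        values.append(values[-1] * ratio)
    return values


# Hypothetical base learning rate of 0.005, ramped over 5 steps.
lrs = exp_schedule(base_value=0.005, start_factor=0.1, end_factor=1.0,
                   num_steps=5)
```

The value starts at base_value * start_factor and reaches base_value * end_factor after num_steps updates, with a constant ratio between consecutive steps.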

class mmtrack.engine.schedulers.SiamRPNExpParamScheduler(optimizer: torch.optim.optimizer.Optimizer, param_name: str, start_factor: float = 0.1, end_factor: float = 1.0, begin: int = 0, end: int = 1000000000, endpoint: bool = True, last_step: int = - 1, by_epoch: bool = True, verbose: bool = False)[source]¶

Decays the parameter value of each parameter group by an exponentially changing small multiplicative factor until the number of epochs reaches a pre-defined milestone: end.

Notice that such decay can happen simultaneously with other changes to the parameter value from outside this scheduler.

\[X_{t} = X_{t-1} \times \left(\frac{end}{begin}\right)^{\frac{1}{epochs}}\]

Parameters

optimizer (Optimizer) – Wrapped optimizer.
param_name (str) – Name of the parameter to be adjusted, such as lr, momentum.
start_factor (float) – The number we multiply parameter value in the first epoch. The multiplication factor changes towards end_factor in the following epochs. Defaults to 0.1.
end_factor (float) – The number we multiply parameter value at the end of the changing process. Defaults to 1.0.
begin (int) – Step at which to start updating the parameters. Defaults to 0.
end (int) – Step at which to stop updating the parameters. Defaults to INF.
endpoint (bool) – If true, end_factor is included in the end. Otherwise, it is not included. Defaults to True.
last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.
by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.
verbose (bool) – Whether to print the value for each update. Defaults to False.

classmethod build_iter_from_epoch(*args, begin: int = 0, end: int = 1000000000, by_epoch: bool = True, epoch_length: Optional[int] = None, **kwargs)[source]¶

Build an iter-based instance of this scheduler from an epoch-based config.

Parameters

begin (int, optional) – Step at which to start updating the parameters. Defaults to 0.
end (int, optional) – Step at which to stop updating the parameters. Defaults to INF.
by_epoch (bool, optional) – Whether the scheduled parameters are updated by epochs. Defaults to True.
epoch_length (Optional[int], optional) – The length of each epoch. Defaults to None.

Returns

The instantiated object of SiamRPNExpParamScheduler.

Return type

Object
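Converting an epoch-based range into an iteration-based one amounts to scaling begin and end by the epoch length. The sketch below assumes that INF equals the default end value in the signature (1000000000) and that an unbounded end is left untouched. The helper epoch_to_iter_range is hypothetical and is not part of the scheduler.

```python
INF = 1000000000  # Matches the default `end` in the signature above.


def epoch_to_iter_range(begin, end, epoch_length):
    """Scale an epoch-based [begin, end) range to iteration counts."""
    if epoch_length is None or epoch_length <= 0:
        raise ValueError("epoch_length must be a positive integer")
    iter_begin = int(begin * epoch_length)
    # Assumption: an unbounded end stays unbounded instead of being scaled.
    iter_end = end if end == INF else int(end * epoch_length)
    return iter_begin, iter_end


bounded = epoch_to_iter_range(begin=0, end=5, epoch_length=100)
unbounded = epoch_to_iter_range(begin=2, end=INF, epoch_length=100)
```

For example, a schedule that covers epochs 0 to 5 with 100 iterations per epoch covers iterations 0 to 500.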

mmtrack.evaluation¶

functional¶

metrics¶

mmtrack.models¶

aggregators¶

backbones¶

data_preprocessors¶

filter¶

layers¶

losses¶

mot¶

motion¶

reid¶

roi_heads¶

sot¶

task_modules¶

track_heads¶

trackers¶

vid¶

vis¶

mmtrack.structures¶

structures¶

class mmtrack.structures.ReIDDataSample(*, metainfo: Optional[dict] = None, **kwargs)[source]¶

A data structure interface of ReID task.

It is used as an interface between different components.

Meta field:

img_shape (Tuple) – The shape of the corresponding input image. Used for visualization.
ori_shape (Tuple) – The original shape of the corresponding image. Used for visualization.
num_classes (int) – The number of all categories. Used for label format conversion.

Data field:

gt_label (LabelData) – The ground truth label.
pred_label (LabelData) – The predicted label.
scores (torch.Tensor) – The outputs of model.

set_gt_label(value: Union[numpy.ndarray, torch.Tensor, Sequence[numbers.Number], numbers.Number]) → mmtrack.structures.reid_data_sample.ReIDDataSample [source]¶: Set label of gt_label.

set_gt_score(value: torch.Tensor) → mmtrack.structures.reid_data_sample.ReIDDataSample [source]¶: Set score of gt_label.

class mmtrack.structures.TrackDataSample(*, metainfo: Optional[dict] = None, **kwargs)[source]¶

A data structure interface of MMTracking. They are used as interfaces between different components.

The attributes in TrackDataSample are divided into several parts:

gt_instances (InstanceData) – Ground truth of instance annotations in key frames.
ignored_instances (InstanceData) – Instances to be ignored during training/testing in key frames.
proposals (InstanceData) – Region proposals used in two-stage detectors in key frames.
ref_gt_instances (InstanceData) – Ground truth of instance annotations in reference frames.
ref_ignored_instances (InstanceData) – Instances to be ignored during training/testing in reference frames.
ref_proposals (InstanceData) – Region proposals used in two-stage detectors in reference frames.
pred_det_instances (InstanceData) – Detection instances of model predictions in key frames.
pred_track_instances (InstanceData) – Tracking instances of model predictions in key frames.

bbox¶

mmtrack.utils¶

utils¶

class mmtrack.utils.DataLoaderBenchmark(cfg: mmengine.config.config.Config, distributed: bool, dataset_type: str, max_iter: int = 2000, log_interval: int = 50, num_warmup: int = 5, logger: Optional[mmengine.logging.logger.MMLogger] = None)[source]¶

The dataloader benchmark class. It reports statistics on inference FPS and CPU memory.

Parameters

cfg (mmengine.Config) – config.
distributed (bool) – distributed testing flag.
dataset_type (str) – benchmark data type, only supports train, val and test.
max_iter (int) – maximum iterations of benchmark. Defaults to 2000.
log_interval (int) – interval of logging. Defaults to 50.
num_warmup (int) – Number of warm-up iterations. Defaults to 5.
logger (MMLogger, optional) – Formatted logger used to record messages.

average_multiple_runs(results: List[dict]) → dict[source]¶: Average the results of multiple runs.

run_once() → dict[source]¶: Executes the benchmark once.

class mmtrack.utils.DatasetBenchmark(cfg: mmengine.config.config.Config, dataset_type: str, max_iter: int = 2000, log_interval: int = 50, num_warmup: int = 5, logger: Optional[mmengine.logging.logger.MMLogger] = None)[source]¶

The dataset benchmark class. It reports statistics on inference FPS, FPS per transform and CPU memory.

Parameters

cfg (mmengine.Config) – config.
dataset_type (str) – benchmark data type, only supports train, val and test.
max_iter (int) – maximum iterations of benchmark. Defaults to 2000.
log_interval (int) – interval of logging. Defaults to 50.
num_warmup (int) – Number of warm-up iterations. Defaults to 5.
logger (MMLogger, optional) – Formatted logger used to record messages.

average_multiple_runs(results: List[dict]) → dict[source]¶: Average the results of multiple runs.

run_once() → dict[source]¶: Executes the benchmark once.

class mmtrack.utils.InferenceBenchmark(cfg: mmengine.config.config.Config, checkpoint: str, distributed: bool, is_fuse_conv_bn: bool, max_iter: int = 2000, log_interval: int = 50, num_warmup: int = 5, logger: Optional[mmengine.logging.logger.MMLogger] = None)[source]¶

The inference benchmark class. It reports statistics on inference FPS, CUDA memory and CPU memory.

Parameters

cfg (mmengine.Config) – config.
checkpoint (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx.
distributed (bool) – distributed testing flag.
is_fuse_conv_bn (bool) – Whether to fuse conv and bn, this will slightly increase the inference speed.
max_iter (int) – maximum iterations of benchmark. Defaults to 2000.
log_interval (int) – interval of logging. Defaults to 50.
num_warmup (int) – Number of warm-up iterations. Defaults to 5.
logger (MMLogger, optional) – Formatted logger used to record messages.

average_multiple_runs(results: List[dict]) → dict[source]¶: Average the results of multiple runs.

run_once() → dict[source]¶: Executes the benchmark once.

mmtrack.utils.collect_env()[source]¶: Collect the information of the running environments.

mmtrack.utils.convert_data_sample_type(data_sample: mmtrack.structures.track_data_sample.TrackDataSample, num_ref_imgs: int = 1) → Tuple[List[mmtrack.structures.track_data_sample.TrackDataSample], List[dict]][source]¶

Convert the type of data_sample from dict[list] to list[dict].

Note: This function is mainly used for compatibility with the interface of MMDetection. It makes sure that the information of each reference image can be independently packed into a data_sample in which none of the keys has the prefix “ref_”.

Parameters

data_sample (TrackDataSample) – Data sample input.
num_ref_imgs (int, optional) – The number of reference images in the data_sample. Defaults to 1.

Returns

The first element is the list of TrackDataSample objects. The second element is the list of meta information of reference images.

Return type

Tuple[List[TrackDataSample], List[dict]]
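The dict[list] to list[dict] conversion can be illustrated with plain dictionaries. This is a simplified sketch, not the function itself: split_reference_fields and the sample keys are hypothetical, and a real TrackDataSample carries structured fields instead of a flat dict.

```python
def split_reference_fields(sample, num_ref_imgs=1):
    """Turn {'ref_x': [a, b]} into [{'x': a}, {'x': b}], one dict per reference image."""
    per_ref = []
    for i in range(num_ref_imgs):
        per_ref.append({
            key[len("ref_"):]: values[i]
            for key, values in sample.items()
            if key.startswith("ref_")
        })
    return per_ref


# Hypothetical flat sample with two reference images.
sample = {"img_id": 5, "ref_img_id": [3, 4], "ref_frame_id": [0, 1]}
refs = split_reference_fields(sample, num_ref_imgs=2)
```

Each resulting dict describes a single reference image and uses keys without the “ref_” prefix, which is the shape a per-image interface expects.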

mmtrack.utils.crop_image(image, crop_region, crop_size, padding=(0, 0, 0))[source]¶

Crop image based on crop_region and crop_size.

Parameters

image (ndarray) – of shape (H, W, 3).
crop_region (ndarray) – of shape (4, ) in [x1, y1, x2, y2] format.
crop_size (int) – Crop size.
padding (tuple | ndarray) – of shape (3, ) denoting the padding values.

Returns

Cropped image of shape (crop_size, crop_size, 3).

Return type

ndarray

mmtrack.utils.format_video_level_show(video_names: List, eval_results: List[numpy.ndarray], sort_by_first_metric: bool = True, show_indices: Optional[Tuple[int, List]] = None) → List[List][source]¶

Format the video-level performance for display.

Parameters

video_names (List) – The names of the videos.
eval_results (List[np.ndarray]) – The evaluation results.
sort_by_first_metric (bool, optional) – Whether to sort the results by the first metric. Defaults to True.
show_indices (Optional[Tuple[int, List]], optional) – The video indices to be shown. Defaults to None, i.e., all videos.

Returns

The formatted video-level evaluation results. For example:

[[video-2, 48.2, 49.2, 51.9], [video-1, 46.2, 48.2, 50.2]]

Return type

List[List]
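The example output suggests rows of the form [name, metric, ...] sorted by the first metric in descending order. The sketch below reproduces that output under one assumption about the input layout: one list of metric values per video. The real function takes NumPy arrays and may expect a different layout, and format_rows is a hypothetical name.

```python
def format_rows(video_names, eval_results, sort_by_first_metric=True):
    """Build [name, metric_1, metric_2, ...] rows, optionally sorted."""
    rows = [[name, *metrics] for name, metrics in zip(video_names, eval_results)]
    if sort_by_first_metric:
        # Descending order, as suggested by the example output above.
        rows.sort(key=lambda row: row[1], reverse=True)
    return rows


rows = format_rows(
    ["video-1", "video-2"],
    [[46.2, 48.2, 50.2], [48.2, 49.2, 51.9]])
```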

mmtrack.utils.gauss_blur(image: torch.Tensor, kernel_size: Sequence, sigma: Sequence) → torch.Tensor[source]¶

The gauss blur transform.

Parameters

image (Tensor) – of shape (n, c, h, w)
kernel_size (Sequence) – The argument kernel size for gauss blur.
sigma (Sequence) – The argument sigma for gauss blur.

Returns

The blurred image.

Return type

Tensor

mmtrack.utils.imrenormalize(img: Union[torch.Tensor, numpy.ndarray], img_norm_cfg: dict, new_img_norm_cfg: dict) → Union[torch.Tensor, numpy.ndarray][source]¶

Re-normalize the image.

Parameters

img (Tensor | ndarray) – Input image. If the input is a Tensor, the shape is (1, C, H, W). If the input is a ndarray, the shape is (H, W, C).
img_norm_cfg (dict) – Original configuration for the normalization.
new_img_norm_cfg (dict) – New configuration for the normalization.

Returns

Output image with the same type and shape of the input.

Return type

Tensor | ndarray
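Re-normalization can be understood as undoing the original normalization and then applying the new one. The sketch below works on a single pixel with per-channel values and assumes that both configs carry mean and std entries. The helper renormalize_pixel is hypothetical, and any channel reordering the real function may perform is ignored.

```python
def renormalize_pixel(value, old_cfg, new_cfg):
    """Map a pixel normalized with old_cfg to the normalization of new_cfg."""
    # Undo the original normalization: x * std + mean.
    raw = [v * s + m for v, s, m in zip(value, old_cfg["std"], old_cfg["mean"])]
    # Apply the new normalization: (x - mean) / std.
    return [(r - m) / s for r, s, m in zip(raw, new_cfg["std"], new_cfg["mean"])]


old_cfg = {"mean": [0.0, 0.0, 0.0], "std": [1.0, 1.0, 1.0]}
new_cfg = {"mean": [10.0, 20.0, 30.0], "std": [2.0, 2.0, 2.0]}
pixel = renormalize_pixel([12.0, 24.0, 36.0], old_cfg, new_cfg)
```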

mmtrack.utils.imshow_mot_errors(*args, backend: str = 'cv2', **kwargs)[source]¶

Show the wrong tracks on the input image.

Parameters: backend (str, optional) – Backend of visualization. Defaults to ‘cv2’.

mmtrack.utils.max_last2d(input: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶

Computes the value and position of maximum in the last two dimensions.

Parameters: input (Tensor) – of shape (…, H, W)
Returns: max_val (Tensor): The maximum value. argmax (Tensor): The position of the maximum in [row, col] format.
Return type: Tuple[Tensor, Tensor]
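For a single (H, W) map, the idea is to take the argmax over the flattened map and convert the flat index back to row and column. The following pure-Python sketch operates on nested lists instead of tensors. The function max_2d is hypothetical and handles only one 2D map.

```python
def max_2d(grid):
    """Return the maximum of a 2D list and its (row, col) position."""
    width = len(grid[0])
    flat = [value for row in grid for value in row]
    flat_idx = max(range(len(flat)), key=flat.__getitem__)
    # Convert the flat index back to 2D coordinates.
    return flat[flat_idx], (flat_idx // width, flat_idx % width)


value, position = max_2d([[1, 5, 2], [7, 3, 9]])
```

Here the maximum 9 sits at flat index 5, which maps to row 5 // 3 = 1 and column 5 % 3 = 2.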

mmtrack.utils.plot_norm_precision_curve(norm_precision: numpy.ndarray, tracker_names: List, plot_opts: Optional[dict] = None, plot_save_path: Optional[str] = None, show: bool = False)[source]¶

Plot curves of Norm Precision for SOT.

Parameters

norm_precision (np.ndarray) – The content of visualized indicators. It has shape (N, M), where N is the number of trackers and M is the number of Norm Precision values corresponding to the X.
tracker_names (List) – The names of trackers.
plot_opts (Optional[dict], optional) – The options for plot. Defaults to None.
plot_save_path (Optional[str], optional) – The saved path of the figure. Defaults to None.
show (bool, optional) – Whether to show. Defaults to False.

mmtrack.utils.plot_precision_curve(precision: numpy.ndarray, tracker_names: List, plot_opts: Optional[dict] = None, plot_save_path: Optional[str] = None, show: bool = False)[source]¶

Plot curves of Precision for SOT.

Parameters

precision (np.ndarray) – The content of visualized indicators. It has shape (N, M), where N is the number of trackers and M is the number of Precision values corresponding to the X.
tracker_names (List) – The names of trackers.
plot_opts (Optional[dict], optional) – The options for plot. Defaults to None.
plot_save_path (Optional[str], optional) – The saved path of the figure. Defaults to None.
show (bool, optional) – Whether to show. Defaults to False.

mmtrack.utils.plot_success_curve(success: numpy.ndarray, tracker_names: List, plot_opts: Optional[dict] = None, plot_save_path: Optional[str] = None, show: bool = False)[source]¶

Plot curves of Success for SOT.

Parameters

success (np.ndarray) – The content of visualized indicators. It has shape (N, M), where N is the number of trackers and M is the number of Success values corresponding to the X.
tracker_names (List) – The names of trackers.
plot_opts (Optional[dict], optional) – The options for plot. Defaults to None.
plot_save_path (Optional[str], optional) – The saved path of the figure. Defaults to None.
show (bool, optional) – Whether to show. Defaults to False.

mmtrack.utils.register_all_modules(init_default_scope: bool = True) → None[source]¶

Register all modules in mmtrack into the registries.

Parameters: init_default_scope (bool) – Whether to initialize the mmtrack default scope. When init_default_scope=True, the global default scope will be set to mmtrack, and all registries will build modules from mmtrack’s registry node. Defaults to True. To understand more about the registry, please refer to https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/registry.md

mmtrack.utils.stack_batch(tensors: List[torch.Tensor], pad_size_divisor: int = 0, pad_value: Union[int, float] = 0) → torch.Tensor[source]¶

Stack multiple tensors to form a batch, padding each image on the right and bottom to the maximum shape among them. If pad_size_divisor > 0, add padding to ensure the common height and width are divisible by pad_size_divisor.

Parameters

tensors (List[Tensor]) – The input tensors. Each is a TCHW 4D-tensor. T denotes the number of key/reference frames.
pad_size_divisor (int) – If pad_size_divisor > 0, add padding to ensure the common height and width are divisible by pad_size_divisor. This depends on the model, and many models need a divisibility of 32. Defaults to 0.
pad_value (int, float) – The padding value. Defaults to 0.

Returns

The NTCHW 5D-tensor. N denotes the batch size.

Return type

Tensor
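The padded size follows from the largest height and width in the batch, rounded up to a multiple of pad_size_divisor. The sketch below computes only the target shape and the per-image padding amounts, without touching tensors. The helpers padded_hw and right_bottom_padding are hypothetical.

```python
import math


def padded_hw(shapes, pad_size_divisor=0):
    """Return the common (H, W) that every image is padded to."""
    max_h = max(h for h, _ in shapes)
    max_w = max(w for _, w in shapes)
    if pad_size_divisor > 0:
        # Round up to the next multiple of the divisor.
        max_h = math.ceil(max_h / pad_size_divisor) * pad_size_divisor
        max_w = math.ceil(max_w / pad_size_divisor) * pad_size_divisor
    return max_h, max_w


def right_bottom_padding(shape, target):
    """Rows added at the bottom and columns added on the right."""
    return target[0] - shape[0], target[1] - shape[1]


shapes = [(250, 300), (260, 280)]
target = padded_hw(shapes, pad_size_divisor=32)
pads = [right_bottom_padding(shape, target) for shape in shapes]
```

With a divisor of 32, heights up to 260 are padded to 288 and widths up to 300 are padded to 320.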

mmtrack.visualization¶

visualization¶