mmtrack.apis

mmtrack.apis.inference_mot(model, img, frame_id)[source]

Inference image(s) with the MOT model.

Parameters
  • model (nn.Module) – The loaded MOT model.

  • img (str | ndarray) – Either the image name or a loaded image.

  • frame_id (int) – Frame id.

Returns

dict[str, ndarray]: The tracking results.

Return type

dict[str, ndarray]
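
A minimal usage sketch of the API above; the config and checkpoint paths are placeholders for a model-zoo pair, and the result key named in the comment follows the common MMTracking convention and may vary by model:

    import mmcv
    from mmtrack.apis import inference_mot, init_model

    model = init_model('mot_config.py', 'mot_checkpoint.pth', device='cuda:0')

    for frame_id, img in enumerate(mmcv.VideoReader('demo.mp4')):
        result = inference_mot(model, img, frame_id=frame_id)
        # result is a dict of ndarrays, e.g. result['track_bboxes']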

mmtrack.apis.inference_sot(model, image, init_bbox, frame_id)[source]

Inference image with the single object tracker.

Parameters
  • model (nn.Module) – The loaded tracker.

  • image (ndarray) – The loaded image.

  • init_bbox (ndarray) – The initial bbox of the target to be tracked.

  • frame_id (int) – Frame id.

Returns

dict[str, ndarray]: The tracking results.

Return type

dict[str, ndarray]
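
A minimal usage sketch for single object tracking; the paths are placeholders, and the first-frame target is assumed to be given in [x1, y1, x2, y2] format as in the MMTracking demos:

    import mmcv
    import numpy as np
    from mmtrack.apis import inference_sot, init_model

    model = init_model('sot_config.py', 'sot_checkpoint.pth', device='cuda:0')
    init_bbox = np.array([100, 120, 300, 340])  # target in the first frame

    for frame_id, img in enumerate(mmcv.VideoReader('demo.mp4')):
        result = inference_sot(model, img, init_bbox, frame_id=frame_id)
        # result['track_bboxes'] typically holds the predicted box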

mmtrack.apis.inference_vid(model, image, frame_id, ref_img_sampler={'frame_stride': 10, 'num_left_ref_imgs': 10})[source]

Inference image with the video object detector.

Parameters
  • model (nn.Module) – The loaded detector.

  • image (ndarray) – The loaded image.

  • frame_id (int) – Frame id.

  • ref_img_sampler (dict) – The configuration for sampling reference images. Only used for video detectors of FGFA style. Defaults to dict(frame_stride=10, num_left_ref_imgs=10).

Returns

dict[str, ndarray]: The detection results.

Return type

dict[str, ndarray]
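
A minimal usage sketch for video object detection; the paths are placeholders, and the result key in the comment follows the common MMTracking convention and may vary by model:

    import mmcv
    from mmtrack.apis import inference_vid, init_model

    model = init_model('vid_config.py', 'vid_checkpoint.pth', device='cuda:0')

    for frame_id, img in enumerate(mmcv.VideoReader('demo.mp4')):
        result = inference_vid(model, img, frame_id)
        # result is a dict of per-class detection arrays,
        # e.g. result['det_bboxes']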

mmtrack.apis.init_model(config, checkpoint=None, device='cuda:0', cfg_options=None, verbose_init_params=False)[source]

Initialize a model from the config file.

Parameters
  • config (str or mmcv.Config) – Config file path or the config object.

  • checkpoint (str, optional) – Checkpoint path. Defaults to None.

  • device (str, optional) – The device where the model is put on. Defaults to 'cuda:0'.

  • cfg_options (dict, optional) – Options to override some settings in the used config. Defaults to None.

  • verbose_init_params (bool, optional) – Whether to print the information of initialized parameters to the console. Defaults to False.

Returns

The constructed detector.

Return type

nn.Module

mmtrack.apis.init_random_seed(seed=None, device='cuda')[source]

Initialize random seed.

If the seed is not set, the seed will be automatically randomized and then broadcast to all processes to prevent potential bugs.

Parameters
  • seed (int, optional) – The seed. Defaults to None.

  • device (str) – The device where the seed will be put on. Defaults to 'cuda'.

Returns

Seed to be used.

Return type

int
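
A short sketch combining the two initialization helpers above; the config and checkpoint paths are placeholders:

    from mmtrack.apis import init_model, init_random_seed

    # When seed is None, a random seed is generated and broadcast to all
    # processes; the returned value is then used to seed the RNGs.
    seed = init_random_seed(42)

    model = init_model('configs/your_config.py', 'your_checkpoint.pth',
                       device='cuda:0')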

mmtrack.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False)[source]

Test model with multiple GPUs.

This method tests a model with multiple GPUs and collects the results under two different modes: GPU and CPU. By setting gpu_collect=True, it encodes results to GPU tensors and uses GPU communication to collect the results. In CPU mode it saves the results on different GPUs to tmpdir and collects them by the rank 0 worker. gpu_collect=True is not supported for now.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (nn.Dataloader) – PyTorch data loader.

  • tmpdir (str) – Path of directory to save the temporary results from different GPUs under CPU mode. Defaults to None.

  • gpu_collect (bool) – Option to use either GPU or CPU to collect results. Defaults to False.

Returns

The prediction results.

Return type

dict[str, list]

mmtrack.apis.single_gpu_test(model, data_loader, show=False, out_dir=None, fps=3, show_score_thr=0.3)[source]

Test model with a single GPU.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (nn.Dataloader) – PyTorch data loader.

  • show (bool, optional) – If True, visualize the prediction results. Defaults to False.

  • out_dir (str, optional) – Path of directory to save the visualization results. Defaults to None.

  • fps (int, optional) – FPS of the output video. Defaults to 3.

  • show_score_thr (float, optional) – The score threshold of visualization (only used in VID for now). Defaults to 0.3.

Returns

The prediction results.

Return type

dict[str, list]
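
A single-GPU test sketch in the style of MMTracking's test tooling; the config path and checkpoint are placeholders, and build_dataset is assumed to be re-exported by mmtrack.datasets as in MMDetection:

    import mmcv
    from mmcv.parallel import MMDataParallel
    from mmtrack.apis import init_model, single_gpu_test
    from mmtrack.datasets import build_dataloader, build_dataset

    cfg = mmcv.Config.fromfile('configs/your_config.py')  # placeholder
    dataset = build_dataset(cfg.data.test)
    data_loader = build_dataloader(
        dataset, samples_per_gpu=1, workers_per_gpu=2, dist=False,
        shuffle=False)

    model = init_model(cfg, 'your_checkpoint.pth')
    model = MMDataParallel(model, device_ids=[0])
    results = single_gpu_test(model, data_loader)
    eval_results = dataset.evaluate(results)  # dataset-specific metrics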

mmtrack.apis.train_model(model, dataset, cfg, distributed=False, validate=False, timestamp=None, meta=None)[source]

Train model entry function.

Parameters
  • model (nn.Module) – The model to be trained.

  • dataset (Dataset) – Train dataset.

  • cfg (dict) – The config dict for training.

  • distributed (bool) – Whether to use distributed training. Default: False.

  • validate (bool) – Whether to do evaluation. Default: False.

  • timestamp (str | None) – Local time for runner. Default: None.

  • meta (dict | None) – Meta dict to record some important information. Default: None.

mmtrack.core

anchor

class mmtrack.core.anchor.SiameseRPNAnchorGenerator(strides, *args, **kwargs)[source]

Anchor generator for Siamese RPN.

Please refer to mmdet/core/anchor/anchor_generator.py:AnchorGenerator for detailed docstring.

gen_2d_hanning_windows(featmap_sizes, device='cuda')[source]

Generate 2D Hanning windows.

Parameters
  • featmap_sizes (list[torch.size]) – List of torch.size recording the resolution (height, width) of the multi-level feature maps.

  • device (str) – Device the tensor will be put on. Defaults to 'cuda'.

Returns

List of 2D Hanning windows, where the i-th window has shape (num_base_anchors[i] * featmap_sizes[i][0] * featmap_sizes[i][1], ).

Return type

list[Tensor]

gen_single_level_base_anchors(base_size, scales, ratios, center=None)[source]

Generate base anchors of a single level feature map.

Parameters
  • base_size (int | float) – Basic size of an anchor.

  • scales (torch.Tensor) – Scales of the anchor.

  • ratios (torch.Tensor) – The ratio between the height and width of anchors in a single level.

  • center (tuple[float], optional) – The center of the base anchor related to a single feature grid. Defaults to None.

Returns

Anchors of one spatial location in a single level feature map in [tl_x, tl_y, br_x, br_y] format.

Return type

torch.Tensor
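
A small sketch of the anchor generator; the strides/ratios/scales below mirror a typical SiamRPN++ head config and are only illustrative:

    import torch
    from mmtrack.core.anchor import SiameseRPNAnchorGenerator

    anchor_generator = SiameseRPNAnchorGenerator(
        strides=[8], ratios=[0.33, 0.5, 1, 2, 3], scales=[8])

    # One 25x25 correlation feature map -> one flattened window per level.
    windows = anchor_generator.gen_2d_hanning_windows(
        [torch.Size([25, 25])], device='cpu')
    print(windows[0].shape)  # num_base_anchors * 25 * 25 elements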

evaluation

class mmtrack.core.evaluation.DistEvalHook(dataloader: torch.utils.data.dataloader.DataLoader, start: Optional[int] = None, interval: int = 1, by_epoch: bool = True, save_best: Optional[str] = None, rule: Optional[str] = None, test_fn: Optional[Callable] = None, greater_keys: Optional[List[str]] = None, less_keys: Optional[List[str]] = None, broadcast_bn_buffer: bool = True, tmpdir: Optional[str] = None, gpu_collect: bool = False, out_dir: Optional[str] = None, file_client_args: Optional[dict] = None, **eval_kwargs)[source]

Please refer to mmcv.runner.hooks.evaluation.py:DistEvalHook for detailed docstring.

class mmtrack.core.evaluation.EvalHook(dataloader: torch.utils.data.dataloader.DataLoader, start: Optional[int] = None, interval: int = 1, by_epoch: bool = True, save_best: Optional[str] = None, rule: Optional[str] = None, test_fn: Optional[Callable] = None, greater_keys: Optional[List[str]] = None, less_keys: Optional[List[str]] = None, out_dir: Optional[str] = None, file_client_args: Optional[dict] = None, **eval_kwargs)[source]

Please refer to mmcv.runner.hooks.evaluation.py:EvalHook for detailed docstring.

mmtrack.core.evaluation.bbox2region(bbox)[source]

Convert a bbox to a Rectangle or Polygon Class object.

Parameters

bbox (ndarray) – The format of a rectangle bbox is (x1, y1, w, h); the format of a polygon is (x1, y1, x2, y2, …).

Returns

Rectangle or Polygon Class object.

mmtrack.core.evaluation.eval_mot(results, annotations, logger=None, classes=None, iou_thr=0.5, ignore_iof_thr=0.5, ignore_by_classes=False, nproc=4)[source]

Evaluate CLEAR MOT metrics.

Parameters
  • results (list[list[list[ndarray]]]) – The first list indicates videos, the second list indicates images, and the third list indicates categories. The ndarray indicates the tracking results.

  • annotations (list[list[dict]]) –

    The first list indicates videos, the second list indicates images, and each dict contains the annotations of one image. Keys of the annotations are:

    • bboxes: numpy array of shape (n, 4)

    • labels: numpy array of shape (n, )

    • instance_ids: numpy array of shape (n, )

    • bboxes_ignore (optional): numpy array of shape (k, 4)

    • labels_ignore (optional): numpy array of shape (k, )

  • logger (logging.Logger | str | None, optional) – The way to print the evaluation results. Defaults to None.

  • classes (list, optional) – Classes in the dataset. Defaults to None.

  • iou_thr (float, optional) – IoU threshold for evaluation. Defaults to 0.5.

  • ignore_iof_thr (float, optional) – IoF threshold to ignore results. Defaults to 0.5.

  • ignore_by_classes (bool, optional) – Whether ignore the results by classes or not. Defaults to False.

  • nproc (int, optional) – Number of the processes. Defaults to 4.

Returns

Evaluation results.

Return type

dict[str, float]

mmtrack.core.evaluation.eval_sot_accuracy_robustness(results, annotations, burnin=10, ignore_unknown=True, videos_wh=None)[source]

Calculate accuracy and robustness over all tracking sequences.

Parameters
  • results (list[list[ndarray]]) –

    The first list contains the tracking results of each video. The second list contains the tracking results of each frame in one video. The ndarray has two cases:

    • bbox: denotes the normal tracking box in [x1, y1, w, h] format.

    • special tracking state: [0] denotes the unknown state, namely the skipped frame after failure, [1] denotes the initialized state, and [2] denotes the failed state.

  • annotations (list[ndarray]) – The list contains the gt_bboxes of each video. The ndarray is the gt_bboxes of one video. It's in (N, 4) shape. Each bbox is in (x1, y1, w, h) format.

  • burnin (int) – The number of frames that have to be ignored after re-initialization when calculating accuracy. Defaults to 10.

  • ignore_unknown (bool) – Whether to ignore the skipped frames after failures when calculating accuracy. Defaults to True.

  • videos_wh (list[tuple(width, height), ...]) – The list contains the width and height of each video. Defaults to None.

Returns

dict[str, float]: Accuracy and robustness in the EAO evaluation metric.

Return type

dict[str, float]

mmtrack.core.evaluation.eval_sot_eao(results, annotations, interval=[100, 356], videos_wh=None)[source]

Calculate the EAO score over all tracking sequences.

Parameters
  • results (list[list[ndarray]]) –

    The first list contains the tracking results of each video. The second list contains the tracking results of each frame in one video. The ndarray has two cases:

    • bbox: denotes the normal tracking box in [x1, y1, w, h] format.

    • special tracking state: [0] denotes the unknown state, namely the skipped frame after failure, [1] denotes the initialized state, and [2] denotes the failed state.

  • annotations (list[ndarray]) – The list contains the gt_bboxes of each video. The ndarray is the gt_bboxes of one video. It's in (N, 4) shape. Each bbox is in (x1, y1, w, h) format.

  • interval (list) – A specified interval in the EAO curve used to calculate the EAO score. There are different settings in different VOT challenges. Defaults to the VOT2018 setting: [100, 356].

  • videos_wh (list[tuple(width, height), ...]) – The list contains the width and height of each video. Defaults to None.

Returns

EAO score in the EAO evaluation metric.

Return type

dict[str, float]

mmtrack.core.evaluation.eval_sot_ope(results, annotations, visible_infos=None)[source]

Evaluation in OPE protocol.

Parameters
  • results (list[list[ndarray]]) – The first list contains the tracking results of each video. The second list contains the tracking results of each frame in one video. The ndarray denotes the tracking box in [tl_x, tl_y, br_x, br_y] format.

  • annotations (list[ndarray]) – The list contains the bbox annotations of each video. The ndarray is the gt_bboxes of one video. It's in (N, 4) shape. Each bbox is in (x1, y1, x2, y2) format.

  • visible_infos (list[ndarray] | None) – If not None, the list contains the visible information of each video. The ndarray is the visibility (bool type) of the object in one video. It's in (N,) shape. Defaults to None.

Returns

OPE style evaluation metric (i.e. success, norm precision and precision).

Return type

dict[str, float]
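
A toy example of the OPE evaluation above with one 3-frame video; both predictions and ground truth use (x1, y1, x2, y2) boxes:

    import numpy as np
    from mmtrack.core.evaluation import eval_sot_ope

    results = [[np.array([10., 10., 50., 50.]),
                np.array([12., 11., 52., 49.]),
                np.array([30., 30., 70., 70.])]]
    annotations = [np.array([[10, 10, 50, 50],
                             [11, 11, 51, 51],
                             [12, 12, 52, 52]], dtype=float)]

    metrics = eval_sot_ope(results, annotations)
    print(metrics)  # success / norm precision / precision scores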

mmtrack.core.evaluation.eval_vis(test_results, vis_anns, logger=None)[source]

Evaluation on VIS metrics.

Parameters
  • test_results (dict(list[dict])) – Testing results of the VIS dataset.

  • vis_anns (dict(list[dict])) – The annotations in the format of YouTube-VIS.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

Returns

Evaluation results.

Return type

dict[str, float]

motion

mmtrack.core.motion.flow_warp_feats(x, flow)[source]

Use flow to warp a feature map.

Parameters
  • x (Tensor) – of shape (N, C, H_x, W_x).

  • flow (Tensor) – of shape (N, C, H_f, W_f).

Returns

The warped feature map with shape (N, C, H_x, W_x).

Return type

Tensor
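
A shape-level sketch of feature warping; the two-channel flow field is an assumption matching typical optical-flow outputs (dx, dy), and the flow resolution may differ from the feature resolution:

    import torch
    from mmtrack.core.motion import flow_warp_feats

    x = torch.randn(1, 256, 32, 32)   # feature map of a reference frame
    flow = torch.randn(1, 2, 64, 64)  # dense flow field

    warped = flow_warp_feats(x, flow)
    print(warped.shape)  # torch.Size([1, 256, 32, 32])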

optimizer

class mmtrack.core.optimizer.SiameseRPNFp16OptimizerHook(backbone_start_train_epoch, backbone_train_layers, **kwargs)[source]

FP16 optimizer hook for Siamese RPN.

Parameters
  • backbone_start_train_epoch (int) – Start to train the backbone at the backbone_start_train_epoch-th epoch. Note that the epoch in this class counts from 0, while the epoch in the log file counts from 1.

  • backbone_train_layers (list(str)) – List of str denoting the backbone stages to be trained.

before_train_epoch(runner)[source]

If runner.epoch >= self.backbone_start_train_epoch, start to train the backbone.

class mmtrack.core.optimizer.SiameseRPNLrUpdaterHook(lr_configs=[{'type': 'step', 'start_lr_factor': 0.2, 'end_lr_factor': 1.0, 'end_epoch': 5}, {'type': 'log', 'start_lr_factor': 1.0, 'end_lr_factor': 0.1, 'end_epoch': 20}], **kwargs)[source]

Learning rate updater for Siamese RPN.

Parameters

lr_configs (list[dict]) – List of dicts, where each dict denotes the configuration of a specific learning rate updater and must contain 'type'.

get_lr(runner, base_lr)[source]

Get a specific learning rate for each epoch.

class mmtrack.core.optimizer.SiameseRPNOptimizerHook(backbone_start_train_epoch, backbone_train_layers, **kwargs)[source]

Optimizer hook for Siamese RPN.

Parameters
  • backbone_start_train_epoch (int) – Start to train the backbone at the backbone_start_train_epoch-th epoch. Note that the epoch in this class counts from 0, while the epoch in the log file counts from 1.

  • backbone_train_layers (list(str)) – List of str denoting the backbone stages to be trained.

before_train_epoch(runner)[source]

If runner.epoch >= self.backbone_start_train_epoch, start to train the backbone.

track

mmtrack.core.track.depthwise_correlation(x, kernel)[source]

Depthwise cross correlation.

This function is proposed in SiamRPN++.

Parameters
  • x (Tensor) – of shape (N, C, H_x, W_x).

  • kernel (Tensor) – of shape (N, C, H_k, W_k).

Returns

Of shape (N, C, H_o, W_o), where H_o = H_x - H_k + 1 and likewise W_o = W_x - W_k + 1.

Return type

Tensor
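
A quick shape demo of the depthwise correlation above, correlating 5x5 template features against 29x29 search features:

    import torch
    from mmtrack.core.track import depthwise_correlation

    x = torch.randn(1, 256, 29, 29)      # search-region features
    kernel = torch.randn(1, 256, 5, 5)   # template (exemplar) features

    out = depthwise_correlation(x, kernel)
    print(out.shape)  # torch.Size([1, 256, 25, 25]), since 29 - 5 + 1 = 25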

mmtrack.core.track.embed_similarity(key_embeds, ref_embeds, method='dot_product', temperature=-1)[source]

Calculate feature similarity from embeddings.

Parameters
  • key_embeds (Tensor) – Shape (N1, C).

  • ref_embeds (Tensor) – Shape (N2, C).

  • method (str, optional) – Method to calculate the similarity; options are 'dot_product' and 'cosine'. Defaults to 'dot_product'.

  • temperature (int, optional) – Softmax temperature. Defaults to -1.

Returns

Similarity matrix of shape (N1, N2).

Return type

Tensor
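
A small sketch of embedding similarity, e.g. between current detections and existing tracks in an association step:

    import torch
    from mmtrack.core.track import embed_similarity

    key_embeds = torch.randn(4, 128)   # e.g. embeddings of 4 detections
    ref_embeds = torch.randn(6, 128)   # e.g. embeddings of 6 tracks

    sims = embed_similarity(key_embeds, ref_embeds, method='cosine')
    print(sims.shape)  # torch.Size([4, 6])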

mmtrack.core.track.imrenormalize(img, img_norm_cfg, new_img_norm_cfg)[source]

Re-normalize the image.

Parameters
  • img (Tensor | ndarray) – Input image. If the input is a Tensor, the shape is (1, C, H, W). If the input is an ndarray, the shape is (H, W, C).

  • img_norm_cfg (dict) – Original configuration for the normalization.

  • new_img_norm_cfg (dict) – New configuration for the normalization.

Returns

Output image with the same type and shape as the input.

Return type

Tensor | ndarray

mmtrack.core.track.interpolate_tracks(tracks, min_num_frames=5, max_num_frames=20)[source]

Interpolate tracks linearly to make tracks more complete.

This function is proposed in "ByteTrack: Multi-Object Tracking by Associating Every Detection Box" (https://arxiv.org/abs/2110.06864).

Parameters
  • tracks (ndarray) – With shape (N, 7). Each row denotes (frame_id, track_id, x1, y1, x2, y2, score).

  • min_num_frames (int, optional) – The minimum length of a track that will be interpolated. Defaults to 5.

  • max_num_frames (int, optional) – The maximum disconnected length in a track. Defaults to 20.

Returns

The interpolated tracks with shape (N, 7). Each row denotes (frame_id, track_id, x1, y1, x2, y2, score).

Return type

ndarray
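
A toy example of linear track interpolation, assuming the gap between observations is within max_num_frames; min_num_frames is lowered from its default to cover this tiny track:

    import numpy as np
    from mmtrack.core.track import interpolate_tracks

    tracks = np.array([
        # frame_id, track_id, x1, y1, x2, y2, score
        [0, 1, 0., 0., 10., 10., 0.9],
        [3, 1, 6., 6., 16., 16., 0.8],
    ])
    full = interpolate_tracks(tracks, min_num_frames=2, max_num_frames=20)
    print(full.shape)  # (4, 7): frames 1 and 2 are filled in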

mmtrack.core.track.outs2results(bboxes=None, labels=None, masks=None, ids=None, num_classes=None, **kwargs)[source]

Convert tracking/detection results to a list of numpy arrays.

Parameters
  • bboxes (torch.Tensor | np.ndarray) – shape (n, 5)

  • labels (torch.Tensor | np.ndarray) – shape (n, )

  • masks (torch.Tensor | np.ndarray) – shape (n, h, w)

  • ids (torch.Tensor | np.ndarray) – shape (n, )

  • num_classes (int) – The number of classes, not including the background class.

Returns

dict[str, list[np.ndarray] | list[list[np.ndarray]]]: Tracking/detection results of each class. It may contain the following keys:

  • bbox_results (list[np.ndarray]): Each list denotes bboxes of one category.

  • mask_results (list[list[np.ndarray]]): Each outer list denotes masks of one category. Each inner list denotes one mask belonging to the category. Each mask has shape (h, w).

Return type

dict[str, list[np.ndarray] | list[list[np.ndarray]]]

mmtrack.core.track.results2outs(bbox_results=None, mask_results=None, mask_shape=None, **kwargs)[source]

Restore the results (list of results of each category) into the results of the model forward.

Parameters
  • bbox_results (list[np.ndarray]) – Each list denotes bboxes of one category.

  • mask_results (list[list[np.ndarray]]) – Each outer list denotes masks of one category. Each inner list denotes one mask belonging to the category. Each mask has shape (h, w).

  • mask_shape (tuple[int]) – The shape (h, w) of the mask.

Returns

Tracking results of each class. It may contain the following keys:

  • bboxes (np.ndarray): shape (n, 5)

  • labels (np.ndarray): shape (n, )

  • masks (np.ndarray): shape (n, h, w)

  • ids (np.ndarray): shape (n, )

Return type

dict
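
A round-trip sketch of the two converters above; with ids given, each per-class bbox row is commonly laid out as (id, x1, y1, x2, y2, score):

    import numpy as np
    from mmtrack.core.track import outs2results, results2outs

    bboxes = np.array([[10., 10., 50., 50., 0.9],
                       [20., 20., 60., 60., 0.8]])
    labels = np.array([0, 1])
    ids = np.array([3, 7])

    # Model outputs -> per-class result lists.
    results = outs2results(bboxes=bboxes, labels=labels, ids=ids,
                           num_classes=2)

    # ...and back to flat arrays for the next processing step.
    outs = results2outs(bbox_results=results['bbox_results'])
    print(outs['bboxes'].shape, outs['labels'], outs['ids'])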

utils

mmtrack.core.utils.crop_image(image, crop_region, crop_size, padding=(0, 0, 0))[source]

Crop an image based on crop_region and crop_size.

Parameters
  • image (ndarray) – of shape (H, W, 3).

  • crop_region (ndarray) – of shape (4, ) in [x1, y1, x2, y2] format.

  • crop_size (int) – Crop size.

  • padding (tuple | ndarray) – of shape (3, ) denoting the padding values.

Returns

The cropped image of shape (crop_size, crop_size, 3).

Return type

ndarray
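
A small sketch of cropping with padding; the crop region here partly falls outside the image, so the missing area is filled with the padding value:

    import numpy as np
    from mmtrack.core.utils import crop_image

    img = np.random.randint(0, 255, (360, 640, 3), dtype=np.uint8)
    crop_region = np.array([600, 300, 700, 400])  # [x1, y1, x2, y2]

    patch = crop_image(img, crop_region, crop_size=127, padding=(0, 0, 0))
    print(patch.shape)  # (127, 127, 3)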

mmtrack.core.utils.imshow_mot_errors(*args, backend='cv2', **kwargs)[source]

Show the wrong tracks on the input image.

Parameters

backend (str, optional) – Backend of visualization. Defaults to 'cv2'.

mmtrack.core.utils.imshow_tracks(*args, backend='cv2', **kwargs)[source]

Show the tracks on the input image.

mmtrack.datasets

datasets

class mmtrack.datasets.BaseSOTDataset(img_prefix, pipeline, split, ann_file=None, test_mode=False, bbox_min_size=0, only_eval_visible=False, file_client_args={'backend': 'disk'}, **kwargs)[source]

Dataset of single object tracking. The dataset supports both training and testing modes.

Parameters
  • img_prefix (str) – Prefix in the paths of image files.

  • pipeline (list[dict]) – Processing pipeline.

  • split (str) – Dataset split.

  • ann_file (str, optional) – The file that contains data information. It will be loaded and parsed in the self.load_data_infos function.

  • test_mode (bool, optional) – Defaults to False.

  • bbox_min_size (int, optional) – Only bounding boxes whose sizes are larger than bbox_min_size can be regarded as valid. Defaults to 0.

  • only_eval_visible (bool, optional) – Whether to only evaluate frames where objects are visible. Defaults to False.

  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. Defaults to dict(backend='disk').

evaluate(results, metric=['track'], logger=None)[source]

The default evaluation standard is OPE.

Parameters
  • results (dict(list[ndarray])) – Tracking results. The ndarray is in (x1, y1, x2, y2, score) format.

  • metric (list, optional) – Defaults to ['track'].

  • logger (logging.Logger | str | None, optional) – Defaults to None.

get_ann_infos_from_video(video_ind)[source]

Get annotation information in a video.

Parameters

video_ind (int) – Video index.

Returns

{'bboxes': ndarray in (N, 4) shape, 'bboxes_isvalid': ndarray, 'visible': ndarray}. The annotation information in some datasets may contain 'visible_ratio'. The bbox is in (x1, y1, x2, y2) format.

Return type

dict

get_bboxes_from_video(video_ind)[source]

Get the bboxes annotation about the instance in a video.

Parameters

video_ind (int) – Video index.

Returns

ndarray in (N, 4) shape, where N is the number of bboxes and each bbox is in (x, y, w, h) format.

Return type

ndarray

get_img_infos_from_video(video_ind)[source]

Get image information in a video.

Parameters

video_ind (int) – Video index.

Returns

{'filename': list[str], 'frame_ids': ndarray, 'video_id': int}

Return type

dict

get_len_per_video(video_ind)[source]

Get the number of frames in a video.

get_visibility_from_video(video_ind)[source]

Get the visible information of instance in a video.

load_as_video

The self.data_info is a list whose length is the number of videos. The default content is in the following format:

[
    {
        'video_path': the video path,
        'ann_path': the annotation path,
        'start_frame_id': the starting frame ID number contained in the image name,
        'end_frame_id': the ending frame ID number contained in the image name,
        'framename_template': the template of the image name
    }
]

pre_pipeline(results)[source]

Prepare results dict for pipeline.

The following keys in dict will be called in the subsequent pipeline.

prepare_test_data(video_ind, frame_ind)[source]

Get testing data of one frame. We parse one video, get one frame from it and pass the frame information to the pipeline.

Parameters
  • video_ind (int) – Video index.

  • frame_ind (int) – Frame index.

Returns

Testing data of one frame.

Return type

dict

prepare_train_data(video_ind)[source]

Get training data sampled from some videos. We first sample two videos from the dataset and then parse the data information. The first operation in the training pipeline is frame sampling.

Parameters

video_ind (int) – Video index.

Returns

Training data pairs, triplets or groups.

Return type

dict

class mmtrack.datasets.CocoVID(*args: Any, **kwargs: Any)[source]

Inherit the official COCO class in order to parse the annotations of bbox-related video tasks.

Parameters
  • annotation_file (str) – Location of the annotation file. Defaults to None.

  • load_img_as_vid (bool) – If True, convert image data to video data, which means each image is converted to a video. Defaults to False.

convert_img_to_vid(dataset)[source]

Convert image data to video data.

createIndex()[source]

Create index.

get_img_ids_from_ins_id(insId)[source]

Get image ids from a given instance id.

Parameters

insId (int) – The given instance id.

Returns

Image ids of the given instance id.

Return type

list[int]

get_img_ids_from_vid(vidId)[source]

Get image ids from a given video id.

Parameters

vidId (int) – The given video id.

Returns

Image ids of the given video id.

Return type

list[int]

get_ins_ids_from_vid(vidId)[source]

Get instance ids from a given video id.

Parameters

vidId (int) – The given video id.

Returns

Instance ids of the given video id.

Return type

list[int]

get_vid_ids(vidIds=[])[source]

Get video ids that satisfy the given filter conditions.

By default, all video ids are returned.

Parameters

vidIds (list[int]) – The given video ids. Defaults to [].

Returns

Video ids.

Return type

list[int]

load_vids(ids=[])[source]

Get video information of the given video ids.

By default, information of all videos is returned.

Parameters

ids (list[int]) – The given video ids. Defaults to [].

Returns

List of video information.

Return type

list[dict]
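
A minimal parsing sketch using the accessors above; the annotation file is a placeholder CocoVID-style json:

    from mmtrack.datasets import CocoVID

    coco_vid = CocoVID('annotations.json')  # placeholder path

    for vid_id in coco_vid.get_vid_ids():
        img_ids = coco_vid.get_img_ids_from_vid(vid_id)
        ins_ids = coco_vid.get_ins_ids_from_vid(vid_id)
        print(vid_id, len(img_ids), len(ins_ids))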

class mmtrack.datasets.CocoVideoDataset(load_as_video=True, key_img_sampler={'interval': 1}, ref_img_sampler={'filter_key_img': True, 'frame_range': 10, 'method': 'uniform', 'num_ref_imgs': 1, 'return_key_img': True, 'stride': 1}, test_load_ann=False, *args, **kwargs)[source]

Base coco video dataset for VID, MOT and SOT tasks.

Parameters
  • load_as_video (bool) – If True, use the CocoVID class to load the dataset; otherwise, use the COCO class. Default: True.

  • key_img_sampler (dict) – Configuration of sampling key images.

  • ref_img_sampler (dict) – Configuration of sampling ref images.

  • test_load_ann (bool) – If True, load annotations during testing; otherwise, do not. Default: False.

evaluate(results, metric=['bbox', 'track'], logger=None, bbox_kwargs={'classwise': False, 'iou_thrs': None, 'metric_items': None, 'proposal_nums': (100, 300, 1000)}, track_kwargs={'ignore_by_classes': False, 'ignore_iof_thr': 0.5, 'iou_thr': 0.5, 'nproc': 4})[source]

Evaluation in COCO protocol and CLEAR MOT metric (e.g. MOTA, IDF1).

Parameters
  • results (dict) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Options are 'bbox', 'segm', 'track'.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • bbox_kwargs (dict) – Configuration for COCO style evaluation.

  • track_kwargs (dict) – Configuration for CLEAR MOT evaluation.

Returns

COCO style and CLEAR MOT evaluation metric.

Return type

dict[str, float]

get_ann_info(img_info)[source]

Get COCO annotations by the information of the image.

Parameters

img_info (dict) – Information of the image.

Returns

Annotation information of img_info.

Return type

dict

key_img_sampling(img_ids, interval=1)[source]

Sampling key images.

load_annotations(ann_file)[source]

Load annotations from a COCO/COCOVID style annotation file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

Annotation information from the COCO/COCOVID api.

Return type

list[dict]

load_video_anns(ann_file)[source]

Load annotations from a COCOVID style annotation file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

Annotation information from the COCOVID api.

Return type

list[dict]

prepare_data(idx)[source]

Get data and annotations after pipeline.

Parameters

idx (int) – Index of data.

Returns

Data and annotations after pipeline with new keys introduced by the pipeline.

Return type

dict

prepare_results(img_info)[source]

Prepare results for image (e.g. the annotation information, …).

prepare_test_img(idx)[source]

Get testing data after pipeline.

Parameters

idx (int) – Index of data.

Returns

Testing data after pipeline with new keys introduced by the pipeline.

Return type

dict

prepare_train_img(idx)[source]

Get training data and annotations after pipeline.

Parameters

idx (int) – Index of data.

Returns

Training data and annotations after pipeline with new keys introduced by the pipeline.

Return type

dict

ref_img_sampling(img_info, frame_range, stride=1, num_ref_imgs=1, filter_key_img=True, method='uniform', return_key_img=True)[source]

Sampling reference frames in the same video for the key frame.

Parameters
  • img_info (dict) – The information of the key frame.

  • frame_range (List(int) | int) – The sampling range of reference frames in the same video for the key frame.

  • stride (int) – The sampling frame stride when sampling reference images. Default: 1.

  • num_ref_imgs (int) – The number of sampled reference images. Default: 1.

  • filter_key_img (bool) – If False, the key image will be in the sampling reference candidates; otherwise, it is excluded. Default: True.

  • method (str) – The sampling method. Options are 'uniform', 'bilateral_uniform', 'test_with_adaptive_stride', 'test_with_fix_stride'. 'uniform' denotes that reference images are randomly sampled from the nearby frames of the key frame. 'bilateral_uniform' denotes that reference images are randomly sampled from the two sides of the nearby frames of the key frame. 'test_with_adaptive_stride' is only used in testing, and denotes that the sampling frame stride is equal to (video length / the number of reference images). 'test_with_fix_stride' is only used in testing, with the sampling frame stride equal to stride. Default: 'uniform'.

  • return_key_img (bool) – If True, the information of the key frame is returned; otherwise, it is not. Default: True.

Returns

img_info and the reference images information, or only the reference images information.

Return type

list(dict)

class mmtrack.datasets.DanceTrackDataset(visibility_thr=-1, interpolate_tracks_cfg=None, detection_file=None, *args, **kwargs)[source]

Dataset for DanceTrack: https://github.com/DanceTrack/DanceTrack.

Most content is inherited from MOTChallengeDataset.

get_benchmark_and_eval_split()[source]

Get benchmark and dataset split to evaluate.

Get the benchmark from the upper/lower-case image prefix and the dataset split to evaluate.

Returns

The first string denotes the type of dataset. The second string denotes the split of the dataset to eval.

Return type

tuple(string)

class mmtrack.datasets.GOT10kDataset(*args, **kwargs)[source]

GOT10k Dataset of single object tracking.

The dataset supports both training and testing modes.

format_results(results, resfile_path=None, logger=None)[source]

Format the results to txts (standard format for the GOT10k Challenge).

Parameters
  • results (dict(list[ndarray])) – Testing results of the dataset.

  • resfile_path (str) – Path to save the formatted results. Defaults to None.

  • logger (logging.Logger | str | None, optional) – Defaults to None.

get_visibility_from_video(video_ind)[source]

Get the visible information of instance in a video.

load_data_infos(split='train')[source]

Load dataset information.

Parameters

split (str, optional) – The split of the dataset. Defaults to 'train'.

Returns

The length of the list is the number of videos. Each inner dict is in the following format:

{
    'video_path': the video path,
    'ann_path': the annotation path,
    'start_frame_id': the starting frame number contained in the image name,
    'end_frame_id': the ending frame number contained in the image name,
    'framename_template': the template of the image name
}

Return type

list[dict]

prepare_test_data(video_ind, frame_ind)[source]

Get testing data of one frame. We parse one video, get one frame from it and pass the frame information to the pipeline.

Parameters
  • video_ind (int) – Video index.

  • frame_ind (int) – Frame index.

Returns

Testing data of one frame.

Return type

dict

class mmtrack.datasets.ImagenetVIDDataset(*args, **kwargs)[source]

ImageNet VID dataset for video object detection.

load_annotations(ann_file)[source]

Load annotations from a COCO/COCOVID style annotation file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

Annotation information from the COCO/COCOVID api.

Return type

list[dict]

load_image_anns(ann_file)[source]

Load annotations from a COCO style annotation file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

Annotation information from the COCO api.

Return type

list[dict]

load_video_anns(ann_file)[source]

Load annotations from a COCOVID style annotation file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

Annotation information from the COCOVID api.

Return type

list[dict]

class mmtrack.datasets.LaSOTDataset(*args, **kwargs)[source]

LaSOT dataset of single object tracking.

The dataset supports both training and testing modes.

get_visibility_from_video(video_ind)[source]

Get the visible information of the instance in a video.

load_data_infos(split='test')[source]

Load dataset information.

Parameters

split (str, optional) – Dataset split. Defaults to 'test'.

Returns

The length of the list is the number of videos. Each inner dict is in the following format:

{
    'video_path': the video path,
    'ann_path': the annotation path,
    'start_frame_id': the starting frame number contained in the image name,
    'end_frame_id': the ending frame number contained in the image name,
    'framename_template': the template of the image name
}

Return type

list[dict]

class mmtrack.datasets.MOTChallengeDataset(visibility_thr=-1, interpolate_tracks_cfg=None, detection_file=None, *args, **kwargs)[source]

Dataset for MOTChallenge.

Parameters
  • visibility_thr (float, optional) – The minimum visibility for the objects during training. Defaults to -1.

  • interpolate_tracks_cfg (dict, optional) –

    If not None, interpolate tracks linearly to make them more complete. Defaults to None.

    • min_num_frames (int, optional): The minimum length of a track that will be interpolated. Defaults to 5.

    • max_num_frames (int, optional): The maximum disconnected length in a track. Defaults to 20.

  • detection_file (str, optional) – The path of the public detection file. Defaults to None.

evaluate(results, metric='track', logger=None, resfile_path=None, bbox_iou_thr=0.5, track_iou_thr=0.5)[source]

Evaluation in MOT Challenge.

Parameters
  • results (list[list | tuple]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Options are 'bbox', 'track'. Defaults to 'track'.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • resfile_path (str, optional) – Path to save the formatted results. Defaults to None.

  • bbox_iou_thr (float, optional) – IoU threshold for detection evaluation. Defaults to 0.5.

  • track_iou_thr (float, optional) – IoU threshold for tracking evaluation. Defaults to 0.5.

Returns

MOTChallenge style evaluation metric.

Return type

dict[str, float]

format_bbox_results(results, infos, resfile)[source]

Format detection results.

format_results(results, resfile_path=None, metrics=['track'])[source]

Format the results to txts (standard format for the MOT Challenge).

Parameters
  • results (dict(list[ndarray])) – Testing results of the dataset.

  • resfile_path (str, optional) – Path to save the formatted results. Defaults to None.

  • metrics (list[str], optional) – The results of the specific metrics will be formatted. Defaults to ['track'].

Returns

(resfile_path, resfiles, names, tmp_dir), where resfile_path is the path to save the formatted results, resfiles is a dict containing the filepaths, names is a list containing the names of the videos, and tmp_dir is the temporary directory created for saving files.

Return type

tuple

format_track_results(results, infos, resfile)[source]

Format tracking results.

get_benchmark_and_eval_split()[source]

Get benchmark and dataset split to evaluate.

Get the benchmark from the upper/lower-case image prefix and the dataset split to evaluate.

Returns

The first string denotes the type of dataset. The second string denotes the split of the dataset to eval.

Return type

tuple(string)

get_dataset_cfg_for_hota(gt_folder, tracker_folder, seqmap)[source]

Get default configs for trackeval.datasets.MotChallenge2DBox.

Parameters
  • gt_folder (str) – The name of the GT folder.

  • tracker_folder (str) – The name of the tracker folder.

  • seqmap (str) – The file that contains the sequence of video names.

Returns

Dataset configs for MotChallenge2DBox.

load_detections(detection_file=None)[source]

Load public detections.

prepare_results(img_info)[source]

Prepare results for image (e.g. the annotation information, …).

class mmtrack.datasets.OTB100Dataset(*args, **kwargs)[source]

OTB100 dataset of single object tracking.

The dataset is only used for testing.

get_bboxes_from_video(video_ind)[source]

Get the bboxes annotation about the instance in a video.

Parameters

video_ind (int) – Video index.

Returns

ndarray in (N, 4) shape, where N is the number of bboxes and each bbox is in (x, y, w, h) format.

Return type

ndarray

load_data_infos(split='test')[source]

Load dataset information.

Parameters

split (str, optional) – Dataset split. Defaults to 'test'.

Returns

The length of the list is the number of videos. Each inner dict is in the following format:

{
    'video_path': the video path,
    'ann_path': the annotation path,
    'start_frame_id': the starting frame number contained in the image name,
    'end_frame_id': the ending frame number contained in the image name,
    'framename_template': the template of the image name,
    'init_skip_num': (optional) the number of skipped frames when initializing the tracker
}

Return type

list[dict]

class mmtrack.datasets.RandomSampleConcatDataset(dataset_cfgs, dataset_sampling_weights=None)[source]

A wrapper of concatenated datasets. It supports randomly sampling one dataset from the concatenated datasets and then getting samples from the sampled dataset.

Parameters
  • dataset_cfgs (list[dict]) – The list contains all configs of the concatenated datasets.

  • dataset_sampling_weights (list[float]) – The list contains the sampling weights of each dataset.

class mmtrack.datasets.ReIDDataset(pipeline, triplet_sampler=None, *args, **kwargs)[source]

Dataset for ReID.

Parameters
  • pipeline (list) – A list of dicts, where each element represents an operation defined in mmtrack.datasets.pipelines.

  • triplet_sampler (dict) – The sampler for hard mining triplet loss.

evaluate(results, metric='mAP', metric_options=None, logger=None)[source]

Evaluate the ReID dataset.

Parameters
  • results (list) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Default value is mAP.

  • metric_options (dict, optional) – Options for calculating metrics. Allowed keys are 'rank_list' and 'max_rank'. Defaults to None.

  • logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Defaults to None.

Returns

Evaluation results.

Return type

dict

load_annotations()[source]

Load annotations from an ImageNet style annotation file.

Returns

Annotation information from the ReID api.

Return type

list[dict]

prepare_data(idx)[source]

Prepare results for image (e.g. the annotation information, …).

triplet_sampling(pos_pid, num_ids=8, ins_per_id=4)[source]

Triplet sampler for hard mining triplet loss. First, for one pos_pid, randomly sample ins_per_id images with the same person id.

Then, randomly sample num_ids - 1 negative ids. Finally, randomly sample ins_per_id images for each negative id.

Parameters
  • pos_pid (ndarray) – The person id of the anchor.

  • num_ids (int) – The number of person ids.

  • ins_per_id (int) – The number of images for each person id.

Returns

Annotation information of num_ids * ins_per_id images.

Return type

list

class mmtrack.datasets.SOTCocoDataset(ann_file, *args, **kwargs)[source]

Coco dataset of single object tracking.

The dataset only supports training mode.

get_bboxes_from_video(video_ind)[source]

Get the bbox annotation about the instance in an image.

Parameters

video_ind (int) – Video index. Each video_ind denotes an instance.

Returns

ndarray in (1, 4) shape. The bbox is in (x, y, w, h) format.

Return type

ndarray

get_img_infos_from_video(video_ind)[source]

Get all frame paths in a video.

Parameters

video_ind (int) – Video index. Each video_ind denotes an instance.

Returns

All image paths.

Return type

list[str]

get_len_per_video(video_ind)[source]

Get the number of frames in a video.

load_data_infos(split='train')[source]

Load dataset information. Each instance is viewed as a video.

Parameters

split (str, optional) – The split of the dataset. Defaults to 'train'.

Returns

The length of the list is the number of valid object annotations. Each element in the list is an annotation ID in the COCO API.

Return type

list[int]

class mmtrack.datasets.SOTImageNetVIDDataset(ann_file, *args, **kwargs)[source]

ImageNet VID dataset of single object tracking.

The dataset only supports training mode.

get_ann_infos_from_video(video_ind)[source]

Get annotation information in a video. Note: we overload this function to speed up loading video information.

Parameters

video_ind (int) – Video index. Each video_ind denotes an instance.

Returns

{'bboxes': ndarray in (N, 4) shape, 'bboxes_isvalid': ndarray, 'visible': ndarray}. The bbox is in (x1, y1, x2, y2) format.

Return type

dict

get_bboxes_from_video(video_ind)[source]

Get the bbox annotation about the instance in a video. Considering that get_bboxes_from_video in SOTBaseDataset is not compatible with SOTImageNetVIDDataset, we overload this function though it's not called by self.get_ann_infos_from_video.

Parameters

video_ind (int) – Video index. Each video_ind denotes an instance.

Returns

ndarray in (N, 4) shape. The bbox is in (x, y, w, h) format.

Return type

ndarray

get_img_infos_from_video(video_ind)[source]

Get image information in a video.

Parameters

video_ind (int) – Video index.

Returns

{'filename': list[str], 'frame_ids': ndarray, 'video_id': int}

Return type

dict

get_len_per_video(video_ind)[source]

Get the number of frames in a video.

get_visibility_from_video(video_ind)[source]

Get the visible information in a video.

Considering that get_visibility_from_video in SOTBaseDataset is not compatible with SOTImageNetVIDDataset, we overload this function though it's not called by self.get_ann_infos_from_video.

load_data_infos(split='train')[source]

Load dataset information.

Parameters

split (str, optional) – The split of the dataset. Defaults to 'train'.

Returns

The length of the list is the number of instances. Each element in the list is an instance ID in the COCO API.

Return type

list[int]

class mmtrack.datasets.SOTTestDataset(*args, **kwargs)[source]

Dataset for the testing of single object tracking.

The dataset doesn't support training mode.

evaluate(results, metric=['track'], logger=None)[source]

Evaluation in OPE protocol.

Parameters
  • results (dict) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Options are 'track'.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

Returns

OPE style evaluation metric (i.e. success, norm precision and precision).

Return type

dict[str, float]

class mmtrack.datasets.SOTTrainDataset(*args, **kwargs)[source]

Dataset for the training of single object tracking.

The dataset doesn't support testing mode.

get_snippet_of_instance(idx)[source]

Get a snippet of an instance in a video.

Parameters

idx (int) – Index of data.

Returns

(snippet, image_id, instance_id), where snippet is a list containing the successive image ids where the instance appears, and image_id is a randomly sampled image id from the snippet.

Return type

tuple

load_video_anns(ann_file)[source]

Load annotations from a COCOVID style annotation file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

Annotation information from the COCOVID api.

Return type

list[dict]

prepare_results(img_id, instance_id, is_positive_pair)[source]

Get training data and annotations.

Parameters
  • img_id (int) – The id of the image.

  • instance_id (int) – The id of the instance.

  • is_positive_pair (bool) – Denotes a positive or negative sample pair.

Returns

The information of the training image and annotation.

Return type

dict

prepare_train_img(idx)[source]

Get training data and annotations after pipeline.

Parameters

idx (int) – Index of data.

Returns

Training data and annotation after pipeline with new keys introduced by the pipeline.

Return type

dict

ref_img_sampling(snippet, image_id, instance_id, frame_range=5, pos_prob=0.8, filter_key_img=False, return_key_img=True, **kwargs)[source]

Get a search image for an instance in an exemplar image.

If sampling a positive search image, the positive search image is randomly sampled from the frames near the exemplar image, where the sampling range is decided by frame_range. If sampling a negative search image, the negative search image and negative instance are randomly sampled from the entire dataset.

Parameters
  • snippet (list[int]) – The successive image ids where the instance appears.

  • image_id (int) – The id of the exemplar image where the instance appears.

  • instance_id (int) – The id of the instance.

  • frame_range (List(int) | int) – The frame range of sampling a positive search image for the exemplar image. Default: 5.

  • pos_prob (float) – The probability of sampling a positive search image. Default: 0.8.

  • filter_key_img (bool) – If False, the exemplar image will be in the sampling candidates; otherwise, it is excluded. Default: False.

  • return_key_img (bool) – If True, the image_id and instance_id are returned; otherwise, they are not. Default: True.

Returns

(image_ids, instance_ids, is_positive_pair), where image_ids is a list that must contain the search image id and may contain image_id, instance_ids is a list that must contain the search instance id and may contain instance_id, and is_positive_pair is a bool denoting a positive or negative sample pair.

Return type

tuple

class mmtrack.datasets.TaoDataset(*args, **kwargs)[source]

Dataset for TAO.

evaluate(results, metric=['bbox', 'track'], logger=None, resfile_path=None)[source]

Evaluation in COCO protocol and CLEAR MOT metric (e.g. MOTA, IDF1).

Parameters
  • results (dict) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Options are 'bbox', 'segm', 'track'.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • resfile_path (str, optional) – Path to save the formatted results. Defaults to None.

Returns

COCO style and CLEAR MOT evaluation metric.

Return type

dict[str, float]

format_results(results, resfile_path=None)[source]

Format the results to json (standard format for TAO evaluation).

Parameters
  • results (list[ndarray]) – Testing results of the dataset.

  • resfile_path (str, optional) – Path to save the formatted results. Defaults to None.

Returns

(result_files, tmp_dir), where result_files is a dict containing the json filepaths, and tmp_dir is the temporary directory created for saving json files when resfile_path is not specified.

Return type

tuple

load_annotations(ann_file)[source]

Load annotations from the annotation file.

load_lvis_anns(ann_file)[source]

Load annotations from a COCO style annotation file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

Annotation info from the COCO api.

Return type

list[dict]

load_tao_anns(ann_file)[source]

Load annotations from a COCOVID style annotation file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

Annotation info from the COCOVID api.

Return type

list[dict]

class mmtrack.datasets.TrackingNetDataset(chunks_list=['all'], *args, **kwargs)[source]

TrackingNet dataset of single object tracking.

The dataset supports both training and testing modes.

format_results(results, resfile_path=None, logger=None)[source]

Format the results to txts (standard format for the TrackingNet Challenge).

Parameters
  • results (dict(list[ndarray])) – Testing results of the dataset.

  • resfile_path (str) – Path to save the formatted results. Defaults to None.

  • logger (logging.Logger | str | None, optional) – Defaults to None.

load_data_infos(split='train')[source]

Load dataset information.

Parameters

split (str, optional) – The split of the dataset. Defaults to 'train'.

Returns

The length of the list is the number of videos. Each inner dict is in the following format:

{
    'video_path': the video path,
    'ann_path': the annotation path,
    'start_frame_id': the starting frame ID number contained in the image name,
    'end_frame_id': the ending frame ID number contained in the image name,
    'framename_template': the template of the image name
}

Return type

list[dict]

prepare_test_data(video_ind, frame_ind)[source]

Get testing data of one frame. We parse one video, get one frame from it and pass the frame information to the pipeline.

Parameters
  • video_ind (int) – Video index.

  • frame_ind (int) – Frame index.

Returns

Testing data of one frame.

Return type

dict

class mmtrack.datasets.UAV123Dataset(*args, **kwargs)[source]

UAV123 dataset of single object tracking.

The dataset is only used for testing.

load_data_infos(split='test')[source]

Load dataset information.

Parameters

split (str, optional) – Dataset split. Defaults to 'test'.

Returns

The length of the list is the number of videos. Each inner dict is in the following format:

{
    'video_path': the video path,
    'ann_path': the annotation path,
    'start_frame_id': the starting frame number contained in the image name,
    'end_frame_id': the ending frame number contained in the image name,
    'framename_template': the template of the image name
}

Return type

list[dict]

class mmtrack.datasets.VOTDataset(dataset_type='vot2018', *args, **kwargs)[source]

VOT dataset of single object tracking.

The dataset is only used for testing.

evaluate(results, metric=['track'], logger=None, interval=None)[source]

Evaluation in VOT protocol.

Parameters
  • results (dict) – Testing results of the dataset. The tracking bboxes are in (tl_x, tl_y, br_x, br_y) format.

  • metric (str | list[str]) – Metrics to be evaluated. Options are 'track'.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • interval (list) – A specified interval in the EAO curve used to calculate the EAO score. There are different settings in different VOT challenges.

Returns

VOT style evaluation metrics (e.g. EAO, accuracy and robustness).

Return type

dict[str, float]

get_ann_infos_from_video(video_ind)[source]

Get the bboxes annotation about the instance in a video.

Parameters

video_ind (int) – Video index.

Returns

ndarray in (N, 8) shape, where N is the number of bboxes and each bbox is in (x1, y1, x2, y2, x3, y3, x4, y4) format.

Return type

ndarray

load_data_infos(split='test')[source]

Load dataset information.

Parameters

split (str, optional) – Dataset split. Defaults to 'test'.

Returns

The length of the list is the number of videos. Each inner dict is in the following format:

{
    'video_path': the video path,
    'ann_path': the annotation path,
    'start_frame_id': the starting frame number contained in the image name,
    'end_frame_id': the ending frame number contained in the image name,
    'framename_template': the template of the image name
}

Return type

list[dict]

class mmtrack.datasets.YouTubeVISDataset(dataset_version, *args, **kwargs)[source]

YouTube VIS dataset for video instance segmentation.

convert_back_to_vis_format()[source]

Convert the annotation back to the format of YouTube-VIS. The main difference between the two formats is 'annotation': before the conversion, it is recorded per image; after the conversion, it is recorded per instance. This operation makes it easier to use the official eval API.

Returns

A dict with 3 keys: categories, annotations and videos.

  • categories (list[dict]): Each dict has 2 keys, id and name.
  • videos (list[dict]): Each dict has 4 keys of video info, id, name, width and height.
  • annotations (list[dict]): Each dict has 7 keys of video info, category_id, segmentations, bboxes, video_id, areas, id and iscrowd.

Return type

dict

evaluate(results, metric=['track_segm'], logger=None)[source]

Evaluation in COCO protocol.

Parameters
  • results (dict) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Options are 'track_segm'.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

Returns

COCO style evaluation metric.

Return type

dict[str, float]

format_results(results, resfile_path=None, metrics=['track_segm'], save_as_json=True)[source]

Format the results to a zip file (standard format for the YouTube-VIS Challenge).

Parameters
  • results (dict(list[ndarray])) – Testing results of the dataset.

  • resfile_path (str, optional) – Path to save the formatted results. Defaults to None.

  • metrics (list[str], optional) – The results of the specific metrics will be formatted. Defaults to ['track_segm'].

  • save_as_json (bool, optional) – Whether to save the json results file. Defaults to True.

Returns

(resfiles, tmp_dir), where resfiles is the path of the result json file, and tmp_dir is the temporary directory created for saving files.

Return type

tuple

mmtrack.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, samples_per_epoch=None, dist=True, shuffle=True, seed=None, persistent_workers=False, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • samples_per_epoch (int | None, optional) – The number of samples per epoch. If equal to -1, all samples in the dataset are used per epoch; otherwise, samples_per_epoch samples are used. Default: None.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • seed (int, optional) – Seed to be used. Default: None.

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. This argument is only valid when PyTorch >= 1.7.0. Default: False.

  • kwargs – Any keyword arguments used to initialize the DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader
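
A training-side sketch of the builder above; cfg is assumed to be an mmcv.Config whose cfg.data.train section is valid, and build_dataset is assumed to be re-exported by mmtrack.datasets:

    import mmcv
    from mmtrack.datasets import build_dataloader, build_dataset

    cfg = mmcv.Config.fromfile('configs/your_config.py')  # placeholder
    dataset = build_dataset(cfg.data.train)
    data_loader = build_dataloader(
        dataset,
        samples_per_gpu=2,
        workers_per_gpu=2,
        num_gpus=1,
        dist=False,
        shuffle=True,
        seed=42)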

parsers

class mmtrack.datasets.parsers.CocoVID(*args: Any, **kwargs: Any)[source]

Inherit the official COCO class in order to parse the annotations of bbox-related video tasks.

Parameters
  • annotation_file (str) – Location of the annotation file. Defaults to None.

  • load_img_as_vid (bool) – If True, convert image data to video data, which means each image is converted to a video. Defaults to False.

convert_img_to_vid(dataset)[source]

Convert image data to video data.

createIndex()[source]

Create index.

get_img_ids_from_ins_id(insId)[source]

Get image ids from a given instance id.

Parameters

insId (int) – The given instance id.

Returns

Image ids of the given instance id.

Return type

list[int]

get_img_ids_from_vid(vidId)[source]

Get image ids from a given video id.

Parameters

vidId (int) – The given video id.

Returns

Image ids of the given video id.

Return type

list[int]

get_ins_ids_from_vid(vidId)[source]

Get instance ids from a given video id.

Parameters

vidId (int) – The given video id.

Returns

Instance ids of the given video id.

Return type

list[int]

get_vid_ids(vidIds=[])[source]

Get video ids that satisfy the given filter conditions.

By default, all video ids are returned.

Parameters

vidIds (list[int]) – The given video ids. Defaults to [].

Returns

Video ids.

Return type

list[int]

load_vids(ids=[])[source]

Get video information of the given video ids.

By default, information of all videos is returned.

Parameters

ids (list[int]) – The given video ids. Defaults to [].

Returns

List of video information.

Return type

list[dict]

pipelines

class mmtrack.datasets.pipelines.CheckPadMaskValidity(stride)[source]

Check the validity of data. It is typically used in the following case: the image padding masks generated during image preprocessing need to be downsampled and then passed into a Transformer model, like DETR. The computation in the subsequent Transformer model must make sure that the values of the downsampled masks are not all zeros.

Parameters

stride (int) – The max stride of the feature maps.

class mmtrack.datasets.pipelines.ConcatSameTypeFrames(num_key_frames=1)[source]

Concat the frames of the same type. We divide all the frames into two types: 'key' frames and 'reference' frames.

The input list contains at least two dicts. We concatenate the first num_key_frames dicts into one dict, and the rest of the dicts are concatenated into another dict.

In the SOT field, 'key' denotes the template image and 'reference' denotes the search image.

Parameters

num_key_frames (int, optional) – The number of key frames. Defaults to 1.

concat_one_mode_results(results)[source]

Concatenate the results of the same mode.

class mmtrack.datasets.pipelines.ConcatVideoReferences[source]

Concat video references.

If the input list contains at least two dicts, the dicts from the 2nd one onward are concatenated into a single dict.

Note: the ‘ConcatVideoReferences’ class will be deprecated in the future, please use ‘ConcatSameTypeFrames’ instead.

class mmtrack.datasets.pipelines.LoadDetections[source]

Load public detections from the MOT benchmark.

Parameters

results (dict) – Result dict from mmtrack.CocoVideoDataset.

class mmtrack.datasets.pipelines.LoadMultiImagesFromFile(*args, **kwargs)[source]

Load multi images from file.

Please refer to mmdet.datasets.pipelines.loading.py:LoadImageFromFile for detailed docstring.

class mmtrack.datasets.pipelines.MatchInstances(skip_nomatch=True)[source]

Matching objects on a pair of images.

Parameters

skip_nomatch (bool, optional) – Whether to skip the pair of images during training when there are no matched objects. Defaults to True.

class mmtrack.datasets.pipelines.PairSampling(frame_range=5, pos_prob=0.8, filter_template_img=False)[source]

Pair-style sampling. It's used in SiameseRPN++ (https://arxiv.org/abs/1812.11703).

Parameters
  • frame_range (List(int) | int) – The sampling range of search frames in the same video for the template frame. Defaults to 5.

  • pos_prob (float, optional) – The probability of sampling positive sample pairs. Defaults to 0.8.

  • filter_template_img (bool, optional) – If False, the template image will be in the sampling search candidates; otherwise, it is excluded. Defaults to False.

prepare_data(video_info, sampled_inds, is_positive_pairs=False)[source]

Prepare sampled training data according to the sampled indexes.

Parameters
  • video_info (dict) – The video information. It contains the keys: ['bboxes', 'bboxes_isvalid', 'filename', 'frame_ids', 'video_id', 'visible'].

  • sampled_inds (list[int]) – The sampled frame indexes.

  • is_positive_pairs (bool, optional) – Whether it's a positive pair. Defaults to False.

Returns

Contains the information of the sampled data.

Return type

List[dict]

class mmtrack.datasets.pipelines.ReIDFormatBundle(*args, **kwargs)[source]

ReID formatting bundle.

It first concatenates common fields, then simplifies the pipeline of formatting common fields, including "img" and "gt_label". These fields are formatted as follows.

  • img: (1) transpose, (2) to tensor, (3) to DataContainer (stack=True)

  • gt_labels: (1) to tensor, (2) to DataContainer

reid_format_bundle(results)[source]

Transform and format the gt_label fields in results.

Parameters

results (dict) – Result dict containing the data to convert.

Returns

The result dict containing the data formatted with the ReID bundle.

Return type

dict

class mmtrack.datasets.pipelines.SeqBboxJitter(scale_jitter_factor, center_jitter_factor, crop_size_factor)[source]

Bounding box jitter augmentation. The jittered bboxes are used for subsequent image cropping, like SeqCropLikeStark.

Parameters
  • scale_jitter_factor (list[int | float]) – Contains the factor of scale jitter.

  • center_jitter_factor (list[int | float]) – Contains the factor of center jitter.

  • crop_size_factor (list[int | float]) – Contains the ratio of crop size to bbox size.

class mmtrack.datasets.pipelines.SeqBlurAug(prob=[0.0, 0.2])[源代码]

Blur augmentation for images.

参数

prob (list[float]) – The probability to perform blur augmentation for each image. Defaults to [0.0, 0.2].

class mmtrack.datasets.pipelines.SeqBrightnessAug(jitter_range=0)[源代码]

Brightness augmentation for images.

参数

jitter_range (float) – The range of brightness jitter. Defaults to 0.

class mmtrack.datasets.pipelines.SeqColorAug(prob=[1.0, 1.0], rgb_var=[[- 0.55919361, 0.98062831, - 0.41940627], [1.72091413, 0.19879334, - 1.82968581], [4.64467907, 4.73710203, 4.88324118]])[源代码]

Color augmentation for images.

参数
  • prob (list[float]) – The probability to perform color augmentation for each image. Defaults to [1.0, 1.0].

  • rgb_var (list[list]) – The values of color augmentation. Defaults to [[-0.55919361, 0.98062831, -0.41940627], [1.72091413, 0.19879334, -1.82968581], [4.64467907, 4.73710203, 4.88324118]].

class mmtrack.datasets.pipelines.SeqCropLikeSiamFC(context_amount=0.5, exemplar_size=127, crop_size=511)[源代码]

Crop images as SiamFC did.

The way of cropping an image is proposed in “Fully-Convolutional Siamese Networks for Object Tracking” (SiamFC).

参数
  • context_amount (float) – The context amount around a bounding box. Defaults to 0.5.

  • exemplar_size (int) – Exemplar size. Defaults to 127.

  • crop_size (int) – Crop size. Defaults to 511.

crop_like_SiamFC(image, bbox, context_amount=0.5, exemplar_size=127, crop_size=511)[源代码]

Crop an image as SiamFC did.

参数
  • image (ndarray) – of shape (H, W, 3).

  • bbox (ndarray) – of shape (4, ) in [x1, y1, x2, y2] format.

  • context_amount (float) – The context amount around a bounding box. Defaults to 0.5.

  • exemplar_size (int) – Exemplar size. Defaults to 127.

  • crop_size (int) – Crop size. Defaults to 511.

返回

The cropped image of shape (crop_size, crop_size, 3).

返回类型

ndarray
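The crop side length follows the context arithmetic of the SiamFC paper; below is a small sketch of that computation (an assumption based on the paper, not the transform's exact code):

    import math

    def siamfc_crop_sides(w, h, context_amount=0.5,
                          exemplar_size=127, crop_size=511):
        p = context_amount * (w + h)        # context padding around the bbox
        s_z = math.sqrt((w + p) * (h + p))  # exemplar-region side in the image
        # The larger search window keeps the same pixel scale as the exemplar.
        s_x = s_z * crop_size / exemplar_size
        return s_z, s_x

    print(siamfc_crop_sides(64, 48))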

generate_box(image, gt_bbox, context_amount, exemplar_size)[源代码]

Generate box based on cropped image.

参数
  • image (ndarray) – The cropped image of shape (self.crop_size, self.crop_size, 3).

  • gt_bbox (ndarray) – of shape (4, ) in [x1, y1, x2, y2] format.

  • context_amount (float) – The context amount around a bounding box.

  • exemplar_size (int) – Exemplar size. Defaults to 127.

返回

Generated box of shape (4, ) in [x1, y1, x2, y2] format.

返回类型

ndarray

class mmtrack.datasets.pipelines.SeqCropLikeStark(crop_size_factor, output_size)[源代码]

Crop images as Stark did.

The way of cropping an image is proposed in “Learning Spatio-Temporal Transformer for Visual Tracking” (STARK).

参数
  • crop_size_factor (list[int | float]) – contains the ratio of crop size to bbox size.

  • output_size (list[int | float]) – contains the size of resized image (always square).

crop_like_stark(img, bbox, crop_size_factor, output_size)[源代码]

Crop an image as Stark did.

参数
  • image (ndarray) – of shape (H, W, 3).

  • bbox (ndarray) – of shape (4, ) in [x1, y1, x2, y2] format.

  • crop_size_factor (float) – the ratio of crop size to bbox size.

  • output_size (int) – the size of resized image (always square).

返回
  • img_crop_padded (ndarray): the cropped image of shape (crop_size, crop_size, 3).

  • resize_factor (float): the ratio of original image scale to cropped image scale.

  • padding_mask (ndarray): the padding mask caused by cropping.

返回类型

tuple
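A sketch of the crop arithmetic implied by the arguments (an assumption based on the documented ratio semantics, not the exact pipeline code): the crop side scales with the bbox area, and resize_factor relates the cropped and original scales.

    import math

    def stark_crop_side(w, h, crop_size_factor):
        # Crop side is proportional to the bbox size.
        return math.ceil(math.sqrt(w * h) * crop_size_factor)

    output_size = 320
    side = stark_crop_side(64, 48, crop_size_factor=2.0)
    resize_factor = output_size / side  # how much the crop was rescaled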

generate_box(bbox_gt, bbox_cropped, resize_factor, output_size, normalize=False)[源代码]

Transform the box coordinates from the original image coordinates to the coordinates of the cropped image.

参数
  • bbox_gt (ndarray) – of shape (4, ) in [x1, y1, x2, y2] format.

  • bbox_cropped (ndarray) – of shape (4, ) in [x1, y1, x2, y2] format.

  • resize_factor (float) – the ratio of original image scale to cropped image scale.

  • output_size (float) – the size of output image.

  • normalize (bool) – whether to normalize the output box. Defaults to False.

返回

generated box of shape (4, ) in [x1, y1, x2, y2] format.

返回类型

ndarray

class mmtrack.datasets.pipelines.SeqDefaultFormatBundle(ref_prefix='ref')[源代码]

Sequence Default formatting bundle.

It simplifies the pipeline of formatting common fields, including “img”, “img_metas”, “proposals”, “gt_bboxes”, “gt_instance_ids”, “gt_match_indices”, “gt_bboxes_ignore”, “gt_labels”, “gt_masks”, “gt_semantic_seg” and ‘padding_mask’. These fields are formatted as follows.

  • img: (1) transpose, (2) to tensor, (3) to DataContainer (stack=True)

  • img_metas: (1) to DataContainer (cpu_only=True)

  • proposals: (1) to tensor, (2) to DataContainer

  • gt_bboxes: (1) to tensor, (2) to DataContainer

  • gt_instance_ids: (1) to tensor, (2) to DataContainer

  • gt_match_indices: (1) to tensor, (2) to DataContainer

  • gt_bboxes_ignore: (1) to tensor, (2) to DataContainer

  • gt_labels: (1) to tensor, (2) to DataContainer

  • gt_masks: (1) to DataContainer (cpu_only=True)

  • gt_semantic_seg: (1) unsqueeze dim-0 (2) to tensor, (3) to DataContainer (stack=True)

  • padding_mask: (1) to tensor, (2) to DataContainer

参数

ref_prefix (str) – The prefix of key added to the second dict of input list. Defaults to ‘ref’.

default_format_bundle(results)[源代码]

Transform and format common fields in results.

参数

results (dict) – Result dict contains the data to convert.

返回

The result dict contains the data that is formatted with default bundle.

返回类型

dict
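A hedged pipeline snippet showing where the bundle typically sits, after normalization/padding and collection (exact neighbours vary across mmtrack configs; the values here are illustrative):

    train_pipeline = [
        dict(type='SeqNormalize', mean=[123.675, 116.28, 103.53],
             std=[58.395, 57.12, 57.375], to_rgb=True),
        dict(type='SeqPad', size_divisor=32),
        dict(type='VideoCollect', keys=['img', 'gt_bboxes', 'gt_labels']),
        dict(type='SeqDefaultFormatBundle', ref_prefix='ref'),
    ]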

class mmtrack.datasets.pipelines.SeqGrayAug(prob=0.0)[源代码]

Gray augmention for images.

参数

prob (float) – The probability to perform gray augmention. Defaults to 0..

class mmtrack.datasets.pipelines.SeqLoadAnnotations(with_track=False, *args, **kwargs)[源代码]

Sequence load annotations.

Please refer to mmdet.datasets.pipelines.loading.py:LoadAnnotations for detailed docstring.

参数

with_track (bool) – If True, load instance ids of bboxes.

class mmtrack.datasets.pipelines.SeqNormalize(*args, **kwargs)[源代码]

Normalize images.

Please refer to mmdet.datasets.pipelines.transforms.py:Normalize for detailed docstring.

class mmtrack.datasets.pipelines.SeqPad(*args, **kwargs)[源代码]

Pad images.

Please refer to mmdet.datasets.pipelines.transforms.py:Pad for detailed docstring.

class mmtrack.datasets.pipelines.SeqPhotoMetricDistortion(share_params=True, brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[源代码]

Apply photometric distortion to images sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

  8. randomly swap channels

参数
  • share_params (bool) – If True, share the distortion parameters across all images in the sequence. Defaults to True.

  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.

get_params()[源代码]

Generate parameters.

photo_metric_distortion(results, params=None)[源代码]

Call function to perform photometric distortion on images.

参数
  • results (dict) – Result dict from loading pipeline.

  • params (dict, optional) – Pre-defined parameters. Default to None.

返回

Result dict with images distorted.

返回类型

dict

class mmtrack.datasets.pipelines.SeqRandomCrop(crop_size, allow_negative_crop=False, share_params=False, bbox_clip_border=False)[源代码]

Sequentially randomly crop the images & bboxes & masks.

The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.

参数
  • crop_size (tuple) – The relative ratio or absolute pixels of height and width.

  • allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Default False.

  • share_params (bool, optional) – Whether to share the cropping parameters across the images. Defaults to False.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to False.

注解

  • If the image is smaller than the absolute crop size, return the original image.

  • The keys for bboxes, labels and masks must be aligned. That is, gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.

  • If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.
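A small sketch of what share_params means for a clip (a hypothetical helper, not the transform's code): one offset pair is drawn and reused for every frame, so the crop stays spatially consistent across the sequence.

    import numpy as np

    def sample_offsets(img_h, img_w, crop_h, crop_w):
        # Top-left corner of the crop, drawn uniformly inside the image.
        return (np.random.randint(0, img_h - crop_h + 1),
                np.random.randint(0, img_w - crop_w + 1))

    offsets = sample_offsets(480, 640, 256, 256)
    per_frame_offsets = [offsets, offsets]  # share_params=True: same crop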

get_offsets(img)[源代码]

Randomly generate the offsets for cropping.

random_crop(results, offsets=None)[源代码]

Call function to randomly crop images, bounding boxes, masks, and semantic segmentation maps.

参数
  • results (dict) – Result dict from loading pipeline.

  • offsets (tuple, optional) – Pre-defined offsets for cropping. Default to None.

返回

Randomly cropped results, ‘img_shape’ key in result dict is updated according to crop size.

返回类型

dict

class mmtrack.datasets.pipelines.SeqRandomFlip(share_params, *args, **kwargs)[源代码]

Randomly flip images.

Please refer to mmdet.datasets.pipelines.transforms.py:RandomFlip for detailed docstring.

参数

share_params (bool) – If True, share the flip parameters for all images. Defaults to True.

class mmtrack.datasets.pipelines.SeqResize(share_params=True, *args, **kwargs)[源代码]

Resize images.

Please refer to mmdet.datasets.pipelines.transforms.py:Resize for detailed docstring.

参数

share_params (bool) – If True, share the resize parameters for all images. Defaults to True.

class mmtrack.datasets.pipelines.SeqShiftScaleAug(target_size=[127, 255], shift=[4, 64], scale=[0.05, 0.18])[源代码]

Shift and rescale images and bounding boxes.

参数
  • target_size (list[int]) – list of int denoting exemplar size and search size, respectively. Defaults to [127, 255].

  • shift (list[int]) – list of int denoting the max shift offset. Defaults to [4, 64].

  • scale (list[float]) – list of float denoting the max rescale factor. Defaults to [0.05, 0.18].

class mmtrack.datasets.pipelines.ToList[源代码]

Wrap each value of the input dict in a list.

参数

results (dict) – Result dict contains the data to convert.

返回

The updated result dict, with each value wrapped in a list.

返回类型

dict

class mmtrack.datasets.pipelines.TridentSampling(num_search_frames=1, num_template_frames=2, max_frame_range=[200], cls_pos_prob=0.5, train_cls_head=False, min_num_frames=20)[源代码]

Multi-template-style sampling in a trident manner. It was first used in STARK.

参数
  • num_search_frames (int, optional) – the number of search frames. Defaults to 1.

  • num_template_frames (int, optional) – the number of template frames. Defaults to 2.

  • max_frame_range (list[int], optional) – the max frame range of sampling a positive search image for the template image. Its length is equal to the number of extra templates, i.e., num_template_frames - 1. Defaults to [200].

  • cls_pos_prob (float, optional) – the probability of sampling positive samples in classification training. Defaults to 0.5.

  • train_cls_head (bool, optional) – whether to train the classification head. Defaults to False.

  • min_num_frames (int, optional) – the min number of frames to be sampled. Defaults to 20.
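A hedged config snippet mirroring the defaults above, as the sampler might appear in a STARK-style training pipeline (keys follow the usual mmtrack config style):

    trident_sampling = dict(
        type='TridentSampling',
        num_search_frames=1,
        num_template_frames=2,
        max_frame_range=[200],  # one extra template => one range entry
        cls_pos_prob=0.5,
        train_cls_head=False,
        min_num_frames=20)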

prepare_cls_data(video_info, video_info_another, sampled_inds)[源代码]

Prepare the sampled classification training data according to the sampled index.

参数
  • video_info (dict) – the video information. It contains the keys: [‘bboxes’,’bboxes_isvalid’,’filename’,’frame_ids’, ‘video_id’,’visible’].

  • video_info_another (dict) – the information of another video. It’s only used to get negative samples in classification training. It contains the keys: [‘bboxes’,’bboxes_isvalid’,’filename’, ‘frame_ids’,’video_id’,’visible’].

  • sampled_inds (list[int]) – the sampled frame indexes.

返回

contains the information of sampled data.

返回类型

List[dict]

prepare_data(video_info, sampled_inds, with_label=False)[源代码]

Prepare sampled training data according to the sampled index.

参数
  • video_info (dict) – the video information. It contains the keys: [‘bboxes’,’bboxes_isvalid’,’filename’,’frame_ids’, ‘video_id’,’visible’].

  • sampled_inds (list[int]) – the sampled frame indexes.

  • with_label (bool, optional) – whether to record labels in the annotation infos. Only set to True in classification training. Defaults to False.

返回

contains the information of sampled data.

返回类型

List[dict]

random_sample_inds(video_visibility, num_samples=1, frame_range=None, allow_invisible=False, force_invisible=False)[源代码]

Randomly sample a specific number of frames from the specified frame range of the video, taking the visibility of each frame into account.

参数
  • video_visibility (ndarray) – the visibility of each frame in the video.

  • num_samples (int, optional) – the number of samples. Defaults to 1.

  • frame_range (list | None, optional) – the frame range of sampling. Defaults to None.

  • allow_invisible (bool, optional) – whether to allow sampling invisible frames. Defaults to False.

  • force_invisible (bool, optional) – whether to force sampling invisible frames. Defaults to False.

返回

The sampled frame indexes.

返回类型

List

sampling_trident(video_visibility)[源代码]

Sample multiple template images and one search image from one video.

参数

video_visibility (ndarray) – the visibility of each frame in the video.

返回

the indexes of template and search images.

返回类型

List

class mmtrack.datasets.pipelines.VideoCollect(keys, meta_keys=None, default_meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'img_norm_cfg', 'frame_id', 'is_video_data'))[源代码]

Collect data from the loader relevant to the specific task.

参数
  • keys (Sequence[str]) – Keys of results to be collected in data.

  • meta_keys (Sequence[str]) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Defaults to None.

  • default_meta_keys (tuple) – Default meta keys. Defaults to (‘filename’, ‘ori_filename’, ‘ori_shape’, ‘img_shape’, ‘pad_shape’, ‘scale_factor’, ‘flip’, ‘flip_direction’, ‘img_norm_cfg’, ‘frame_id’, ‘is_video_data’).
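A hedged usage example (the keys are illustrative; choose whatever fields your model's forward_train expects). If meta_keys is given, those keys are presumably collected in addition to default_meta_keys.

    collect = dict(
        type='VideoCollect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_instance_ids'])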

samplers

class mmtrack.datasets.samplers.DistributedQuotaSampler(dataset, samples_per_epoch, num_replicas=None, rank=None, replacement=False, seed=0)[源代码]

Sampler that draws a fixed number of samples per epoch.

It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such a case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it.

注解

Dataset is assumed to be of constant size.

参数
  • dataset – Dataset used for sampling.

  • samples_per_epoch (int) – The number of samples per epoch.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

  • replacement (bool) – samples are drawn with replacement if True. Default: False.

  • seed (int, optional) – random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.

class mmtrack.datasets.samplers.DistributedVideoSampler(dataset, num_replicas=None, rank=None, shuffle=False)[源代码]

Distribute videos across multiple GPUs during testing.

参数
  • dataset (Dataset) – The test dataset, which must have a data_infos attribute. Each data_info in data_infos records the information of one frame or one video (in a SOT dataset). If not a SOT dataset, each video must have one data_info with data_info[‘frame_id’] == 0.

  • num_replicas (int) – The number of gpus. Defaults to None.

  • rank (int) – Gpu rank id. Defaults to None.

  • shuffle (bool) – If True, shuffle the dataset. Defaults to False.
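A sketch of plugging the sampler into a test-time DataLoader (assumes `dataset` is an already-constructed mmtrack test dataset meeting the requirement above; batch_size=1 as is usual for video testing):

    from torch.utils.data import DataLoader
    from mmtrack.datasets.samplers import DistributedVideoSampler

    # `dataset` is assumed to be a built mmtrack test dataset.
    sampler = DistributedVideoSampler(dataset, num_replicas=8, rank=0)
    loader = DataLoader(dataset, batch_size=1, sampler=sampler)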

class mmtrack.datasets.samplers.SOTVideoSampler(dataset)[源代码]

Only used for SOT testing on a single GPU.

参数

dataset (Dataset) – The test dataset, which must have a num_frames_per_video attribute. It records the number of frames in each video.

mmtrack.models

mot

class mmtrack.models.mot.BaseMultiObjectTracker(init_cfg=None)[源代码]

Base class for multiple object tracking.

aug_test(imgs, img_metas, **kwargs)[源代码]

Test function with test time augmentation.

forward(img, img_metas, return_loss=True, **kwargs)[源代码]

Calls either forward_train() or forward_test() depending on whether return_loss is True.

Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test time augmentations.

forward_test(imgs, img_metas, **kwargs)[源代码]
参数
  • imgs (List[Tensor]) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch.

  • img_metas (List[List[dict]]) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch.

abstract forward_train(imgs, img_metas, **kwargs)[源代码]
参数
  • img (list[Tensor]) – List of tensors of shape (1, C, H, W). Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – List of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys, see mmdet.datasets.pipelines.Collect.

  • kwargs (keyword arguments) – Specific to concrete implementation.

freeze_module(module)[源代码]

Freeze module during training.

show_result(img, result, score_thr=0.0, thickness=1, font_scale=0.5, show=False, out_file=None, wait_time=0, backend='cv2', **kwargs)[源代码]

Visualize tracking results.

参数
  • img (str | ndarray) – Filename of loaded image.

  • result (dict) – Tracking result. - The value of key ‘track_bboxes’ is list with length num_classes, and each element in list is ndarray with shape(n, 6) in [id, tl_x, tl_y, br_x, br_y, score] format. - The value of key ‘det_bboxes’ is list with length num_classes, and each element in list is ndarray with shape(n, 5) in [tl_x, tl_y, br_x, br_y, score] format.

  • thickness (int, optional) – Thickness of lines. Defaults to 1.

  • font_scale (float, optional) – Font scales of texts. Defaults to 0.5.

  • show (bool, optional) – Whether show the visualizations on the fly. Defaults to False.

  • out_file (str | None, optional) – Output filename. Defaults to None.

  • backend (str, optional) – Backend to draw the bounding boxes, options are cv2 and plt. Defaults to ‘cv2’.

返回

Visualized image.

返回类型

ndarray

abstract simple_test(img, img_metas, **kwargs)[源代码]

Test function with a single scale.

train_step(data, optimizer)[源代码]

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

参数
  • data (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

返回

It should contain at least 3 keys: loss, log_vars, num_samples.

  • loss is a tensor for back propagation, which can be a weighted sum of multiple losses.

  • log_vars contains all the variables to be sent to the logger.

  • num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

返回类型

dict
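A minimal sketch of that contract, with hypothetical loss values standing in for a real model's outputs:

    import torch

    loss_cls = torch.tensor(0.4, requires_grad=True)
    loss_bbox = torch.tensor(0.2, requires_grad=True)

    outputs = dict(
        loss=loss_cls + loss_bbox,  # scalar tensor used for backprop
        log_vars=dict(loss_cls=0.4, loss_bbox=0.2),
        num_samples=2)              # batch size on this GPU under DDP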

val_step(data, optimizer)[源代码]

The iteration step during validation.

This method shares the same signature as train_step(), but is used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

property with_detector

whether the framework has a detector.

Type

bool

property with_motion

whether the framework has a motion model.

Type

bool

property with_reid

whether the framework has a reid model.

Type

bool

property with_track_head

whether the framework has a track_head.

Type

bool

property with_tracker

whether the framework has a tracker.

Type

bool

class mmtrack.models.mot.ByteTrack(detector=None, tracker=None, motion=None, init_cfg=None)[源代码]

ByteTrack: Multi-Object Tracking by Associating Every Detection Box.

This multi object tracker is the implementation of ByteTrack.

参数
  • detector (dict) – Configuration of detector. Defaults to None.

  • tracker (dict) – Configuration of tracker. Defaults to None.

  • motion (dict) – Configuration of motion. Defaults to None.

  • init_cfg (dict) – Configuration of initialization. Defaults to None.
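A hedged model-config sketch in the usual mmtrack config style; the detector sub-config is a placeholder (an assumption) to be filled in from a real ByteTrack config file.

    detector_cfg = dict(type='YOLOX')  # placeholder; use a full detector config
    model = dict(
        type='ByteTrack',
        detector=detector_cfg,
        motion=dict(type='KalmanFilter'),
        tracker=dict(type='ByteTracker'))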

forward_train(*args, **kwargs)[源代码]

Forward function during training.

simple_test(img, img_metas, rescale=False, **kwargs)[源代码]

Test without augmentations.

参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • rescale (bool, optional) – If False, then returned bboxes and masks will fit the scale of img, otherwise, returned bboxes and masks will fit the scale of original image shape. Defaults to False.

返回

The tracking results.

返回类型

dict[str, list(ndarray)]

class mmtrack.models.mot.DeepSORT(detector=None, reid=None, tracker=None, motion=None, pretrains=None, init_cfg=None)[源代码]

Simple online and realtime tracking with a deep association metric.

Details can be found at `DeepSORT <https://arxiv.org/abs/1703.07402>`_.

forward_train(*args, **kwargs)[源代码]

Forward function during training.

simple_test(img, img_metas, rescale=False, public_bboxes=None, **kwargs)[源代码]

Test without augmentations.

参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • rescale (bool, optional) – If False, then returned bboxes and masks will fit the scale of img, otherwise, returned bboxes and masks will fit the scale of original image shape. Defaults to False.

  • public_bboxes (list[Tensor], optional) – Public bounding boxes from the benchmark. Defaults to None.

返回

The tracking results.

返回类型

dict[str, list(ndarray)]

class mmtrack.models.mot.OCSORT(detector=None, tracker=None, motion=None, init_cfg=None)[源代码]

OC-SORT: Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking.

This multi object tracker is the implementation of OC-SORT.

参数
  • detector (dict) – Configuration of detector. Defaults to None.

  • tracker (dict) – Configuration of tracker. Defaults to None.

  • motion (dict) – Configuration of motion. Defaults to None.

  • init_cfg (dict) – Configuration of initialization. Defaults to None.

forward_train(*args, **kwargs)[源代码]

Forward function during training.

simple_test(img, img_metas, rescale=False, **kwargs)[源代码]

Test without augmentations.

参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • rescale (bool, optional) – If False, then returned bboxes and masks will fit the scale of img, otherwise, returned bboxes and masks will fit the scale of original image shape. Defaults to False.

返回

The tracking results.

返回类型

dict[str, list(ndarray)]

class mmtrack.models.mot.QDTrack(detector=None, track_head=None, tracker=None, freeze_detector=False, *args, **kwargs)[源代码]

Quasi-Dense Similarity Learning for Multiple Object Tracking.

This multi object tracker is the implementation of QDTrack.

参数
  • detector (dict) – Configuration of detector. Defaults to None.

  • track_head (dict) – Configuration of track head. Defaults to None.

  • tracker (dict) – Configuration of tracker. Defaults to None.

  • freeze_detector (bool) – If True, freeze the detector weights. Defaults to False.

forward_train(img, img_metas, gt_bboxes, gt_labels, gt_match_indices, ref_img, ref_img_metas, ref_gt_bboxes, ref_gt_labels, gt_bboxes_ignore=None, gt_masks=None, ref_gt_bboxes_ignore=None, ref_gt_masks=None, **kwargs)[源代码]

Forward function during training.

参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes of the image, each item has a shape (num_gts, 4).

  • gt_labels (list[Tensor]) – Ground truth labels of all images, each has a shape (num_gts,).

  • gt_match_indices (list(Tensor)) – Mapping from gt_instance_ids to ref_gt_instance_ids of the same tracklet in a pair of images.

  • ref_img (Tensor) – of shape (N, C, H, W) encoding input reference images. Typically these should be mean centered and std scaled.

  • ref_img_metas (list[dict]) – list of reference image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • ref_gt_bboxes (list[Tensor]) – Ground truth bboxes of the reference image, each item has a shape (num_gts, 4).

  • ref_gt_labels (list[Tensor]) – Ground truth labels of all reference images, each has a shape (num_gts,).

  • gt_masks (list[Tensor]) – Masks for each bbox, has a shape (num_gts, h, w).

  • gt_bboxes_ignore (list[Tensor], None) – Ground truth bboxes to be ignored, each item has a shape (num_ignored_gts, 4).

  • ref_gt_bboxes_ignore (list[Tensor], None) – Ground truth bboxes of reference images to be ignored, each item has a shape (num_ignored_gts, 4).

  • ref_gt_masks (list[Tensor]) – Masks for each reference bbox, has a shape (num_gts, h, w).

返回

All losses.

返回类型

dict[str, Tensor]

simple_test(img, img_metas, rescale=False)[源代码]

Test forward.

参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • rescale (bool) – whether to rescale the bboxes.

返回

Track results.

返回类型

dict[str, Tensor]

class mmtrack.models.mot.Tracktor(detector=None, reid=None, tracker=None, motion=None, pretrains=None, init_cfg=None)[源代码]

Tracking without bells and whistles.

Details can be found at `Tracktor <https://arxiv.org/abs/1903.05625>`_.

forward_train(*args, **kwargs)[源代码]

Forward function during training.

simple_test(img, img_metas, rescale=False, public_bboxes=None, **kwargs)[源代码]

Test without augmentations.

参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • rescale (bool, optional) – If False, then returned bboxes and masks will fit the scale of img, otherwise, returned bboxes and masks will fit the scale of original image shape. Defaults to False.

  • public_bboxes (list[Tensor], optional) – Public bounding boxes from the benchmark. Defaults to None.

返回

The tracking results.

返回类型

dict[str, list(ndarray)]

property with_cmc

whether the framework has a camera model compensation model.

Type

bool

property with_linear_motion

whether the framework has a linear motion model.

Type

bool

sot

class mmtrack.models.sot.SiamRPN(backbone, neck=None, head=None, pretrains=None, init_cfg=None, frozen_modules=None, train_cfg=None, test_cfg=None)[源代码]

SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks.

This single object tracker is the implementation of SiamRPN++.

forward_search(x_img)[源代码]

Extract the features of search images.

参数

x_img (Tensor) – of shape (N, C, H, W) encoding input search images. Typically H and W equal to 255.

返回

Multi level feature map of search images.

返回类型

tuple(Tensor)

forward_template(z_img)[源代码]

Extract the features of exemplar images.

参数

z_img (Tensor) – of shape (N, C, H, W) encoding input exemplar images. Typically H and W equal to 127.

返回

Multi level feature map of exemplar images.

返回类型

tuple(Tensor)

forward_train(img, img_metas, gt_bboxes, search_img, search_img_metas, search_gt_bboxes, is_positive_pairs, **kwargs)[源代码]
参数
  • img (Tensor) – of shape (N, C, H, W) encoding input exemplar images. Typically H and W equal to 127.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each exemplar image with shape (1, 4) in [tl_x, tl_y, br_x, br_y] format.

  • search_img (Tensor) – of shape (N, 1, C, H, W) encoding input search images. 1 denotes there is only one search image for each exemplar image. Typically H and W equal to 255.

  • search_img_metas (list[list[dict]]) – The second list only has one element. The first list contains search image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • search_gt_bboxes (list[Tensor]) – Ground truth bboxes for each search image with shape (1, 5) in [0.0, tl_x, tl_y, br_x, br_y] format.

  • is_positive_pairs (list[bool]) – list of bool denoting whether each exemplar image and corresponding search image is positive pair.

返回

a dictionary of loss components.

返回类型

dict[str, Tensor]

get_cropped_img(img, center_xy, target_size, crop_size, avg_channel)[源代码]

Crop image.

Only used during testing.

This function mainly contains two steps: 1. Crop img based on center center_xy and size crop_size. If the cropped image is out of boundary of img, use avg_channel to pad. 2. Resize the cropped image to target_size.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding original input image.

  • center_xy (Tensor) – of shape (2, ) denoting the center point for cropping image.

  • target_size (int) – The output size of cropped image.

  • crop_size (Tensor) – The size for cropping image.

  • avg_channel (Tensor) – of shape (3, ) denoting the padding values.

返回

of shape (1, C, target_size, target_size) encoding the resized cropped image.

返回类型

Tensor

init(img, bbox)[源代码]

Initialize the single object tracker in the first frame.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding original input image.

  • bbox (Tensor) – The given instance bbox of first frame that need be tracked in the following frames. The shape of the box is (4, ) with [cx, cy, w, h] format.

返回

z_feat is a tuple[Tensor] that contains the multi-level feature maps of the exemplar image; avg_channel is a Tensor with shape (3, ) that denotes the padding values.

返回类型

tuple(z_feat, avg_channel)

init_weights()[源代码]

Initialize the weights of modules in single object tracker.

simple_test(img, img_metas, gt_bboxes, **kwargs)[源代码]

Test without augmentation.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding input image.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • gt_bboxes (list[Tensor]) – list of ground truth bboxes for each image with shape (1, 4) in [tl_x, tl_y, br_x, br_y] format or shape (1, 8) in [x1, y1, x2, y2, x3, y3, x4, y4].

返回

The tracking results.

返回类型

dict[str, ndarray]

simple_test_ope(img, frame_id, gt_bboxes)[源代码]

Test using OPE test mode.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding input image.

  • frame_id (int) – the id of current frame in the video.

  • gt_bboxes (list[Tensor]) – list of ground truth bboxes for each image with shape (1, 4) in [tl_x, tl_y, br_x, br_y] format or shape (1, 8) in [x1, y1, x2, y2, x3, y3, x4, y4].

返回
  • bbox_pred (Tensor): in [tl_x, tl_y, br_x, br_y] format.

  • best_score (Tensor): the tracking bbox confidence in range [0, 1]; the score of the initial frame is -1.

返回类型

tuple

simple_test_vot(img, frame_id, gt_bboxes, img_metas=None)[源代码]

Test using VOT test mode.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding input image.

  • frame_id (int) – the id of current frame in the video.

  • gt_bboxes (list[Tensor]) – list of ground truth bboxes for each image with shape (1, 4) in [tl_x, tl_y, br_x, br_y] format or shape (1, 8) in [x1, y1, x2, y2, x3, y3, x4, y4].

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

返回
  • bbox_pred (Tensor): in [tl_x, tl_y, br_x, br_y] format.

  • best_score (Tensor): the tracking bbox confidence in range [0, 1]; the score of the initial frame is -1.

返回类型

tuple

track(img, bbox, z_feat, avg_channel)[源代码]

Track the box bbox of previous frame to current frame img.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding original input image.

  • bbox (Tensor) – The bbox in previous frame. The shape of the box is (4, ) in [cx, cy, w, h] format.

  • z_feat (tuple[Tensor]) – The multi level feature maps of exemplar image in the first frame.

  • avg_channel (Tensor) – of shape (3, ) denoting the padding values.

返回

best_score is a Tensor denoting the score of best_bbox, best_bbox is a Tensor of shape (4, ) in [cx, cy, w, h] format, and denotes the best tracked bbox in current frame.

返回类型

tuple(best_score, best_bbox)
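A sketch of the testing protocol implied by init() and track() (assumes `model` is a loaded SiamRPN instance, `frames` yields (1, C, H, W) tensors, and `init_bbox` is in [cx, cy, w, h] format):

    # First frame: build the exemplar features once.
    z_feat, avg_channel = model.init(frames[0], init_bbox)
    bbox = init_bbox
    # Subsequent frames: propagate the previous bbox.
    for img in frames[1:]:
        best_score, bbox = model.track(img, bbox, z_feat, avg_channel)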

class mmtrack.models.sot.Stark(backbone, neck=None, head=None, init_cfg=None, frozen_modules=None, train_cfg=None, test_cfg=None)[源代码]

STARK: Learning Spatio-Temporal Transformer for Visual Tracking.

This single object tracker is the implementation of STARK.

参数
  • backbone (dict) – the configuration of backbone network.

  • neck (dict, optional) – the configuration of neck network. Defaults to None.

  • head (dict, optional) – the configuration of head network. Defaults to None.

  • init_cfg (dict, optional) – the configuration of initialization. Defaults to None.

  • frozen_modules (str | list | tuple, optional) – the names of frozen modules. Defaults to None.

  • train_cfg (dict, optional) – the configuration of training. Defaults to None.

  • test_cfg (dict, optional) – the configuration of test. Defaults to None.

extract_feat(img)[源代码]

Extract the features of the input image.

参数

img (Tensor) – image of shape (N, C, H, W).

返回

the multi-level feature maps, each of shape (N, C, H // stride, W // stride).

返回类型

tuple(Tensor)

forward_train(img, img_metas, search_img, search_img_metas, gt_bboxes, padding_mask, search_gt_bboxes, search_padding_mask, search_gt_labels=None, **kwargs)[源代码]

Forward function during training.

参数
  • img (Tensor) – template images of shape (N, num_templates, C, H, W). Typically, there are 2 template images, and H and W are both equal to 128.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • search_img (Tensor) – of shape (N, 1, C, H, W) encoding input search images. 1 denotes there is only one search image for each template image. Typically H and W are both equal to 320.

  • search_img_metas (list[list[dict]]) – The second list only has one element. The first list contains search image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for template images with shape (N, 4) in [tl_x, tl_y, br_x, br_y] format.

  • padding_mask (Tensor) – padding mask of template images. It’s of shape (N, num_templates, H, W). Typically, there are 2 padding masks of template images, and H and W are both equal to that of template images.

  • search_gt_bboxes (list[Tensor]) – Ground truth bboxes for search images with shape (N, 5) in [0., tl_x, tl_y, br_x, br_y] format.

  • search_padding_mask (Tensor) – padding mask of search images. It’s of shape (N, 1, H, W). There is 1 padding mask per search image, and H and W are both equal to those of the search image.

  • search_gt_labels (list[Tensor], optional) – Ground truth labels for search images with shape (N, 2).

返回

a dictionary of loss components.

返回类型

dict[str, Tensor]

get_cropped_img(img, target_bbox, search_area_factor, output_size)[源代码]

Crop the image. Only used during testing. This function mainly contains two steps: 1. Crop img based on target_bbox and search_area_factor. If the cropped image/mask is out of the boundary of img, use 0 to pad. 2. Resize the cropped image/mask to output_size.

参数
  • img (Tensor) – of shape (1, C, H, W)

  • target_bbox (list | ndarray) – in [cx, cy, w, h] format

  • search_area_factor (float) – Ratio of crop size to target size

  • output_size (float) – the size of output cropped image (always square).

返回
  • img_crop_padded (Tensor): of shape (1, C, output_size, output_size).

  • resize_factor (float): the ratio of original image scale to cropped image scale.

  • padding_mask (Tensor): the padding mask caused by cropping. It’s of shape (1, output_size, output_size).

返回类型

tuple

init(img, bbox)[源代码]

Initialize the single object tracker in the first frame.

参数
  • img (Tensor) – input image of shape (1, C, H, W).

  • bbox (list | Tensor) – in [cx, cy, w, h] format.

init_weights()[源代码]

Initialize the weights of modules in single object tracker.

mapping_bbox_back(pred_bboxes, prev_bbox, resize_factor)[源代码]

Mapping the prediction bboxes from resized cropped image to original image. The coordinate origins of them are both the top left corner.

参数
  • pred_bboxes (Tensor) – the predicted bbox of shape (B, Nq, 4), in [tl_x, tl_y, br_x, br_y] format. The coordinates are based in the resized cropped image.

  • prev_bbox (Tensor) – the previous bbox of shape (B, 4), in [cx, cy, w, h] format. The coordinates are based in the original image.

  • resize_factor (float) – the ratio of original image scale to cropped image scale.

返回

the mapped bbox in [tl_x, tl_y, br_x, br_y] format.

返回类型

Tensor
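A hedged sketch of this mapping for a single box (the crop top-left derivation from the previous bbox center is an assumption for illustration; the exact implementation may differ):

    def map_bbox_back(pred_xyxy, prev_cxcywh, resize_factor, output_size=320):
        # Half crop side, expressed in original-image pixels.
        half = 0.5 * output_size / resize_factor
        x0 = prev_cxcywh[0] - half  # crop top-left x in the original image
        y0 = prev_cxcywh[1] - half  # crop top-left y in the original image
        # Undo the crop resize, then shift by the crop's top-left corner.
        x1, y1, x2, y2 = (c / resize_factor for c in pred_xyxy)
        return [x1 + x0, y1 + y0, x2 + x0, y2 + y0]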

simple_test(img, img_metas, gt_bboxes, **kwargs)[源代码]

Test without augmentation.

参数
  • img (Tensor) – input image of shape (1, C, H, W).

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • gt_bboxes (list[Tensor]) – list of ground truth bboxes for each image with shape (1, 4) in [tl_x, tl_y, br_x, br_y] format.

返回

the tracking results.

返回类型

dict[str, ndarray]

track(img, bbox)[源代码]

Track the box bbox of previous frame to current frame img.

参数
  • img (Tensor) – of shape (1, C, H, W).

  • bbox (list | Tensor) – The bbox in previous frame. The shape of the bbox is (4, ) in [x, y, w, h] format.


update_template(img, bbox, conf_score)[源代码]

Update the dynamic templates.

参数
  • img (Tensor) – of shape (1, C, H, W).

  • bbox (list | ndarray) – in [cx, cy, w, h] format.

  • conf_score (float) – the confidence score of the predicted bbox.

vid

class mmtrack.models.vid.BaseVideoDetector(init_cfg)[源代码]

Base class for video object detector.

参数

init_cfg (dict or list[dict], optional) – Initialization config dict.

aug_test(imgs, img_metas, **kwargs)[源代码]

Test function with test time augmentation.

forward(img, img_metas, ref_img=None, ref_img_metas=None, return_loss=True, **kwargs)[源代码]

Calls either forward_train() or forward_test() depending on whether return_loss is True.

Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test time augmentations.

forward_test(imgs, img_metas, ref_img=None, ref_img_metas=None, **kwargs)[源代码]
参数
  • imgs (List[Tensor]) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch.

  • img_metas (List[List[dict]]) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch.

  • ref_img (list[Tensor] | None) – The list only contains one Tensor of shape (1, N, C, H, W) encoding input reference images. Typically these should be mean centered and std scaled. N denotes the number of reference images. There may be no reference images in some cases.

  • ref_img_metas (list[list[list[dict]]] | None) – The first and second list only has one element. The third list contains image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect. There may be no reference images in some cases.

abstract forward_train(imgs, img_metas, ref_img=None, ref_img_metas=None, **kwargs)[源代码]
参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_img (Tensor) – of shape (N, R, C, H, W) encoding input images. Typically these should be mean centered and std scaled. R denotes there is #R reference images for each input image.

  • ref_img_metas (list[list[dict]]) – The first list only has one element. The second list contains reference image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

freeze_module(module)[源代码]

Freeze module during training.

show_result(img, result, score_thr=0.3, bbox_color='green', text_color='green', thickness=1, font_scale=0.5, win_name='', show=False, wait_time=0, out_file=None)[源代码]

Draw result over img.

参数
  • img (str or Tensor) – The image to be displayed.

  • result (dict) – The results to draw over img det_bboxes or (det_bboxes, det_masks). The value of key ‘det_bboxes’ is list with length num_classes, and each element in list is ndarray with shape(n, 5) in [tl_x, tl_y, br_x, br_y, score] format.

  • score_thr (float, optional) – Minimum score of bboxes to be shown. Default: 0.3.

  • bbox_color (str or tuple or Color) – Color of bbox lines.

  • text_color (str or tuple or Color) – Color of texts.

  • thickness (int) – Thickness of lines.

  • font_scale (float) – Font scales of texts.

  • win_name (str) – The window name.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • show (bool) – Whether to show the image. Default: False.

  • out_file (str or None) – The filename to write the image. Default: None.

返回

Only if not show or out_file

返回类型

img (Tensor)

train_step(data, optimizer)[源代码]

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

参数
  • data (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

返回

It should contain at least 3 keys: loss, log_vars, num_samples.

  • loss is a tensor for back propagation, which can be a weighted sum of multiple losses.

  • log_vars contains all the variables to be sent to the logger.

  • num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

返回类型

dict

val_step(data, optimizer)[源代码]

The iteration step during validation.

This method shares the same signature as train_step(), but used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but an evaluation hook.

property with_aggregator

whether the framework has an aggregator.

Type

bool

property with_detector

whether the framework has a detector

Type

bool

property with_motion

whether the framework has a motion model

Type

bool

class mmtrack.models.vid.DFF(detector, motion, pretrains=None, init_cfg=None, frozen_modules=None, train_cfg=None, test_cfg=None)[源代码]

Deep Feature Flow for Video Recognition.

This video object detector is the implementation of DFF.
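A hedged model-config sketch in the usual mmtrack config style; both sub-configs are placeholders (assumptions) standing in for the entries of a full DFF config file.

    model = dict(
        type='DFF',
        detector=dict(type='FasterRCNN'),   # placeholder detector config
        motion=dict(type='FlowNetSimple'))  # flow net that propagates features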

aug_test(imgs, img_metas, **kwargs)[源代码]

Test function with test time augmentation.

extract_feats(img, img_metas)[源代码]

Extract features for img during testing.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding input image. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

返回

Multi level feature maps of img.

返回类型

list[Tensor]

forward_train(img, img_metas, gt_bboxes, gt_labels, ref_img, ref_img_metas, ref_gt_bboxes, ref_gt_labels, gt_instance_ids=None, gt_bboxes_ignore=None, gt_masks=None, proposals=None, ref_gt_instance_ids=None, ref_gt_bboxes_ignore=None, ref_gt_masks=None, ref_proposals=None, **kwargs)[源代码]
参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box.

  • ref_img (Tensor) – of shape (N, 1, C, H, W) encoding input images. Typically these should be mean centered and std scaled. 1 denotes there is only one reference image for each input image.

  • ref_img_metas (list[list[dict]]) – The first list only has one element. The second list contains reference image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_gt_bboxes (list[Tensor]) – The list only has one Tensor. The Tensor contains ground truth bboxes for each reference image with shape (num_all_ref_gts, 5) in [ref_img_id, tl_x, tl_y, br_x, br_y] format. The ref_img_id starts from 0, and denotes the id of the reference image for each key image.

  • ref_gt_labels (list[Tensor]) – The list only has one Tensor. The Tensor contains class indices corresponding to each reference box with shape (num_all_ref_gts, 2) in [ref_img_id, class_index] format.

  • gt_instance_ids (None | list[Tensor]) – specify the instance id for each ground truth bbox.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.

  • gt_masks (None | Tensor) – true segmentation masks for each box used if the architecture supports a segmentation task.

  • proposals (None | Tensor) – override rpn proposals with custom proposals. Use when with_rpn is False.

  • ref_gt_instance_ids (None | list[Tensor]) – specify the instance id for each ground truth bboxes of reference images.

  • ref_gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes of reference images can be ignored when computing the loss.

  • ref_gt_masks (None | Tensor) – True segmentation masks for each box of reference image used if the architecture supports a segmentation task.

  • ref_proposals (None | Tensor) – override rpn proposals with custom proposals of reference images. Use when with_rpn is False.

返回

a dictionary of loss components

返回类型

dict[str, Tensor]

simple_test(img, img_metas, ref_img=None, ref_img_metas=None, proposals=None, rescale=False)[源代码]

Test without augmentation.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding input image. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_img (None) – Not used in DFF. Only for unifying API interface.

  • ref_img_metas (None) – Not used in DFF. Only for unifying API interface.

  • proposals (None | Tensor) – Override rpn proposals with custom proposals. Use when with_rpn is False. Defaults to None.

  • rescale (bool) – If False, then returned bboxes and masks will fit the scale of img, otherwise, returned bboxes and masks will fit the scale of original image shape. Defaults to False.

返回

The detection results.

返回类型

dict[str, list(ndarray)]

class mmtrack.models.vid.FGFA(detector, motion, aggregator, pretrains=None, init_cfg=None, frozen_modules=None, train_cfg=None, test_cfg=None)[源代码]

Flow-Guided Feature Aggregation for Video Object Detection.

This video object detector is the implementation of FGFA.

aug_test(imgs, img_metas, **kwargs)[源代码]

Test function with test time augmentation.

extract_feats(img, img_metas, ref_img, ref_img_metas)[源代码]

Extract features for img during testing.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding input image. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_img (Tensor | None) – of shape (1, N, C, H, W) encoding input reference images. Typically these should be mean centered and std scaled. N denotes the number of reference images. There may be no reference images in some cases.

  • ref_img_metas (list[list[dict]] | None) – The first list only has one element. The second list contains image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect. There may be no reference images in some cases.

返回

Multi level feature maps of img.

返回类型

list[Tensor]

forward_train(img, img_metas, gt_bboxes, gt_labels, ref_img, ref_img_metas, ref_gt_bboxes, ref_gt_labels, gt_instance_ids=None, gt_bboxes_ignore=None, gt_masks=None, proposals=None, ref_gt_instance_ids=None, ref_gt_bboxes_ignore=None, ref_gt_masks=None, ref_proposals=None, **kwargs)[源代码]
参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box.

  • ref_img (Tensor) – of shape (N, 2, C, H, W) encoding input images. Typically these should be mean centered and std scaled. 2 denotes there are two reference images for each input image.

  • ref_img_metas (list[list[dict]]) – The first list only has one element. The second list contains reference image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_gt_bboxes (list[Tensor]) – The list only has one Tensor. The Tensor contains ground truth bboxes for each reference image with shape (num_all_ref_gts, 5) in [ref_img_id, tl_x, tl_y, br_x, br_y] format. The ref_img_id starts from 0, and denotes the id of the reference image for each key image.

  • ref_gt_labels (list[Tensor]) – The list only has one Tensor. The Tensor contains class indices corresponding to each reference box with shape (num_all_ref_gts, 2) in [ref_img_id, class_index] format.

  • gt_instance_ids (None | list[Tensor]) – specify the instance id for each ground truth bbox.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.

  • gt_masks (None | Tensor) – true segmentation masks for each box used if the architecture supports a segmentation task.

  • proposals (None | Tensor) – override rpn proposals with custom proposals. Use when with_rpn is False.

  • ref_gt_instance_ids (None | list[Tensor]) – specify the instance id for each ground truth bboxes of reference images.

  • ref_gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes of reference images can be ignored when computing the loss.

  • ref_gt_masks (None | Tensor) – True segmentation masks for each box of reference image used if the architecture supports a segmentation task.

  • ref_proposals (None | Tensor) – override rpn proposals with custom proposals of reference images. Use when with_rpn is False.

返回

a dictionary of loss components

返回类型

dict[str, Tensor]

simple_test(img, img_metas, ref_img=None, ref_img_metas=None, proposals=None, rescale=False)[源代码]

Test without augmentation.

参数
  • img (Tensor) – of shape (1, C, H, W) encoding input image. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_img (list[Tensor] | None) – The list only contains one Tensor of shape (1, N, C, H, W) encoding input reference images. Typically these should be mean centered and std scaled. N denotes the number of reference images. There may be no reference images in some cases.

  • ref_img_metas (list[list[list[dict]]] | None) – The first and second list only has one element. The third list contains image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect. There may be no reference images in some cases.

  • proposals (None | Tensor) – Override rpn proposals with custom proposals. Use when with_rpn is False. Defaults to None.

  • rescale (bool) – If False, then returned bboxes and masks will fit the scale of img, otherwise, returned bboxes and masks will fit the scale of original image shape. Defaults to False.

Returns

The detection results.

Return type

dict[str, list(ndarray)]

class mmtrack.models.vid.SELSA(detector, pretrains=None, init_cfg=None, frozen_modules=None, train_cfg=None, test_cfg=None)[source]

Sequence Level Semantics Aggregation for Video Object Detection.

This video object detector is the implementation of SELSA.

aug_test(imgs, img_metas, **kwargs)[source]

Test function with test time augmentation.

extract_feats(img, img_metas, ref_img, ref_img_metas)[source]

Extract features for img during testing.

Parameters
  • img (Tensor) – of shape (1, C, H, W) encoding input image. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_img (Tensor | None) – of shape (1, N, C, H, W) encoding input reference images. Typically these should be mean centered and std scaled. N denotes the number of reference images. There may be no reference images in some cases.

  • ref_img_metas (list[list[dict]] | None) – The first list only has one element. The second list contains image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect. There may be no reference images in some cases.

Returns

x is the multi-level feature maps of img; ref_x is the multi-level feature maps of ref_img.

Return type

tuple(x, img_metas, ref_x, ref_img_metas)

forward_train(img, img_metas, gt_bboxes, gt_labels, ref_img, ref_img_metas, ref_gt_bboxes, ref_gt_labels, gt_instance_ids=None, gt_bboxes_ignore=None, gt_masks=None, proposals=None, ref_gt_instance_ids=None, ref_gt_bboxes_ignore=None, ref_gt_masks=None, ref_proposals=None, **kwargs)[source]
Parameters
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box.

  • ref_img (Tensor) – of shape (N, 2, C, H, W) encoding input images. Typically these should be mean centered and std scaled. The 2 denotes that there are two reference images for each input image.

  • ref_img_metas (list[list[dict]]) – The first list only has one element. The second list contains reference image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_gt_bboxes (list[Tensor]) – The list only has one Tensor. The Tensor contains ground truth bboxes for each reference image with shape (num_all_ref_gts, 5) in [ref_img_id, tl_x, tl_y, br_x, br_y] format. The ref_img_id starts from 0 and denotes the id of the reference image for each key image.

  • ref_gt_labels (list[Tensor]) – The list only has one Tensor. The Tensor contains class indices corresponding to each reference box with shape (num_all_ref_gts, 2) in [ref_img_id, class_index] format.

  • gt_instance_ids (None | list[Tensor]) – specify the instance id for each ground truth bbox.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.

  • gt_masks (None | Tensor) – true segmentation masks for each box used if the architecture supports a segmentation task.

  • proposals (None | Tensor) – override rpn proposals with custom proposals. Use when with_rpn is False.

  • ref_gt_instance_ids (None | list[Tensor]) – specify the instance id for each ground truth bbox of reference images.

  • ref_gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes of reference images can be ignored when computing the loss.

  • ref_gt_masks (None | Tensor) – True segmentation masks for each box of reference image used if the architecture supports a segmentation task.

  • ref_proposals (None | Tensor) – override rpn proposals with custom proposals of reference images. Use when with_rpn is False.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

simple_test(img, img_metas, ref_img=None, ref_img_metas=None, proposals=None, ref_proposals=None, rescale=False)[source]

Test without augmentation.

Parameters
  • img (Tensor) – of shape (1, C, H, W) encoding input image. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • ref_img (list[Tensor] | None) – The list only contains one Tensor of shape (1, N, C, H, W) encoding input reference images. Typically these should be mean centered and std scaled. N denotes the number of reference images. There may be no reference images in some cases.

  • ref_img_metas (list[list[list[dict]]] | None) – The first and second lists each have only one element. The third list contains image information dicts where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect. There may be no reference images in some cases.

  • proposals (None | Tensor) – Override rpn proposals with custom proposals. Use when with_rpn is False. Defaults to None.

  • rescale (bool) – If False, then returned bboxes and masks will fit the scale of img, otherwise, returned bboxes and masks will fit the scale of original image shape. Defaults to False.

Returns

The detection results.

Return type

dict[str, list(ndarray)]

aggregators

class mmtrack.models.aggregators.EmbedAggregator(num_convs=1, channels=256, kernel_size=3, norm_cfg=None, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

Embedding convs to aggregate multiple feature maps.

This module is proposed in “Flow-Guided Feature Aggregation for Video Object Detection”. FGFA.

Parameters
  • num_convs (int) – Number of embedding convs.

  • channels (int) – Channels of embedding convs. Defaults to 256.

  • kernel_size (int) – Kernel size of embedding convs. Defaults to 3.

  • norm_cfg (dict) – Configuration of the normalization method after each conv. Defaults to None.

  • act_cfg (dict) – Configuration of activation method after each conv. Defaults to dict(type=’ReLU’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x, ref_x)[source]

Aggregate reference feature maps ref_x.

The aggregation mainly contains two steps: 1. Compute the cosine similarity between x and ref_x. 2. Use the normalized (i.e. softmax) cosine similarity as weights to sum ref_x. A minimal sketch of this computation follows the return description below.

Parameters
  • x (Tensor) – of shape [1, C, H, W]

  • ref_x (Tensor) – of shape [N, C, H, W]. N is the number of reference feature maps.

Returns

The aggregated feature map with shape [1, C, H, W].

Return type

Tensor
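
The computation is easy to reproduce outside the module. Below is a minimal sketch of the two steps in plain PyTorch; it omits the embedding convs that the module applies before the similarity computation, so it illustrates the idea rather than the exact implementation:

    import torch
    import torch.nn.functional as F

    def cosine_weighted_aggregation(x, ref_x):
        """x: [1, C, H, W] key feature map; ref_x: [N, C, H, W] references."""
        # Step 1: per-pixel cosine similarity between x and each reference map.
        x_norm = F.normalize(x, p=2, dim=1)        # [1, C, H, W]
        ref_norm = F.normalize(ref_x, p=2, dim=1)  # [N, C, H, W]
        cos_sim = (x_norm * ref_norm).sum(dim=1)   # [N, H, W]
        # Step 2: softmax over the N reference maps, then a weighted sum.
        weights = cos_sim.softmax(dim=0).unsqueeze(1)      # [N, 1, H, W]
        return (weights * ref_x).sum(dim=0, keepdim=True)  # [1, C, H, W]

    out = cosine_weighted_aggregation(torch.randn(1, 256, 32, 32),
                                      torch.randn(5, 256, 32, 32))
    print(out.shape)  # torch.Size([1, 256, 32, 32])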

class mmtrack.models.aggregators.SelsaAggregator(in_channels, num_attention_blocks=16, init_cfg=None)[source]

Selsa aggregator module.

This module is proposed in “Sequence Level Semantics Aggregation for Video Object Detection”. SELSA.

Parameters
  • in_channels (int) – The number of channels of the features of proposal.

  • num_attention_blocks (int) – The number of attention blocks used in selsa aggregator module. Defaults to 16.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x, ref_x)[source]

Aggregate the features ref_x of reference proposals.

The aggregation mainly contains two steps: 1. Use multi-head attention to compute the weights between x and ref_x. 2. Use the normalized (i.e. softmax) weights to compute a weighted sum of ref_x. A minimal sketch follows the return description below.

Parameters
  • x (Tensor) – of shape [N, C]. N is the number of key frame proposals.

  • ref_x (Tensor) – of shape [M, C]. M is the number of reference frame proposals.

Returns

The aggregated features of key frame proposals with shape [N, C].

Return type

Tensor
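
A minimal sketch of this kind of grouped dot-product attention, assuming the channels are split into num_attention_blocks groups. The real module additionally learns linear embeddings for the query, key and value, which are omitted here, and the scaling factor is an assumption:

    import torch

    def grouped_attention_aggregation(x, ref_x, num_blocks=16):
        """x: [N, C] key proposal features; ref_x: [M, C] reference features."""
        n, c = x.shape
        m = ref_x.shape[0]
        d = c // num_blocks
        q = x.view(n, num_blocks, d).permute(1, 0, 2)      # [blocks, N, d]
        k = ref_x.view(m, num_blocks, d).permute(1, 0, 2)  # [blocks, M, d]
        # Attention weights from each key proposal to all reference proposals.
        attn = torch.softmax(q @ k.transpose(1, 2) / d ** 0.5, dim=-1)
        out = attn @ k                                     # [blocks, N, d]
        return out.permute(1, 0, 2).reshape(n, c)          # [N, C]

    print(grouped_attention_aggregation(torch.randn(8, 256),
                                        torch.randn(48, 256)).shape)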

backbones

class mmtrack.models.backbones.SOTResNet(depth, unfreeze_backbone=True, **kwargs)[source]

ResNet backbone for SOT.

The main difference between the ResNet in torch and SOTResNet is the padding and dilation in the convs of SOTResNet. Please refer to SiamRPN++ for a detailed analysis.

Parameters

depth (int) – Depth of resnet, from {50, }.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

losses

class mmtrack.models.losses.L2Loss(neg_pos_ub=-1, pos_margin=-1, neg_margin=-1, hard_mining=False, reduction='mean', loss_weight=1.0)[source]

L2 loss.

Parameters
  • reduction (str, optional) – The method to reduce the loss. Options are “none”, “mean” and “sum”.

  • loss_weight (float, optional) – The weight of loss.

forward(pred, target, weight=None, avg_factor=None, reduction_override=None)[source]

Forward function.

Parameters
  • pred (torch.Tensor) – The prediction.

  • target (torch.Tensor) – The learning target of the prediction.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.

static random_choice(gallery, num)[source]

Randomly select some elements from the gallery.

It seems that PyTorch’s implementation is slower than numpy’s, so we use numpy to randperm the indices.

update_weight(pred, target, weight, avg_factor)[source]

Update the weight according to targets.

class mmtrack.models.losses.MultiPosCrossEntropyLoss(reduction='mean', loss_weight=1.0)[source]

Multi-positive targets cross entropy loss.

forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Forward function.

Parameters
  • cls_score (torch.Tensor) – The classification score.

  • label (torch.Tensor) – The assigned label of the prediction.

  • weight (torch.Tensor) – The element-wise weight.

  • avg_factor (float) – Average factor when computing the mean of losses.

  • reduction (str) – Same as built-in losses of PyTorch.

Returns

Calculated loss

Return type

torch.Tensor

multi_pos_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None)[source]
Parameters
  • pred (torch.Tensor) – The prediction.

  • label (torch.Tensor) – The assigned label of the prediction.

  • weight (torch.Tensor) – The element-wise weight.

  • reduction (str) – Same as built-in losses of PyTorch.

  • avg_factor (float) – Average factor when computing the mean of losses.

Returns

Calculated loss

Return type

torch.Tensor
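
One common formulation of a multi-positive cross entropy (used by quasi-dense trackers) treats all positive and negative pairs of a row jointly: loss = log(1 + Σ_pos Σ_neg exp(s_neg − s_pos)). The sketch below implements that formulation; it captures the idea but is not guaranteed to match mmtrack's implementation line for line:

    import torch

    def multi_pos_cross_entropy_sketch(pred, label):
        """pred: [N, M] similarity scores; label: [N, M], 1 marks a positive."""
        neg_inf = torch.full_like(pred, float('-inf'))
        # logsumexp of -s over positives and of +s over negatives, per row.
        lse_pos = torch.logsumexp(torch.where(label == 1, -pred, neg_inf), dim=1)
        lse_neg = torch.logsumexp(torch.where(label == 0, pred, neg_inf), dim=1)
        # log(1 + sum_pos sum_neg exp(s_neg - s_pos)) == softplus(lse_pos + lse_neg)
        return torch.nn.functional.softplus(lse_pos + lse_neg).mean()

    pred = torch.randn(4, 10)
    label = (torch.rand(4, 10) > 0.7).long()
    print(multi_pos_cross_entropy_sketch(pred, label))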

class mmtrack.models.losses.TripletLoss(margin=0.3, loss_weight=1.0, hard_mining=True)[source]

Triplet loss with hard positive/negative mining.

Reference:

Hermans et al. In Defense of the Triplet Loss for Person Re-Identification. arXiv:1703.07737.

Imported from `<https://github.com/KaiyangZhou/deep-person-reid/blob/master/torchreid/losses/hard_mine_triplet_loss.py>`_.

Parameters
  • margin (float, optional) – Margin for triplet loss. Default to 0.3.

  • loss_weight (float, optional) – Weight of the loss. Default to 1.0.

forward(inputs, targets, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

hard_mining_triplet_loss_forward(inputs, targets)[source]
Parameters
  • inputs (torch.Tensor) – feature matrix with shape (batch_size, feat_dim).

  • targets (torch.LongTensor) – ground truth labels with shape (batch_size,).
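
The hard mining follows the batch-hard strategy of Hermans et al.: for each anchor, take the farthest positive and the closest negative. A minimal sketch:

    import torch

    def batch_hard_triplet_loss(feats, labels, margin=0.3):
        """feats: [B, D] embeddings; labels: [B] integer identity labels."""
        dist = torch.cdist(feats, feats, p=2)               # [B, B] pairwise L2
        same = labels.unsqueeze(0) == labels.unsqueeze(1)   # [B, B] identity mask
        # Hardest positive: max distance among same-identity pairs.
        d_pos = dist.masked_fill(~same, float('-inf')).amax(dim=1)
        # Hardest negative: min distance among different-identity pairs.
        d_neg = dist.masked_fill(same, float('inf')).amin(dim=1)
        # Standard margin ranking: we want d_neg > d_pos + margin.
        return torch.clamp(d_pos - d_neg + margin, min=0).mean()

    feats = torch.randn(8, 128)
    labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    print(batch_hard_triplet_loss(feats, labels))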

motion

class mmtrack.models.motion.CameraMotionCompensation(warp_mode='cv2.MOTION_EUCLIDEAN', num_iters=50, stop_eps=0.001)[source]

Camera motion compensation.

Parameters
  • warp_mode (str) – Warp mode in opencv.

  • num_iters (int) – Number of iterations.

  • stop_eps (float) – Termination threshold.

get_warp_matrix(img, ref_img)[source]

Calculate the warping matrix between two images.

track(img, ref_img, tracks, num_samples, frame_id)[source]

Tracking forward.

warp_bboxes(bboxes, warp_matrix)[source]

Warp bounding boxes according to the warping matrix.
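
A rough sketch of how such a compensation step can be realized with OpenCV's ECC alignment; the exact preprocessing in mmtrack may differ, and the helper names below are made up for illustration:

    import cv2
    import numpy as np

    def estimate_warp(img, ref_img, num_iters=50, stop_eps=0.001):
        """Estimate a Euclidean warp from ref_img to img with OpenCV's ECC."""
        img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        ref_gray = cv2.cvtColor(ref_img, cv2.COLOR_BGR2GRAY)
        warp = np.eye(2, 3, dtype=np.float32)  # 2x3 Euclidean warp matrix
        criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
                    num_iters, stop_eps)
        _, warp = cv2.findTransformECC(ref_gray, img_gray, warp,
                                       cv2.MOTION_EUCLIDEAN, criteria, None, 1)
        return warp

    def warp_points(pts, warp):
        """Apply a 2x3 warp to an array of (x, y) points of shape [N, 2]."""
        pts_h = np.hstack([pts, np.ones((len(pts), 1), dtype=np.float32)])
        return pts_h @ warp.T  # [N, 2]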

class mmtrack.models.motion.FlowNetSimple(img_scale_factor, out_indices=[2, 3, 4, 5, 6], flow_scale_factor=5.0, flow_img_norm_std=[255.0, 255.0, 255.0], flow_img_norm_mean=[0.411, 0.432, 0.45], init_cfg=None)[source]

The simple version of FlowNet.

This is the implementation of FlowNetSimple.

Parameters
  • img_scale_factor (float) – Used to upsample/downsample the image.

  • out_indices (list) – The indices of outputting feature maps after each group of conv layers. Defaults to [2, 3, 4, 5, 6].

  • flow_scale_factor (float) – Used to enlarge the values of flow. Defaults to 5.0.

  • flow_img_norm_std (list) – Used to scale the values of image. Defaults to [255.0, 255.0, 255.0].

  • flow_img_norm_mean (list) – Used to center the values of image. Defaults to [0.411, 0.432, 0.450].

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

crop_like(input, target)[source]

Crop input to the size of target.

forward(imgs, img_metas)[source]

Compute the flow of image pairs.

Parameters
  • imgs (Tensor) – of shape (N, 6, H, W) encoding input image pairs. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

Returns

of shape (N, 2, H, W) encoding the flow of image pairs.

Return type

Tensor

prepare_imgs(imgs, img_metas)[source]

Preprocess image pairs for computing flow.

Parameters
  • imgs (Tensor) – of shape (N, 6, H, W) encoding input image pairs. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image information dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

Returns

of shape (N, 6, H, W) encoding the input image pairs for FlowNetSimple.

Return type

Tensor

class mmtrack.models.motion.KalmanFilter(center_only=False)[source]

A simple Kalman filter for tracking bounding boxes in image space.

The implementation follows https://github.com/nwojke/deep_sort.

gating_distance(mean, covariance, measurements, only_position=False)[source]

Compute gating distance between state distribution and measurements.

A suitable distance threshold can be obtained from chi2inv95. If only_position is False, the chi-square distribution has 4 degrees of freedom, otherwise 2.

Parameters
  • mean (ndarray) – Mean vector over the state distribution (8 dimensional).

  • covariance (ndarray) – Covariance of the state distribution (8x8 dimensional).

  • measurements (ndarray) – An Nx4 dimensional matrix of N measurements, each in format (x, y, a, h) where (x, y) is the bounding box center position, a the aspect ratio, and h the height.

  • only_position (bool, optional) – If True, distance computation is done with respect to the bounding box center position only. Defaults to False.

Returns

An array of length N, where the i-th element contains the squared Mahalanobis distance between (mean, covariance) and measurements[i].

Return type

ndarray
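
The squared Mahalanobis distance itself is straightforward to compute once the state has been projected to measurement space (see project below); a minimal numpy sketch:

    import numpy as np

    def squared_mahalanobis(mean, covariance, measurements):
        """mean: (4,) projected state mean; covariance: (4, 4) projected
        state covariance; measurements: (N, 4) in (x, y, a, h) format."""
        d = measurements - mean                  # (N, 4) residuals
        # Solve via Cholesky instead of inverting the covariance explicitly.
        chol = np.linalg.cholesky(covariance)
        z = np.linalg.solve(chol, d.T)           # (4, N)
        return np.sum(z * z, axis=0)             # (N,) squared distances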

initiate(measurement)[source]

Create track from unassociated measurement.

Parameters

measurement (ndarray) – Bounding box coordinates (x, y, a, h) with center position (x, y), aspect ratio a, and height h.

Returns

The mean vector (8 dimensional) and covariance matrix (8x8 dimensional) of the new track. Unobserved velocities are initialized to 0 mean.

Return type

(ndarray, ndarray)

predict(mean, covariance)[source]

Run Kalman filter prediction step.

Parameters
  • mean (ndarray) – The 8 dimensional mean vector of the object state at the previous time step.

  • covariance (ndarray) – The 8x8 dimensional covariance matrix of the object state at the previous time step.

Returns

The mean vector and covariance matrix of the predicted state. Unobserved velocities are initialized to 0 mean.

Return type

(ndarray, ndarray)

project(mean, covariance)[source]

Project state distribution to measurement space.

Parameters
  • mean (ndarray) – The state’s mean vector (8 dimensional array).

  • covariance (ndarray) – The state’s covariance matrix (8x8 dimensional).

Returns

The projected mean and covariance matrix of the given state estimate.

Return type

(ndarray, ndarray)

track(tracks, bboxes)[source]

Track forward.

Parameters
  • tracks (dict[int, dict]) – Track buffer.

  • bboxes (Tensor) – Detected bounding boxes.

Returns

Updated tracks and bboxes.

Return type

(dict[int, dict], Tensor)

update(mean, covariance, measurement)[source]

Run Kalman filter correction step.

Parameters
  • mean (ndarray) – The predicted state’s mean vector (8 dimensional).

  • covariance (ndarray) – The state’s covariance matrix (8x8 dimensional).

  • measurement (ndarray) – The 4 dimensional measurement vector (x, y, a, h), where (x, y) is the center position, a the aspect ratio, and h the height of the bounding box.

Returns

The measurement-corrected state distribution.

Return type

(ndarray, ndarray)
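
Taken together, initiate, predict, and update form the usual predict-then-correct cycle. A minimal usage sketch (the measurement values are made up for illustration):

    import numpy as np
    from mmtrack.models.motion import KalmanFilter

    kf = KalmanFilter()

    # Start a track from the first detection (x, y, a, h).
    mean, cov = kf.initiate(np.array([320., 240., 0.5, 120.]))

    # Each subsequent frame: predict, then correct with the new measurement.
    for measurement in [np.array([324., 238., 0.5, 121.]),
                        np.array([330., 236., 0.5, 122.])]:
        mean, cov = kf.predict(mean, cov)
        mean, cov = kf.update(mean, cov, measurement)

    print(mean[:4])  # corrected (x, y, a, h) of the track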

class mmtrack.models.motion.LinearMotion(num_samples=2, center_motion=False)[source]

Linear motion while tracking.

Parameters
  • num_samples (int, optional) – Number of samples to calculate the velocity. Default to 2.

  • center_motion (bool, optional) – Whether to use the center location or the bounding box location to estimate the velocity. Default to False.

center(bbox)[source]

Get the center of the box.

get_velocity(bboxes, num_samples=None)[source]

Get velocities of the input objects.

step(bboxes, velocity=None)[source]

Step forward with the velocity.

track(tracks, frame_id)[source]

Tracking forward.
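
A minimal sketch of the underlying idea, assuming bounding-box-location mode (center_motion=False): estimate the velocity as the mean displacement over the last num_samples boxes and extrapolate one frame:

    import torch

    def linear_velocity(bboxes, num_samples=2):
        """bboxes: [T, 4] history of one track in [tl_x, tl_y, br_x, br_y]."""
        recent = bboxes[-num_samples:]
        # Mean per-frame displacement of the box coordinates.
        return (recent[1:] - recent[:-1]).mean(dim=0)  # [4]

    def linear_step(bboxes, velocity):
        """Extrapolate the latest box one frame forward with the velocity."""
        return bboxes[-1] + velocity

    history = torch.tensor([[10., 10., 50., 50.],
                            [12., 11., 52., 51.],
                            [14., 12., 54., 52.]])
    v = linear_velocity(history, num_samples=3)
    print(linear_step(history, v))  # tensor([16., 13., 56., 53.])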

reid

class mmtrack.models.reid.BaseReID(backbone, neck=None, head=None, pretrained=None, train_cfg=None, init_cfg=None)[source]

Base class for re-identification.

forward_train(img, gt_label, **kwargs)[source]

Training forward function.

simple_test(img, **kwargs)[source]

Test without augmentation.

class mmtrack.models.reid.FcModule(in_channels, out_channels, norm_cfg=None, act_cfg={'type': 'ReLU'}, inplace=True, init_cfg={'layer': 'Linear', 'type': 'Kaiming'})[source]

Fully-connected layer module.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • norm_cfg (dict, optional) – Configuration of the normalization method after fc. Defaults to None.

  • act_cfg (dict, optional) – Configuration of the activation method after fc. Defaults to dict(type=’ReLU’).

  • inplace (bool, optional) – Whether to apply the activation module in-place.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to dict(type=’Kaiming’, layer=’Linear’).

forward(x, activate=True, norm=True)[source]

Model forward.

property norm

Normalization.

class mmtrack.models.reid.GlobalAveragePooling(kernel_size=None, stride=None)[source]

Global Average Pooling neck.

Note that we use view to remove the extra dimensions after pooling. We do not use squeeze, as it would also remove the batch dimension when the tensor has a batch size of 1, which can lead to unexpected errors. A short demonstration follows.
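
The difference only shows up at batch size 1, which is exactly the common case at inference time:

    import torch
    import torch.nn.functional as F

    x = F.adaptive_avg_pool2d(torch.randn(1, 256, 7, 7), 1)  # [1, 256, 1, 1]
    print(x.squeeze().shape)            # torch.Size([256])   -- batch dim lost
    print(x.view(x.size(0), -1).shape)  # torch.Size([1, 256]) -- batch dim kept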

class mmtrack.models.reid.LinearReIDHead(num_fcs, in_channels, fc_channels, out_channels, norm_cfg=None, act_cfg=None, num_classes=None, loss=None, loss_pairwise=None, topk=(1), init_cfg={'bias': 0, 'layer': 'Linear', 'mean': 0, 'std': 0.01, 'type': 'Normal'})[source]

Linear head for re-identification.

Parameters
  • num_fcs (int) – Number of fcs.

  • in_channels (int) – Number of channels in the input.

  • fc_channels (int) – Number of channels in the fcs.

  • out_channels (int) – Number of channels in the output.

  • norm_cfg (dict, optional) – Configuration of the normalization method after fc. Defaults to None.

  • act_cfg (dict, optional) – Configuration of activation method after fc. Defaults to None.

  • num_classes (int, optional) – Number of the identities. Default to None.

  • loss (dict, optional) – Cross entropy loss to train the re-identification module.

  • loss_pairwise (dict, optional) – Triplet loss to train the re-identification module.

  • topk (int, optional) – Calculate top-k accuracy. Defaults to (1).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to dict(type=’Normal’,layer=’Linear’, mean=0, std=0.01, bias=0).

forward_train(x)[source]

Model forward.

loss(gt_label, feats, cls_score=None)[source]

Compute losses.

roi_heads

class mmtrack.models.roi_heads.SelsaBBoxHead(aggregator, *args, **kwargs)[source]

Selsa bbox head.

This module is proposed in “Sequence Level Semantics Aggregation for Video Object Detection”. SELSA.

Parameters

aggregator (dict) – Configuration of aggregator.

forward(x, ref_x)[source]

Compute the cls_score and bbox_pred from the features x of key frame proposals.

Parameters
  • x (Tensor) – of shape [N, C, H, W]. N is the number of key frame proposals.

  • ref_x (Tensor) – of shape [M, C, H, W]. M is the number of reference frame proposals.

Returns

The predicted score of classes and the predicted regression offsets.

Return type

tuple(cls_score, bbox_pred)

class mmtrack.models.roi_heads.SelsaRoIHead(bbox_roi_extractor=None, bbox_head=None, mask_roi_extractor=None, mask_head=None, shared_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

SELSA RoI head.

forward_train(x, ref_x, img_metas, proposal_list, ref_proposal_list, gt_bboxes, gt_labels, gt_bboxes_ignore=None, gt_masks=None)[source]
Parameters
  • x (list[Tensor]) – list of multi-level img features.

  • ref_x (list[Tensor]) – list of multi-level ref_img features.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmdet/datasets/pipelines/formatting.py:Collect.

  • proposal_list (list[Tensors]) – list of region proposals.

  • ref_proposal_list (list[Tensors]) – list of region proposals from ref_imgs.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.

  • gt_masks (None | Tensor) – true segmentation masks for each box used if the architecture supports a segmentation task.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

simple_test(x, ref_x, proposals_list, ref_proposals_list, img_metas, proposals=None, rescale=False)[source]

Test without augmentation.

simple_test_bboxes(x, ref_x, proposals, ref_proposals, img_metas, rcnn_test_cfg, rescale=False)[source]

Test only det bboxes without augmentation.

class mmtrack.models.roi_heads.SingleRoIExtractor(roi_layer, out_channels, featmap_strides, finest_scale=56, init_cfg=None)[source]

Extract RoI features from a single level feature map.

This class is the same as SingleRoIExtractor from mmdet.models.roi_heads.roi_extractors except for using **kwargs to accept external arguments.

forward(feats, rois, roi_scale_factor=None, **kwargs)[source]

Forward function.

class mmtrack.models.roi_heads.TemporalRoIAlign(num_most_similar_points=2, num_temporal_attention_blocks=4, *args, **kwargs)[source]

Temporal RoI Align module.

This module is proposed in “Temporal ROI Align for Video Object Recognition”. TRoI Align.

Parameters
  • num_most_similar_points (int) – Denotes the number of the most similar points in the Most Similar RoI Align. Defaults to 2.

  • num_temporal_attention_blocks (int) – Denotes the number of temporal attention blocks in the Temporal Attentional Feature Aggregation. If the value isn’t greater than 0, the averaging operation will be adopted to aggregate the RoI features with the Most Similar RoI features. Defaults to 4.

forward(feats, rois, roi_scale_factor=None, ref_feats=None)[source]

Forward function.

most_similar_roi_align(roi_feats, ref_feats)[source]

Extract the Most Similar RoI features from reference feature maps ref_feats based on RoI features roi_feats.

The extraction mainly contains three steps: 1. Compute cosine similarity maps between roi_feats and ref_feats. 2. Pick the top K points based on the similarity maps. 3. Project these top K points into the reference feature maps ref_feats. A minimal sketch of the point-selection step follows the return description below.

Parameters
  • roi_feats (Tensor) – of shape [roi_n, C, roi_h, roi_w]. roi_n, roi_h and roi_w denote the number of key frame proposals, the height of RoI features and the width of RoI features, respectively.

  • ref_feats (Tensor) – of shape [img_n, C, img_h, img_w]. img_n, img_h and img_w denote the number of reference frames, the height of reference frame feature maps and the width of reference frame feature maps, respectively.

Returns

The extracted Most Similar RoI features from reference feature maps with shape [img_n, roi_n, C, roi_h, roi_w].

Return type

Tensor
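
A simplified sketch of the point-selection step (steps 1 and 2); the projection/gathering of step 3 is omitted, so this is an illustration of the idea rather than the module's exact implementation:

    import torch
    import torch.nn.functional as F

    def most_similar_points(roi_feats, ref_feats, k=2):
        """roi_feats: [roi_n, C, roi_h, roi_w]; ref_feats: [img_n, C, img_h, img_w].

        Returns indices of shape [img_n, roi_n, roi_h * roi_w, k] into the
        flattened (img_h * img_w) reference locations.
        """
        q = F.normalize(roi_feats.flatten(2), dim=1)   # [roi_n, C, rh*rw]
        r = F.normalize(ref_feats.flatten(2), dim=1)   # [img_n, C, ih*iw]
        # Cosine similarity of every RoI point against every reference point.
        sim = torch.einsum('ncp,mcq->mnpq', q, r)      # [img_n, roi_n, rh*rw, ih*iw]
        return sim.topk(k, dim=-1).indices

    idx = most_similar_points(torch.randn(4, 256, 7, 7),
                              torch.randn(2, 256, 32, 32))
    print(idx.shape)  # torch.Size([2, 4, 49, 2])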

temporal_attentional_feature_aggregation(x, ref_x)[source]

Aggregate the RoI features x with the Most Similar RoI features ref_x.

The aggregation mainly contains three steps: 1. Pass through a tiny embed network. 2. Use multi-head attention to compute the weights between x and ref_x. 3. Use the normalized (i.e. softmax) weights to compute a weighted sum of x and ref_x.

Parameters
  • x (Tensor) – of shape [1, roi_n, C, roi_h, roi_w]. roi_n, roi_h and roi_w denote the number of key frame proposals, the height of RoI features and the width of RoI features, respectively.

  • ref_x (Tensor) – of shape [img_n, roi_n, C, roi_h, roi_w]. img_n is the number of reference images.

Returns

The aggregated Temporal RoI features of key frame proposals with shape [roi_n, C, roi_h, roi_w].

Return type

Tensor

track_heads

class mmtrack.models.track_heads.CornerPredictorHead(inplanes, channel, feat_size=20, stride=16)[source]

Corner Predictor head.

Parameters
  • inplanes (int) – input channel

  • channel (int) – the output channel of the first conv block

  • feat_size (int) – the size of feature map

  • stride (int) – the stride of feature map from the backbone

forward(x)[source]

Forward pass with input x.

Parameters

x (Tensor) – of shape (bs, C, H, W).

Returns

bbox of shape (bs, 4) in (tl_x, tl_y, br_x, br_y) format.

Return type

(Tensor)

get_score_map(x)[source]

Score map branch.

Parameters

x (Tensor) – of shape (bs, C, H, W).

Returns

score_map_tl (Tensor): of shape (bs, 1, H, W). The score map of the top left corner of the tracking bbox.

score_map_br (Tensor): of shape (bs, 1, H, W). The score map of the bottom right corner of the tracking bbox.

soft_argmax(score_map)[source]

Get the soft-argmax coordinate for the given score map.

Parameters

score_map (Tensor) – of shape (self.feat_size, self.feat_size). The last score map in the bbox_head branch.

Returns

exp_x (Tensor): of shape (bs, 1). The values are in range [0, self.feat_size * self.stride].

exp_y (Tensor): of shape (bs, 1). The values are in range [0, self.feat_size * self.stride].
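
Soft-argmax replaces the hard maximum with the expectation of the coordinates under the softmax distribution, which keeps the operation differentiable. A minimal sketch:

    import torch

    def soft_argmax_2d(score_map, stride=16):
        """score_map: [bs, H, W]. Returns the expected (x, y) in image pixels."""
        bs, h, w = score_map.shape
        prob = score_map.flatten(1).softmax(dim=1).view(bs, h, w)
        ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                                torch.arange(w, dtype=torch.float32),
                                indexing='ij')
        exp_x = (prob * xs).sum(dim=(1, 2)) * stride  # expected x per sample
        exp_y = (prob * ys).sum(dim=(1, 2)) * stride  # expected y per sample
        return exp_x, exp_y

    sm = torch.zeros(1, 20, 20)
    sm[0, 5, 10] = 10.0  # a sharp peak at (x=10, y=5)
    print(soft_argmax_2d(sm))  # close to (160., 80.) with stride 16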

class mmtrack.models.track_heads.CorrelationHead(in_channels, mid_channels, out_channels, kernel_size=3, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None, **kwargs)[source]

Correlation head module.

This module is proposed in “SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks”. SiamRPN++.

Parameters
  • in_channels (int) – Input channels.

  • mid_channels (int) – Middle channels.

  • out_channels (int) – Output channels.

  • kernel_size (int) – Kernel size of convs. Defaults to 3.

  • norm_cfg (dict) – Configuration of the normalization method after each conv. Defaults to dict(type=’BN’).

  • act_cfg (dict) – Configuration of activation method after each conv. Defaults to dict(type=’ReLU’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(kernel, search)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
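
The core operation of such a head is the depthwise cross-correlation between exemplar ("kernel") features and search features introduced by SiamRPN++; a minimal sketch (the surrounding conv blocks of the module are omitted):

    import torch
    import torch.nn.functional as F

    def depthwise_xcorr(search, kernel):
        """search: [N, C, Hs, Ws]; kernel: [N, C, Hk, Wk] exemplar features."""
        n, c, hs, ws = search.shape
        # Fold batch into channels so each sample is correlated with its own
        # kernel, then use grouped conv to keep channels independent.
        search = search.view(1, n * c, hs, ws)
        kernel = kernel.view(n * c, 1, *kernel.shape[2:])
        out = F.conv2d(search, kernel, groups=n * c)
        return out.view(n, c, out.size(2), out.size(3))

    out = depthwise_xcorr(torch.randn(2, 256, 31, 31), torch.randn(2, 256, 7, 7))
    print(out.shape)  # torch.Size([2, 256, 25, 25])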

class mmtrack.models.track_heads.QuasiDenseEmbedHead(embed_channels=256, softmax_temp=-1, loss_track={'loss_weight': 0.25, 'type': 'MultiPosCrossEntropyLoss'}, loss_track_aux={'hard_mining': True, 'loss_weight': 1.0, 'margin': 0.3, 'sample_ratio': 3, 'type': 'L2Loss'}, init_cfg={'bias': 0, 'distribution': 'uniform', 'layer': 'Linear', 'override': {'bias': 0, 'mean': 0, 'name': 'fc_embed', 'std': 0.01, 'type': 'Normal'}, 'type': 'Xavier'}, *args, **kwargs)[source]

The quasi-dense roi embed head.

Parameters
  • embed_channels (int) – The input channel of embed features. Defaults to 256.

  • softmax_temp (int) – Softmax temperature. Defaults to -1.

  • loss_track (dict) – The loss function for tracking. Defaults to MultiPosCrossEntropyLoss.

  • loss_track_aux (dict) – The auxiliary loss function for tracking. Defaults to L2Loss.

forward(x)[source]

Forward the input x.

get_targets(gt_match_indices, key_sampling_results, ref_sampling_results)[source]

Calculate the track targets and track weights for all samples in a batch according to the sampling_results.

Parameters
  • key_sampling_results (List[obj:SamplingResults]) – Assign results of all images in a batch after sampling.

  • ref_sampling_results (List[obj:SamplingResults]) – Assign results of all reference images in a batch after sampling.

  • gt_match_indices (list(Tensor)) – Mapping from gt_instance_ids to ref_gt_instance_ids of the same tracklet in a pair of images.

Returns

Association results, containing the following list of Tensors:

  • track_targets (list[Tensor]): The mapping instance ids from all positive proposals in the key image to all proposals in the reference image, each tensor in list has shape (len(key_pos_bboxes), len(ref_bboxes)).

  • track_weights (list[Tensor]): Loss weights for all positive proposals in a batch, each tensor in list has shape (len(key_pos_bboxes),).

Return type

Tuple[list[Tensor]]

loss(dists, cos_dists, targets, weights)[source]

Calculate the track loss and the auxiliary track loss.

Parameters
  • dists (list[Tensor]) – Dot-product dists between key_embeds and ref_embeds.

  • cos_dists (list[Tensor]) – Cosine dists between key_embeds and ref_embeds.

  • targets (list[Tensor]) – The mapping instance ids from all positive proposals in the key image to all proposals in the reference image, each tensor in list has shape (len(key_pos_bboxes), len(ref_bboxes)).

  • weights (list[Tensor]) – Loss weights for all positive proposals in a batch, each tensor in list has shape (len(key_pos_bboxes),).

Returns

Calculation results, containing the following list of Tensors:

  • loss_track (Tensor): Results of the loss_track function.

  • loss_track_aux (Tensor): Results of the loss_track_aux function.

Return type

Dict[str, Tensor]

match(key_embeds, ref_embeds, key_sampling_results, ref_sampling_results)[source]

Calculate the distance matrices for loss measurement.

Parameters
  • key_embeds (Tensor) – Embeds of positive bboxes in sampling results of key image.

  • ref_embeds (Tensor) – Embeds of all bboxes in sampling results of the reference image.

  • key_sampling_results (List[obj:SamplingResults]) – Assign results of all images in a batch after sampling.

  • ref_sampling_results (List[obj:SamplingResults]) – Assign results of all reference images in a batch after sampling.

Returns

Calculation results, containing the following list of Tensors:

  • dists (list[Tensor]): Dot-product dists between key_embeds and ref_embeds, each tensor in list has shape (len(key_pos_bboxes), len(ref_bboxes)).

  • cos_dists (list[Tensor]): Cosine dists between key_embeds and ref_embeds, each tensor in list has shape (len(key_pos_bboxes), len(ref_bboxes)).

Return type

Tuple[list[Tensor]]

class mmtrack.models.track_heads.QuasiDenseTrackHead(*args, **kwargs)[source]

The quasi-dense track head.

extract_bbox_feats(x, bboxes)[source]

Extract roi features.

forward_train(x, img_metas, proposal_list, gt_bboxes, gt_labels, gt_match_indices, ref_x, ref_img_metas, ref_proposals, ref_gt_bboxes, ref_gt_labels, gt_bboxes_ignore=None, gt_masks=None, ref_gt_bboxes_ignore=None, ref_gt_mask=None, *args, **kwargs)[source]

Forward function during training.

Parameters
  • x (list[Tensor]) – list of multi-level image features.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • proposal_list (list[Tensors]) – list of region proposals.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes of the image, each item has a shape (num_gts, 4).

  • gt_labels (list[Tensor]) – Ground truth labels of all images, each has a shape (num_gts,).

  • gt_match_indices (list(Tensor)) – Mapping from gt_instance_ids to ref_gt_instance_ids of the same tracklet in a pair of images.

  • ref_x (list[Tensor]) – list of multi-level ref_img features.

  • ref_img_metas (list[dict]) – list of reference image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’.

  • ref_proposal_list (list[Tensors]) – list of ref_img region proposals.

  • ref_gt_bboxes (list[Tensor]) – Ground truth bboxes of the reference image, each item has a shape (num_gts, 4).

  • ref_gt_labels (list[Tensor]) – Ground truth labels of all reference images, each has a shape (num_gts,).

  • gt_bboxes_ignore (list[Tensor], None) – Ground truth bboxes to be ignored, each item has a shape (num_ignored_gts, 4).

  • gt_masks (list[Tensor]) – Masks for each bbox, has a shape (num_gts, h, w).

  • ref_gt_bboxes_ignore (list[Tensor], None) – Ground truth bboxes of reference images to be ignored, each item has a shape (num_ignored_gts, 4).

  • ref_gt_masks (list[Tensor]) – Masks for each reference bbox, has a shape (num_gts, h, w).

Returns

Track losses.

Return type

dict[str, Tensor]

class mmtrack.models.track_heads.RoIEmbedHead(num_convs=0, num_fcs=0, roi_feat_size=7, in_channels=256, conv_out_channels=256, with_avg_pool=False, fc_out_channels=1024, conv_cfg=None, norm_cfg=None, loss_match={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, init_cfg=None, **kwargs)[source]

The roi embed head.

This module is used in multi-object tracking methods, such as MaskTrack R-CNN.

Parameters
  • num_convs (int) – The number of convolutional layers to embed roi features. Defaults to 0.

  • num_fcs (int) – The number of fully connected layers to embed roi features. Defaults to 0.

  • roi_feat_size (int|tuple(int)) – The spatial size of roi features. Defaults to 7.

  • in_channels (int) – The input channel of roi features. Defaults to 256.

  • conv_out_channels (int) – The output channel of roi features after forwarding convolutional layers. Defaults to 256.

  • with_avg_pool (bool) – Whether to use average pooling before passing roi features into fully connected layers. Defaults to False.

  • fc_out_channels (int) – The output channel of roi features after forwarding fully connected layers. Defaults to 1024.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to None.

  • loss_match (dict) – The loss function. Defaults to dict(type=’CrossEntropyLoss’, use_sigmoid=False, loss_weight=1.0)

  • init_cfg (dict) – Configuration of initialization. Defaults to None.

forward(x, ref_x, num_x_per_img, num_x_per_ref_img)[source]

Compute the similarity scores between x and ref_x.

Parameters
  • x (Tensor) – of shape [N, C, H, W]. N is the number of key frame proposals.

  • ref_x (Tensor) – of shape [M, C, H, W]. M is the number of reference frame proposals.

  • num_x_per_img (list[int]) – The x contains proposals of multiple images. num_x_per_img denotes the number of proposals for each key image.

  • num_x_per_ref_img (list[int]) – The ref_x contains proposals of multiple images. num_x_per_ref_img denotes the number of proposals for each reference image.

Returns

The predicted similarity_logits of each pair of key image and reference image.

Return type

list[Tensor]

get_targets(sampling_results, gt_instance_ids, ref_gt_instance_ids)[source]

Calculate the ground truth for all samples in a batch according to the sampling_results.

Parameters
  • sampling_results (List[obj:SamplingResults]) – Assign results of all images in a batch after sampling.

  • gt_instance_ids (list[Tensor]) – The instance ids of gt_bboxes of all images in a batch, each tensor has shape (num_gt, ).

  • ref_gt_instance_ids (list[Tensor]) – The instance ids of gt_bboxes of all reference images in a batch, each tensor has shape (num_gt, ).

Returns

Ground truth for proposals in a batch, containing the following list of Tensors:

  • track_id_targets (list[Tensor]): The instance ids of Gt_labels for all proposals in a batch, each tensor in list has shape (num_proposals,).

  • track_id_weights (list[Tensor]): Labels_weights for all proposals in a batch, each tensor in list has shape (num_proposals,).

Return type

Tuple[list[Tensor]]

loss(similarity_logits, track_id_targets, track_id_weights, reduction_override=None)[source]

Calculate the loss in a batch.

Parameters
  • similarity_logits (list[Tensor]) – The predicted similarity_logits of each pair of key image and reference image.

  • track_id_targets (list[Tensor]) – The instance ids of Gt_labels for all proposals in a batch, each tensor in list has shape (num_proposals,).

  • track_id_weights (list[Tensor]) – Labels_weights for all proposals in a batch, each tensor in list has shape (num_proposals,).

  • reduction_override (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”.

Returns

a dictionary of loss components.

Return type

dict[str, Tensor]

class mmtrack.models.track_heads.RoITrackHead(roi_extractor=None, embed_head=None, regress_head=None, train_cfg=None, test_cfg=None, init_cfg=None, *args, **kwargs)[source]

The roi track head.

This module is used in multi-object tracking methods, such as MaskTrack R-CNN.

Parameters
  • roi_extractor (dict) – Configuration of roi extractor. Defaults to None.

  • embed_head (dict) – Configuration of embed head. Defaults to None.

  • train_cfg (dict) – Configuration when training. Defaults to None.

  • test_cfg (dict) – Configuration when testing. Defaults to None.

  • init_cfg (dict) – Configuration of initialization. Defaults to None.

extract_roi_feats(x, bboxes)[source]

Extract roi features.

forward_train(x, ref_x, img_metas, proposal_list, gt_bboxes, ref_gt_bboxes, gt_labels, gt_instance_ids, ref_gt_instance_ids, gt_bboxes_ignore=None, **kwargs)[source]
Parameters
  • x (list[Tensor]) – list of multi-level image features.

  • ref_x (list[Tensor]) – list of multi-level ref_img features.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmtrack/datasets/pipelines/formatting.py:VideoCollect.

  • proposal_list (list[Tensors]) – list of region proposals.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • ref_gt_bboxes (list[Tensor]) – Ground truth bboxes for each reference image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box.

  • gt_instance_ids (None | list[Tensor]) – specify the instance id for each ground truth bbox.

  • ref_gt_instance_ids (None | list[Tensor]) – specify the instance id for each ground truth bbox of reference images.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

init_assigner_sampler()[source]

Initialize assigner and sampler.

init_embed_head(roi_extractor, embed_head)[source]

Initialize embed_head.

simple_test(roi_feats, prev_roi_feats)[source]

Test without augmentations.

property with_track

whether the multi-object tracker has an embed head

Type

bool

class mmtrack.models.track_heads.SiameseRPNHead(anchor_generator, in_channels, kernel_size=3, norm_cfg={'type': 'BN'}, weighted_sum=False, bbox_coder={'target_means': [0.0, 0.0, 0.0, 0.0], 'target_stds': [1.0, 1.0, 1.0, 1.0], 'type': 'DeltaXYWHBBoxCoder'}, loss_cls={'loss_weight': 1.0, 'reduction': 'sum', 'type': 'CrossEntropyLoss'}, loss_bbox={'loss_weight': 1.2, 'reduction': 'sum', 'type': 'L1Loss'}, train_cfg=None, test_cfg=None, init_cfg=None, *args, **kwargs)[source]

Siamese RPN head.

This module is proposed in “SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks”. SiamRPN++.

Parameters
  • anchor_generator (dict) – Configuration to build anchor generator module.

  • in_channels (int) – Input channels.

  • kernel_size (int) – Kernel size of convs. Defaults to 3.

  • norm_cfg (dict) – Configuration of the normalization method after each conv. Defaults to dict(type=’BN’).

  • weighted_sum (bool) – If True, use learnable weights to weight and sum the outputs of the multiple heads in siamese rpn, otherwise use averaging. Defaults to False.

  • bbox_coder (dict) – Configuration to build bbox coder. Defaults to dict(type=’DeltaXYWHBBoxCoder’, target_means=[0., 0., 0., 0.], target_stds=[1., 1., 1., 1.]).

  • loss_cls (dict) – Configuration to build classification loss. Defaults to dict( type=’CrossEntropyLoss’, reduction=’sum’, loss_weight=1.0)

  • loss_bbox (dict) – Configuration to build bbox regression loss. Defaults to dict( type=’L1Loss’, reduction=’sum’, loss_weight=1.2).

  • train_cfg (Dict) – Training setting. Defaults to None.

  • test_cfg (Dict) – Testing setting. Defaults to None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(z_feats, x_feats)[source]

Forward with features z_feats of exemplar images and features x_feats of search images.

Parameters
  • z_feats (tuple[Tensor]) – Tuple of Tensor with shape (N, C, H, W) denoting the multi level feature maps of exemplar images. Typically H and W equal to 7.

  • x_feats (tuple[Tensor]) – Tuple of Tensor with shape (N, C, H, W) denoting the multi level feature maps of search images. Typically H and W equal to 31.

Returns

cls_score is a Tensor with shape (N, 2 * num_base_anchors, H, W); bbox_pred is a Tensor with shape (N, 4 * num_base_anchors, H, W). Typically H and W equal 25.

Return type

tuple(cls_score, bbox_pred)

get_bbox(cls_score, bbox_pred, prev_bbox, scale_factor)[source]

Track prev_bbox to the current frame based on the output of the network.

Parameters
  • cls_score (Tensor) – of shape (1, 2 * num_base_anchors, H, W).

  • bbox_pred (Tensor) – of shape (1, 4 * num_base_anchors, H, W).

  • prev_bbox (Tensor) – of shape (4, ) in [cx, cy, w, h] format.

  • scale_factor (Tensor) – scale factor.

Returns

best_score is a Tensor denoting the score of best_bbox; best_bbox is a Tensor of shape (4, ) in [cx, cy, w, h] format, which denotes the best tracked bbox in the current frame.

Return type

tuple(best_score, best_bbox)

get_targets(gt_bboxes, score_maps_size, is_positive_pairs)[source]

Generate the training targets for exemplar image and search image pairs.

Parameters
  • gt_bboxes (list[Tensor]) – Ground truth bboxes of each search image with shape (1, 5) in [0.0, tl_x, tl_y, br_x, br_y] format.

  • score_maps_size (torch.Size) – denoting the output size (height, width) of the network.

  • is_positive_pairs (list[bool]) – list of bool denoting whether each ground truth bbox in gt_bboxes is positive.

Returns

tuple(all_labels, all_labels_weights, all_bbox_targets, all_bbox_weights): the shapes are (N, H * W * num_base_anchors), (N, H * W * num_base_anchors), (N, H * W * num_base_anchors, 4) and (N, H * W * num_base_anchors, 4), respectively. All of them are Tensors.

loss(cls_score, bbox_pred, labels, labels_weights, bbox_targets, bbox_weights)[source]

Compute loss.

Parameters
  • cls_score (Tensor) – of shape (N, 2 * num_base_anchors, H, W).

  • bbox_pred (Tensor) – of shape (N, 4 * num_base_anchors, H, W).

  • labels (Tensor) – of shape (N, H * W * num_base_anchors).

  • labels_weights (Tensor) – of shape (N, H * W * num_base_anchors).

  • bbox_targets (Tensor) – of shape (N, H * W * num_base_anchors, 4).

  • bbox_weights (Tensor) – of shape (N, H * W * num_base_anchors, 4).

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

class mmtrack.models.track_heads.StarkHead(num_query=1, transformer=None, positional_encoding={'normalize': True, 'num_feats': 128, 'type': 'SinePositionalEncoding'}, bbox_head=None, cls_head=None, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, loss_bbox={'loss_weight': 5.0, 'type': 'L1Loss'}, loss_iou={'loss_weight': 2.0, 'type': 'GIoULoss'}, train_cfg=None, test_cfg=None, init_cfg=None, frozen_modules=None, **kwargs)[source]

STARK head module for bounding box regression and prediction of the confidence score of the tracking bbox.

This module is proposed in “Learning Spatio-Temporal Transformer for Visual Tracking”. STARK.

Parameters
  • num_query (int) – Number of query in transformer.

  • transformer (obj:mmcv.ConfigDict | dict) – Config for transformer. Default: None.

  • positional_encoding (obj:mmcv.ConfigDict | dict) – Config for position encoding.

  • bbox_head (obj:mmcv.ConfigDict | dict, optional) – Config for bbox head. Defaults to None.

  • cls_head (obj:mmcv.ConfigDict | dict, optional) – Config for classification head. Defaults to None.

  • loss_cls (obj:mmcv.ConfigDict | dict) – Config of the classification loss. Default: CrossEntropyLoss.

  • loss_bbox (obj:mmcv.ConfigDict | dict) – Config of the bbox regression loss. Default: L1Loss.

  • loss_iou (obj:mmcv.ConfigDict | dict) – Config of the bbox regression iou loss. Default: GIoULoss.

  • train_cfg (obj:mmcv.ConfigDict | dict) – Training config of the transformer head.

  • test_cfg (obj:mmcv.ConfigDict | dict) – Testing config of the transformer head.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(inputs)[source]

Parameters

inputs – The list contains the multi-level features and masks of template or search images.

  • ‘feat’: (tuple(Tensor)), the Tensor is of shape (bs, c, h//stride, w//stride).

  • ‘mask’: (Tensor), of shape (bs, h, w).

Here, h and w denote the height and width of the input image respectively. stride is the stride of the feature map.

Returns

  • ‘pred_bboxes’: (Tensor) of shape (bs, num_query, 4), in [tl_x, tl_y, br_x, br_y] format.

  • ‘pred_logits’: (Tensor) of shape (bs, num_query, 1).

Return type

(dict)

forward_bbox_head(feat, enc_mem)[source]
Parameters
  • feat – output embeddings of the decoder, with shape (1, bs, num_query, c).

  • enc_mem – output embeddings of the encoder, with shape (feats_flatten_len, bs, C). Here, ‘feats_flatten_len’ = z_feat_h*z_feat_w*2 + x_feat_h*x_feat_w. ‘z_feat_h’ and ‘z_feat_w’ denote the height and width of the template features respectively. ‘x_feat_h’ and ‘x_feat_w’ denote the height and width of the search features respectively.

Returns

of shape (bs, num_query, 4). The bbox is in [tl_x, tl_y, br_x, br_y] format.

Return type

Tensor

init_weights()[source]

Parameters initialization.

loss(track_results, gt_bboxes, gt_labels, img_size=None)[source]

Compute loss.

Parameters
  • track_results (dict) – it may contain the following keys:

    • ‘pred_bboxes’: bboxes of (N, num_query, 4) shape in [tl_x, tl_y, br_x, br_y] format.

    • ‘pred_logits’: of (N, num_query, 1) shape.

  • gt_bboxes (list[Tensor]) – ground truth bboxes for search images with shape (N, 5) in [0., tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – ground truth labels for search images with shape (N, 2).

  • img_size (tuple, optional) – the size (h, w) of the original search image. Defaults to None.

Returns

a dictionary of loss components.

Return type

dict[str, Tensor]

builder

mmtrack.models.build_aggregator(cfg)[source]

Build aggregator model.

mmtrack.models.build_model(cfg, train_cfg=None, test_cfg=None)[source]

Build model.

mmtrack.models.build_motion(cfg)[source]

Build motion model.

mmtrack.models.build_reid(cfg)[source]

Build reid model.

mmtrack.models.build_tracker(cfg)[source]

Build tracker.
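
A typical use of these builders, shown as a hedged sketch; the config path below is a placeholder for one of the configs shipped with mmtracking:

    from mmcv import Config
    from mmtrack.models import build_model

    # Placeholder path: substitute any model config shipped with mmtracking.
    cfg = Config.fromfile('configs/vid/selsa/selsa_faster_rcnn_r50_dc5_1x.py')
    model = build_model(cfg.model,
                        train_cfg=cfg.get('train_cfg'),
                        test_cfg=cfg.get('test_cfg'))
    model.init_weights()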

mmtrack.utils

mmtrack.utils.collect_env()[source]

Collect the information of the running environments.

mmtrack.utils.get_root_logger(log_file=None, log_level=20)[source]

Get root logger.

Parameters
  • log_file (str) – File path of log. Defaults to None.

  • log_level (int) – The level of logger. Defaults to logging.INFO.

Returns

The obtained logger

Return type

logging.Logger
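
A minimal usage sketch; the log path is arbitrary and chosen for illustration:

    import logging
    from mmtrack.utils import get_root_logger

    # 'work_dirs/example.log' is an arbitrary path chosen for illustration.
    logger = get_root_logger(log_file='work_dirs/example.log',
                             log_level=logging.INFO)
    logger.info('logger ready')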
