Abstract: Temporal forward-tracking has been the dominant approach for multi-object segmentation and tracking (MOTS). However, a novel time-symmetric tracking methodology has recently been introduced for the detection, segmentation, and tracking of budding yeast cells in pre-recorded samples. Although this architecture has demonstrated a unique perspective on stable and consistent tracking, as well as missed instance re-interpolation, its evaluation has so far been largely confined to settings related to videomicroscopic environments. In this work, we aim to reveal the broader capabilities, advantages, and potential challenges of this architecture across various specifically designed scenarios, including a pedestrian tracking dataset. We also conduct an ablation study comparing the model against its restricted variants and the widely used Kalman filter. Furthermore, we present an attention analysis of the tracking architecture for both pretrained and non-pretrained models.

What problem does this paper attempt to address?

This paper evaluates the time-symmetric (TS) multi-object tracking architecture beyond the setting it was introduced for, and examines how its capabilities hold up in different scenarios. It focuses on four aspects:

1. **Evaluating the wider applicability of the TS architecture**: The TS architecture has performed well on multi-object segmentation and tracking in videomicroscopic recordings, such as those of budding yeast cells, but its behavior on other types of data has not been fully explored. The authors therefore set out to reveal its potential, advantages, and possible challenges in different scenarios.
2. **Comparison with traditional methods**: The paper compares the TS architecture against the widely used Kalman filter and against restricted variants of the architecture itself (such as TS-L2 and TS-Shape). This ablation helps isolate what is distinctive about the TS architecture and where it improves on traditional methods.
3. **Zero-shot knowledge transfer evaluation**: The researchers evaluate zero-shot knowledge transfer between the synthetic dataset (MOTSynth-MOTS-CVPR22) and the real-world pedestrian tracking dataset (MOTS). This indicates how well the model generalizes and adapts to new environments.
4. **Attention analysis of local tracking segments**: To better understand how the TS architecture works, the authors analyze the spatio-temporal attention preferences of pretrained and non-pretrained models over local tracking segments. This reveals what the model focuses on when handling different tasks.

### Formula Summary

Some of the key formulas involved in the paper are as follows:

- **IoU 50% Binary Metric**:
\[ \text{IoU}(GT(t, n), PD(t, m)) = \frac{|GT(t, n) \cap PD(t, m)|}{|GT(t, n) \cup PD(t, m)|} \]
\[ \text{IoU}_{50}(GT(t, n), PD(t, m)) = \begin{cases} 1 & \text{if } \text{IoU}(GT(t, n), PD(t, m)) > 0.5 \\ 0 & \text{if } \text{IoU}(GT(t, n), PD(t, m)) \leq 0.5 \end{cases} \]
- **True Positive Association Count (TPA50)**:
\[ TPA_{50}(t, t + 1) = \sum_{n,m} \left[ \text{IoU}_{50}(GT(t, n), PD(t, m)) \cap \text{IoU}_{50}(GT(t + 1, n), PD(t + 1, m)) \right] \]
- **False Positive Association Count (FPA50)**:
\[ FPA_{50}(t, t + 1) = |PDD_{50}(t, t + 1)| - TPA_{50}(t, t + 1) \]
- **False Negative Association Count (FNA50)**:
\[ FNA_{50}(t, t + 1) = |GTD_{50}(t, t + 1)| - TPA_{50}(t, t + 1) \]
- **Association Precision (AP50)**:
\[ AP_{50} = \frac{TPA_{50}}{TPA_{50} + FPA_{50}} \]
- **Association Recall (AR50)**:
\[ AR_{50} = \frac{TPA_{50}}{TPA_{50} + FNA_{50}} \]
- **Association F-score (AF50)**:
\[ AF_{50} = \frac{2 \cdot AP_{50} \cdot AR_{50}}{AP_{50} + AR_{50}} \]
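
To make the association metrics above more concrete, here is a minimal Python sketch of how they could be computed for one pair of consecutive frames. It is an illustration under stated assumptions, not the authors' evaluation code. Masks are represented as sets of `(row, col)` pixel coordinates, each frame is a dictionary mapping an instance ID to its mask, and the function names (`iou`, `iou50`, `tpa50`, `association_scores`) are hypothetical. Because the summary does not define how \(|PDD_{50}|\) and \(|GTD_{50}|\) are obtained, the sketch takes them as caller-supplied counts (`n_pd_assoc`, `n_gt_assoc`) rather than guessing.

```python
def iou(gt_mask, pd_mask):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = gt_mask | pd_mask
    if not union:
        # Assumption: two empty masks are treated as non-overlapping.
        return 0.0
    return len(gt_mask & pd_mask) / len(union)


def iou50(gt_mask, pd_mask):
    """Binary IoU metric: 1 if IoU is strictly greater than 0.5, else 0."""
    return 1 if iou(gt_mask, pd_mask) > 0.5 else 0


def tpa50(gt_t, pd_t, gt_t1, pd_t1):
    """True positive association count between frames t and t + 1.

    Each argument maps an instance ID to a mask. A pair (n, m) counts when
    ground-truth ID n matches predicted ID m in both frames.
    """
    count = 0
    for n in gt_t.keys() & gt_t1.keys():
        for m in pd_t.keys() & pd_t1.keys():
            count += iou50(gt_t[n], pd_t[m]) & iou50(gt_t1[n], pd_t1[m])
    return count


def association_scores(tpa, n_pd_assoc, n_gt_assoc):
    """Return (AP50, AR50, AF50).

    n_pd_assoc and n_gt_assoc stand in for |PDD_50| and |GTD_50|; how they
    are counted is left to the caller (an assumption of this sketch).
    """
    fpa = n_pd_assoc - tpa
    fna = n_gt_assoc - tpa
    ap = tpa / (tpa + fpa) if (tpa + fpa) else 0.0
    ar = tpa / (tpa + fna) if (tpa + fna) else 0.0
    af = 2 * ap * ar / (ap + ar) if (ap + ar) else 0.0
    return ap, ar, af


# Toy example: one ground-truth instance (ID 1) and one prediction (ID 7).
gt_mask = {(r, c) for r in range(2) for c in range(4)}   # 8 pixels
pd_mask = {(r, c) for r in range(2) for c in range(3)}   # 6 pixels, all inside gt_mask
tpa = tpa50({1: gt_mask}, {7: pd_mask}, {1: gt_mask}, {7: pd_mask})
ap, ar, af = association_scores(tpa, n_pd_assoc=2, n_gt_assoc=1)
```

Note the strict inequality in `iou50`: an overlap of exactly 0.5 does not count as a match, which follows the case definition above.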

Post-Hoc MOTS: Exploring the Capabilities of Time-Symmetric Multi-Object Tracking
