Abstract: Temporal forward-tracking has been the dominant approach for multi-object segmentation and tracking (MOTS). However, a novel time-symmetric tracking methodology has recently been introduced for the detection, segmentation, and tracking of budding yeast cells in pre-recorded samples. Although this architecture has demonstrated a unique perspective on stable and consistent tracking, as well as missed instance re-interpolation, its evaluation has so far been largely confined to settings related to videomicroscopic environments. In this work, we aim to reveal the broader capabilities, advantages, and potential challenges of this architecture across various specifically designed scenarios, including a pedestrian tracking dataset. We also conduct an ablation study comparing the model against its restricted variants and the widely used Kalman filter. Furthermore, we present an attention analysis of the tracking architecture for both pretrained and non-pretrained models.
What problem does this paper attempt to address?
The paper sets out to evaluate the performance and capabilities of the time-symmetric (TS) multi-object tracking architecture beyond the setting it was originally developed for. It focuses on four aspects:
1. **Evaluating the wide applicability of the TS architecture**: The TS architecture has performed well on multi-object segmentation and tracking in videomicroscopic environments, such as recordings of budding yeast cells, but its behavior on other types of datasets has not been fully explored. The authors therefore aim to reveal the capabilities, advantages, and potential challenges of the architecture in different scenarios.
2. **Comparison with traditional methods**: The paper runs an ablation study that compares the TS architecture against the widely used Kalman filter and against restricted variants of the TS architecture (such as TS-L2 and TS-Shape). This helps to isolate what is distinctive about the TS architecture and where it improves on traditional methods.
3. **Zero-shot knowledge transfer evaluation**: The researchers also evaluate zero-shot knowledge transfer of the TS architecture between the synthetic dataset (MOTSynth-MOTS-CVPR22) and the real-world pedestrian tracking dataset (MOTS). This evaluation indicates how well the model generalizes and adapts to new environments.
4. **Attention analysis of local tracking segments**: To better understand how the TS architecture works, the authors conduct an attention analysis that examines the spatio-temporal attention preferences of pretrained and non-pretrained models within local tracking segments. This helps to reveal what the model focuses on when handling different tasks.
### Formula Summary
Some of the key formulas involved in the paper are as follows:
- **IoU 50% Binary Metric**:
\[
\text{IoU}(GT(t, n), PD(t, m))=\frac{|GT(t, n)\cap PD(t, m)|}{|GT(t, n)\cup PD(t, m)|}
\]
\[
\text{IoU}_{50}(GT(t, n), PD(t, m)) =
\begin{cases}
1 & \text{if }\text{IoU}(GT(t, n), PD(t, m))> 0.5\\
0 & \text{if }\text{IoU}(GT(t, n), PD(t, m))\leq 0.5
\end{cases}
\]
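
As a rough illustration of the two definitions above, the following Python sketch computes the IoU of a ground-truth and a predicted mask and its 50% binarization. The function names, the boolean NumPy mask representation, and the convention that two empty masks give an IoU of 0 are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def iou(gt_mask, pd_mask):
    """Intersection over union of two boolean masks of equal shape."""
    gt = np.asarray(gt_mask, dtype=bool)
    pd = np.asarray(pd_mask, dtype=bool)
    union = np.logical_or(gt, pd).sum()
    if union == 0:
        # Assumption: two empty masks are treated as having no overlap.
        return 0.0
    return float(np.logical_and(gt, pd).sum() / union)

def iou50(gt_mask, pd_mask):
    """Binary IoU metric: 1 only if IoU is strictly greater than 0.5."""
    return int(iou(gt_mask, pd_mask) > 0.5)
```

Note that the threshold is strict: a pair with an IoU of exactly 0.5 is mapped to 0.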
- **True Positive Association Count (TPA 50)**:
\[
TPA_{50}(t, t + 1)=\sum_{n,m}\left[\text{IoU}_{50}(GT(t, n), PD(t, m))\cap\text{IoU}_{50}(GT(t + 1, n), PD(t + 1, m))\right]
\]
- **False Positive Association Count (FPA 50)**:
\[
FPA_{50}(t, t + 1)=|PDD_{50}(t, t + 1)|-TPA_{50}(t, t + 1)
\]
- **False Negative Association Count (FNA 50)**:
\[
FNA_{50}(t, t + 1)=|GTD_{50}(t, t + 1)|-TPA_{50}(t, t + 1)
\]
- **Association Precision (AP50)**:
\[
AP_{50}=\frac{TPA_{50}}{TPA_{50}+FPA_{50}}
\]
- **Association Recall (AR50)**:
\[
AR_{50}=\frac{TPA_{50}}{TPA_{50}+FNA_{50}}
\]
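
The association counts and ratios above can be sketched in a few lines of Python. The sketch assumes that |PDD50(t, t+1)| and |GTD50(t, t+1)| are the numbers of predicted and ground-truth associations between frames t and t+1, which this summary does not define explicitly. The function name, the argument names, and the choice to return 0 for an empty denominator are likewise hypothetical.

```python
def association_scores(tpa, num_pred_assoc, num_gt_assoc):
    """Derive FPA, FNA, precision and recall from association counts.

    tpa            -- true positive associations (TPA50)
    num_pred_assoc -- assumed to be |PDD50|, the predicted associations
    num_gt_assoc   -- assumed to be |GTD50|, the ground-truth associations
    """
    fpa = num_pred_assoc - tpa
    fna = num_gt_assoc - tpa
    # Assumption: a zero denominator yields a score of 0.0.
    ap = tpa / (tpa + fpa) if (tpa + fpa) > 0 else 0.0
    ar = tpa / (tpa + fna) if (tpa + fna) > 0 else 0.0
    return {"FPA": fpa, "FNA": fna, "AP": ap, "AR": ar}
```

For example, 8 true positive associations out of 10 predicted and 16 ground-truth associations give FPA50 = 2, FNA50 = 8, AP50 = 0.8 and AR50 = 0.5.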
- **Association F-score (AF50)**:
\[
AF_{50}=\frac{2\cdot AP_{50}\cdot AR_{50}}{AP_{50}+AR_{50}}