What problem does this paper attempt to address?

The paper addresses how the performance of RepNet should be evaluated on the video repetition counting task. It points out that RepNet results are reported inconsistently across the literature, and that some papers evaluate a modified RepNet model, which has led to misunderstandings about its performance. The paper notes that the original RepNet model can handle predictions with a cycle length of more than 32 through the multi-speed evaluation technique, without any change to the model. To clear up this confusion, the paper reports RepNet's results on several datasets and releases the evaluation code and RepNet checkpoints.

### Main problem points:

1. **Evaluation consistency problem**: Multiple papers have reported poor RepNet performance on some repetition counting datasets, but these reports often use a modified RepNet model rather than the original one.
2. **Impact of model modification**: For example, the TransRAC paper states that, for a fair comparison, its authors modified the last fully connected layer of RepNet so that it can handle videos with more than 32 action cycles. With this modification, the performance of the modified RepNet model is close to zero on the UCFRep and RepCount datasets.
3. **Multi-speed evaluation technique**: The original RepNet paper proposed a multi-speed evaluation technique. By changing the video playback speed, longer cycles can be handled without modifying the model. This method was also verified on the Countix dataset.

### Solutions:

- **Re-evaluate RepNet**: The paper re-evaluates the original RepNet model on the Countix, UCFRep, and RepCount-A datasets using the multi-speed evaluation technique and reports detailed performance metrics.
- **Release evaluation code and model**: For transparency and reproducibility, the code used for evaluation and the checkpoints of the RepNet model are released.

### Conclusion:

The re-evaluation shows that the original RepNet model still performs very well on the video repetition counting task and even outperforms more modern methods on some datasets. Although many new models and larger backbone networks have emerged in recent years, the RepNet model trained in 2020 remains highly competitive at low resolution. The paper expresses the hope that the community can use the released model to further improve repetition counting methods.
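The multi-speed evaluation idea summarized above can be illustrated with a minimal sketch. This is not the released evaluation code. The model interface (`PeriodModel`), the stride set, the mean-confidence selection rule, and the toy model are all assumptions made for illustration. The sketch subsamples the video at several strides, runs an unmodified period predictor on each version, and keeps the count from the stride the predictor is most confident about.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

# Hypothetical model interface (an assumption, not RepNet's real API):
# takes a clip of frames and returns, per frame, a period length measured in
# frames of that clip and a periodicity confidence in [0, 1].
PeriodModel = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]


def multi_speed_count(
    frames: np.ndarray,
    model: PeriodModel,
    strides: Sequence[int] = (1, 2, 3, 4),
) -> Tuple[float, int]:
    """Return (estimated repetition count, chosen stride)."""
    best_score = -np.inf
    best_count = 0.0
    best_stride = strides[0]
    for stride in strides:
        clip = frames[::stride]  # "play" the video `stride` times faster
        periods, confidence = model(clip)
        score = float(np.mean(confidence))
        # Each frame of the clip contributes 1/period of one repetition.
        count = float(np.sum(1.0 / np.maximum(periods, 1.0)))
        if score > best_score:  # on ties, the slowest speed wins
            best_score, best_count, best_stride = score, count, stride
    return best_count, best_stride


def make_toy_model(total_frames: int, true_period: float,
                   max_period: float = 32.0) -> PeriodModel:
    """Stand-in predictor that cannot represent periods above `max_period`."""
    def model(clip: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        # Period as it appears after subsampling.
        apparent = true_period * len(clip) / total_frames
        representable = apparent <= max_period
        periods = np.full(len(clip), min(apparent, max_period), dtype=float)
        confidence = np.full(len(clip), 0.9 if representable else 0.1)
        return periods, confidence
    return model


video = np.zeros((256, 1))  # 256 dummy frames, 4 repetitions of 64 frames each
toy_model = make_toy_model(total_frames=256, true_period=64.0)
count, stride = multi_speed_count(video, toy_model, strides=(1, 2, 4))
```

In this toy setup the predictor cannot express the 64-frame period at the original speed, so it is only confident once the video is subsampled enough for the apparent period to fit within its limit. The predictor itself is never modified.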

A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos

RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos

Revisiting Temporal Modeling for Video-based Person ReID

Temporal receptive field in dynamic graph learning: A comprehensive analysis

DyRep: Bootstrapping Training with Dynamic Re-parameterization

Attention-guided Temporally Coherent Video Object Matting

RepeatNet: A Repeat Aware Neural Recommendation Machine for Session-based Recommendation

Evaluating Temporal Persistence Using Replicability Measures

New Perspectives on the Evaluation of Link Prediction Algorithms for Dynamic Graphs

Frame by Familiar Frame: Understanding Replication in Video Diffusion Models

Repetition Estimation

A Benchmark and Empirical Analysis for Replay Strategies in Continual Learning

A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric

Rethinking the Evaluation of Video Summaries

Datasets for Paper "benchtemp: A General Benchmark for Evaluating Temporal Graph Neural Networks"

Look At Me, No Replay! SurpriseNet: Anomaly Detection Inspired Class Incremental Learning

Context-Aware and Scale-Insensitive Temporal Repetition Counting

Boosting Video Super Resolution with Patch-Based Temporal Redundancy Optimization

ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

Does SpatioTemporal information benefit Two video summarization benchmarks?