TLS-RWKV: Real-Time Online Action Detection with Temporal Label Smoothing
Ziqi Zhu, Wuchang Shao, Dongdong Jiao
DOI: https://doi.org/10.1007/s11063-024-11540-0
IF: 2.565
2024-02-20
Neural Processing Letters
Abstract: Online action detection (OAD) is a challenging task that involves predicting the ongoing action class in real-time streaming videos, which is essential in fields such as autonomous driving and video surveillance. In this article, we propose an approach for OAD based on the Receptance Weighted Key Value (RWKV) model with temporal label smoothing. The RWKV model captures temporal dependencies while computing efficiently, which makes it well-suited for real-time applications. Our TLS-RWKV model demonstrates advancements in two aspects. First, in experiments on two widely used datasets, THUMOS'14 and TVSeries, our proposed approach achieves state-of-the-art performance with 71.8% mAP on THUMOS'14 and 89.7% cAP on TVSeries. Second, our proposed approach demonstrates impressive efficiency, running at over 600 FPS while maintaining a competitive mAP of 59.9% on THUMOS'14 with RGB features alone. Notably, this speed is more than twice that of the prior state-of-the-art model, TesTra. Even when executed on a CPU, our model maintains a commendable speed exceeding 200 FPS. This high efficiency makes our model suitable for real-time deployment, even on resource-constrained devices. These results showcase the effectiveness and competitiveness of our proposed approach in OAD.
computer science, artificial intelligence
### What problems does this paper attempt to solve?
This paper aims to solve several key challenges in **Online Action Detection (OAD)**. Specifically, the authors propose a new method for real-time online action detection and pay particular attention to the following issues:
1. **Real-time requirements**:
   - Online action detection must predict the ongoing action category in the video stream as frames arrive, which places very high demands on computational efficiency. In fields such as autonomous driving and video surveillance, real-time response is crucial.
2. **Long-term dependency capture**:
   - Action detection usually requires long-term context to identify accurately where an action starts and ends. However, existing models such as RNNs and Transformers have limitations on long sequences: RNNs are hard to train in parallel and are prone to vanishing gradients, while Transformers handle long-range dependencies but the cost of self-attention grows quadratically with sequence length, which is too expensive for real-time use.
3. **Label smoothing**:
   - In online action detection, action boundaries are often unclear, so the model is error-prone near them. Conventional label assignment usually gives an entire segment the label of a single time point, which is coarse and may introduce label noise.
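The paper's exact TLS formulation is not reproduced in this summary. As a minimal sketch, assuming a simple box-window variant, hard per-frame labels can be softened by averaging the one-hot labels of neighbouring frames, so frames near a boundary receive a mixed target instead of an abrupt 0/1 switch. The names `smooth_labels`, `soft_cross_entropy`, and the `radius` parameter are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def smooth_labels(labels: List[int], num_classes: int, radius: int) -> List[List[float]]:
    """Hypothetical temporal label smoothing with a box window.

    Not necessarily the paper's TLS: each frame's target is the average of
    the one-hot labels of the frames within +/- radius of it (the window is
    clipped at the sequence ends).
    """
    n = len(labels)
    targets = []
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        row = [0.0] * num_classes
        for i in range(lo, hi):
            row[labels[i]] += 1.0  # count each class inside the window
        targets.append([c / (hi - lo) for c in row])  # normalise to a distribution
    return targets


def soft_cross_entropy(probs: List[List[float]], targets: List[List[float]]) -> float:
    """Mean cross-entropy between predicted class distributions and soft targets."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, q in zip(probs, targets):
        total -= sum(qc * math.log(pc + eps) for pc, qc in zip(p, q))
    return total / len(targets)
```

For `labels = [0, 0, 0, 1, 1, 1]` with `radius=1`, the two frames adjacent to the boundary receive targets of about `[0.67, 0.33]` and `[0.33, 0.67]`, while frames farther away stay one-hot. A Gaussian kernel instead of a box would weight nearer frames more heavily.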
To address these problems, the authors propose a new method based on the **RWKV model** and introduce the **Temporal Label Smoothing (TLS)** technique. The method improves detection accuracy and substantially increases computational efficiency, enabling real-time deployment on resource-constrained devices.
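To illustrate why RWKV suits streaming inference, the sketch below implements a single-channel WKV (time-mixing) operator in the form given for RWKV-4, once directly over all past frames and once as a recurrence with a constant-size state. It assumes scalar keys and values, a positive decay `w`, and a current-step bonus `u`; receptance gating, token shift, and the numerically stable max-tracking used by real implementations are omitted, and TLS-RWKV's actual layers may differ.

```python
import math
from typing import List


def wkv_parallel(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    """Direct form: each output is an exp-weighted average of all values so far.

    Past step i (i < t) gets weight exp(-(t - 1 - i) * w + k[i]);
    the current step t gets the bonus weight exp(u + k[t]).
    """
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    """Streaming form: the same outputs from a two-number state (a, b)."""
    a, b = 0.0, 0.0       # decayed sums of exp(k_i) * v_i and of exp(k_i)
    decay = math.exp(-w)  # per-step decay factor (w > 0 assumed)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        # Fold the current step into the state before the next frame arrives.
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out


# Example: both forms agree on a short sequence.
k = [0.1, -0.3, 0.5, 0.0, 0.2]
v = [1.0, 2.0, -1.0, 0.5, 3.0]
outputs = wkv_recurrent(k, v, w=0.5, u=0.3)
```

Both functions return the same outputs, but `wkv_recurrent` does constant work per new frame with two numbers of state, whereas attending over the full history costs more with every past frame. In the RWKV design, this equivalence is what allows Transformer-style training over whole sequences and RNN-style inference on a stream.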
### Specific contributions:
- **Application of the RWKV model**: RWKV combines the Transformer's ability to capture long-range dependencies and to be trained in parallel with the efficient inference of an RNN, making it suitable for real-time online action detection.
- **Temporal Label Smoothing (TLS)**: Temporal label smoothing refines label assignment, reducing ambiguity and uncertainty near action boundaries.
- **Experimental verification**: Experiments on two commonly used datasets, THUMOS'14 and TVSeries, demonstrate the method's advantages in both accuracy and efficiency.
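The reported metrics are per-frame mAP (THUMOS'14) and calibrated AP, cAP (TVSeries). Below is a minimal sketch of per-class AP and cAP, assuming the common definition of cAP in which false positives are divided by the negative-to-positive frame ratio `w`, so that heavy class imbalance does not dominate the score. It is a simple non-interpolated version; the function names are hypothetical, and official evaluation scripts may differ in details such as tie handling or which classes are averaged.

```python
from typing import List


def average_precision(scores: List[float], labels: List[int],
                      calibrated: bool = False) -> float:
    """Non-interpolated per-frame AP for one class.

    labels[i] is 1 if frame i belongs to the class, else 0. With
    calibrated=True, false positives are scaled by 1/w, where w is the
    ratio of negative to positive frames (calibrated AP, cAP).
    """
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("class has no positive frames")
    w = (len(labels) - num_pos) / num_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    total = 0.0
    for i in order:
        if labels[i]:
            tp += 1
            # fp > 0 implies negatives exist, so w > 0 here
            fp_term = fp / w if (calibrated and fp) else fp
            total += tp / (tp + fp_term)  # precision at this recall point
        else:
            fp += 1
    return total / num_pos


def mean_ap(per_class_scores: List[List[float]], per_class_labels: List[List[int]],
            calibrated: bool = False) -> float:
    """Average of per-class AP (mAP) or cAP (mean cAP)."""
    aps = [average_precision(s, l, calibrated)
           for s, l in zip(per_class_scores, per_class_labels)]
    return sum(aps) / len(aps)
```

On scores `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4]` with labels `[1, 0, 1, 0, 0, 0]`, AP is about 0.833 while cAP is 0.9, because the single false positive ranked above the second positive is discounted by `w = 2`.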
These improvements allow the TLS-RWKV model to run at over 600 FPS while maintaining a competitive 59.9% mAP on THUMOS'14 with RGB features alone, and to exceed 200 FPS even on a CPU, which makes it suitable for low-resource settings such as edge computing.
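A quick arithmetic check of what these throughputs mean per frame, assuming the reported FPS is the model's per-frame throughput and using a hypothetical 30 FPS input stream as the reference point; the paper does not necessarily process frames at that rate, and the helper names are illustrative.

```python
def frame_budget_ms(fps: float) -> float:
    """Time spent per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


def headroom(model_fps: float, stream_fps: float) -> float:
    """How many times faster than the incoming stream the model runs."""
    return model_fps / stream_fps


# Throughputs from the summary; 30 FPS is a hypothetical stream rate.
for name, fps in [("main setting", 600.0), ("CPU", 200.0)]:
    print(f"{name}: {frame_budget_ms(fps):.2f} ms/frame, "
          f"{headroom(fps, 30.0):.1f}x a 30 FPS stream")
# main setting: 1.67 ms/frame, 20.0x a 30 FPS stream
# CPU: 5.00 ms/frame, 6.7x a 30 FPS stream
```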