DFAformer: A Dual Filtering Auxiliary Transformer for Efficient Online Action Detection in Streaming Videos.

Shicheng Jing, Liping Xie
DOI: https://doi.org/10.1007/978-981-99-8537-1_11
2024-01-01
Abstract: Online action detection (OAD) aims to identify the type of ongoing action frame by frame, without access to future information. Fully exploiting historical memory, whose information is limited yet redundant, to uncover potential patterns thus becomes an important yet challenging problem. We propose a novel transformer-based framework, the Dual Filtering Auxiliary Transformer (DFAformer), to achieve this goal. In DFAformer, a two-stage filtering mechanism removes impurities related to background and actions of no interest from the historical memory at both the frame and element levels. To make the model concentrate on the ongoing action, we design an auxiliary task, the Jaccard Summary Unit, which explicitly correlates the past with the future. This auxiliary task guides the learning of model weights without extra computational cost during inference. Experiments on three real-world benchmark datasets demonstrate the superiority of the proposed method.