Pyramid Dilated Attention Network for Action Segmentation

Zexing Du, Feng Mei, Xiaohan Lai, Qing Wang
DOI: https://doi.org/10.1109/wcsp52459.2021.9613432
2021-01-01
Abstract: Action segmentation has been widely studied alongside the development of temporal convolutional networks. However, the correlations between frames at different time intervals are still not well explored. In untrimmed videos in particular, frames play different roles: consecutive frames provide local spatiotemporal information, while distant frames provide global information. Applying attention at the same feature dimension for every interval therefore cannot exploit these differences. In addition, untrimmed videos generally contain thousands of frames, so applying attention directly to the whole video would be computationally heavy and inefficient. In this paper, we propose a dilated attention module (DAM), which builds attention maps in a dilated manner rather than over the whole sequence. To explore correlations between frames at different intervals, we propose a pyramid dilated attention network (PDAN), which uses higher-dimensional features to model relationships over short intervals, capturing local information, and lower-dimensional features to model correlations over long intervals, capturing global information. When MS-TCN is equipped with PDAN, it achieves state-of-the-art performance on three challenging datasets.
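The abstract does not give the exact formulation of the dilated attention module, so the following is only a minimal sketch of one plausible reading. It assumes that "dilated" means each frame attends only to frames that are a multiple of a dilation rate `d` away, within a small radius, using plain scaled dot-product attention with no learned query/key/value projections. The names `dilated_neighbors`, `dilated_attention`, and `count_scores` are hypothetical illustrations, not the authors' code. A small dilation corresponds to short-interval (local) context and a large one to long-interval (global) context.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dilated_neighbors(t: int, length: int, dilation: int, radius: int) -> List[int]:
    """Indices t + k*dilation for k in [-radius, radius], clipped to the sequence.

    Hypothetical neighbourhood definition (an assumption, not from the paper):
    each frame looks only at frames a multiple of `dilation` away,
    instead of at the whole video.
    """
    return [t + k * dilation
            for k in range(-radius, radius + 1)
            if 0 <= t + k * dilation < length]


def dilated_attention(frames: List[Vector], dilation: int, radius: int = 1) -> List[Vector]:
    """Scaled dot-product self-attention restricted to a dilated window.

    Queries, keys and values are the frame features themselves (no learned
    projections), which keeps this illustrative sketch dependency-free.
    """
    length = len(frames)
    dim = len(frames[0])
    out = []
    for t in range(length):
        idx = dilated_neighbors(t, length, dilation, radius)
        scores = [dot(frames[t], frames[j]) / math.sqrt(dim) for j in idx]
        weights = softmax(scores)
        # Weighted average of the attended frames, channel by channel.
        out.append([sum(w * frames[j][c] for w, j in zip(weights, idx))
                    for c in range(dim)])
    return out


def count_scores(length: int, dilation: int, radius: int = 1) -> int:
    """Number of attention scores computed by the dilated variant."""
    return sum(len(dilated_neighbors(t, length, dilation, radius))
               for t in range(length))
```

For a 1,000-frame video with `radius=1`, `count_scores(1000, 1)` gives 2,998 scores, whereas attention over the whole sequence would compute 1000 × 1000 = 1,000,000, which illustrates why restricting the attention map matters for long untrimmed videos. A pyramid in the spirit of PDAN would call `dilated_attention` with a small dilation on higher-dimensional features and a larger dilation on lower-dimensional ones; how the paper changes the feature dimension between levels is not stated in the abstract.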