CCT: A Cross-Concat and Temporal Neural Network for Multi-Label Action Unit Detection

Qiaoping Hu, Fei Jiang, Chuanneng Mei, Ruimin Shen
DOI: https://doi.org/10.1109/ICME.2018.8486516
2018-01-01
Abstract: Action Unit (AU) detection is essential for facial expression analysis. However, most existing AU detection algorithms focus only on physical features, e.g., temporal features and AU correlations, without considering the varying distributions of AUs, i.e., some AUs occur far less frequently than others. In this work, we propose a novel cross-concat and temporal (CCT) neural network, which simultaneously considers physical features and these distribution differences. First, we design a cross-concat block (CCB) to adapt to the varying distributions of AUs. The CCB is based on the idea of skip connections, since skip connections can reuse features from different layers and capture abundant AU features even with relatively few training samples. Second, LSTM layers are used to capture temporal dependencies, and multi-label learning is used to capture AU correlations. Experimental results on three popular AU detection datasets, BP4D, DISFA, and GFT, show that the proposed algorithm outperforms state-of-the-art methods.
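To make the multi-label setting and the distribution imbalance mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify a loss function, so the per-AU sigmoid with binary cross-entropy used here is an assumed, commonly used multi-label formulation. The AU names, the toy label matrix, and the helper names `occurrence_rates` and `multilabel_bce` are all hypothetical and chosen only for illustration.

```python
import math

# Toy multi-label annotations (made up for illustration, not from any dataset):
# one row per frame, one column per AU; 1 means the AU is active in that frame.
AU_NAMES = ["AU1", "AU6", "AU12"]  # hypothetical subset of AUs
LABELS = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]


def occurrence_rates(labels):
    """Return, for each AU (column), the fraction of frames where it is active."""
    n_frames = len(labels)
    return [sum(column) / n_frames for column in zip(*labels)]


def sigmoid(x):
    """Map a real-valued score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent per-AU sigmoid outputs.

    Assumed formulation: each AU gets its own yes/no prediction, so several
    AUs can be active in the same frame.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


# Imbalance: some AUs are active in far fewer frames than others.
rates = occurrence_rates(LABELS)
for name, rate in zip(AU_NAMES, rates):
    print(f"{name}: active in {rate:.0%} of frames")

# Logits that agree with the first frame's labels give a lower loss
# than logits that contradict them.
loss_good = multilabel_bce([-3.0, 3.0, 3.0], LABELS[0])
loss_bad = multilabel_bce([3.0, -3.0, -3.0], LABELS[0])
print(f"loss (agreeing logits):     {loss_good:.4f}")
print(f"loss (contradicting logits): {loss_bad:.4f}")
```

In this toy data the rarest AU is active in a quarter of the frames and the most common in three quarters, which is the kind of distribution difference the abstract refers to. How the CCB adapts to it is not described in the abstract and is not modeled here.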