Video Action Recognition with Neural Architecture Search.

Yuanding Zhou, Baopu Li, Zhihui Wang, Haojie Li
2021-01-01
Abstract: Deep convolutional neural networks are now widely used for video action recognition. Current approaches tend to concentrate on the structure design of different backbone networks, but despite encouraging progress, it remains an open question which network structures can process video both effectively and quickly. Using neural architecture search (NAS), we search over three hyperparameters of the video processing network: the number of frames, the number of layers in each residual stage, and the number of channels in every layer. We relax the entire search space into a continuous one and search for a set of network architectures that balance accuracy and computational efficiency, treating accuracy as the primary optimization goal and computational complexity as the secondary one. Experiments on the UCF101 and Kinetics400 datasets show that the proposed NAS-based scheme achieves new state-of-the-art results for video action recognition.
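To make the idea of a continuous relaxation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a softmax relaxation over candidate values for each of the three hyperparameters (a common choice in differentiable NAS; the abstract does not say which relaxation is used). The candidate lists, the toy cost proxy, and the names `expected_value`, `relaxed_cost`, `search_objective`, and `discretize` are all hypothetical.

```python
import math

# Hypothetical candidate values for the three searched hyperparameters.
FRAME_CHOICES = [8, 16, 32]      # number of frames
DEPTH_CHOICES = [2, 3, 4]        # layers per residual stage
WIDTH_CHOICES = [32, 64, 128]    # channels per layer


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value(logits, choices):
    """Continuous relaxation: probability-weighted mix of discrete choices."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, choices))


def relaxed_cost(frame_logits, depth_logits, width_logits):
    """Toy complexity proxy (an assumption): frames * depth * width^2."""
    f = expected_value(frame_logits, FRAME_CHOICES)
    d = expected_value(depth_logits, DEPTH_CHOICES)
    w = expected_value(width_logits, WIDTH_CHOICES)
    return f * d * w * w


def search_objective(task_loss, cost, cost_weight=1e-7):
    """Accuracy term dominates; complexity enters with a small weight."""
    return task_loss + cost_weight * cost


def discretize(logits, choices):
    """After the search, keep the candidate with the largest logit."""
    best = max(range(len(choices)), key=lambda i: logits[i])
    return choices[best]
```

In this sketch the small `cost_weight` is one simple way to express "accuracy first, complexity second"; the paper may combine the two goals differently.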