DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression

Zongwen Bai, Ying Li, Marcin Woźniak, Meili Zhou, Di Li
DOI: https://doi.org/10.1016/j.patcog.2020.107538
IF: 8
2021-02-01
Pattern Recognition
Abstract:<p>We present a novel, comprehensive solution for compressing and accelerating Visual Question Answering (VQA) systems. Our algorithm compresses the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network simultaneously to accelerate processing. We apply different decomposition methods and regression strategies to different layers: Canonical Polyadic, Tucker, and Tensor Train decompositions are used to decompose the Fully Connected layers in the CNN and LSTM. The Flattening layer and the Fully Connected layer at the end of the model are replaced with Tensor Regression layers. To reduce the number of parameters further, the feature flow between layers is compressed by a Tensor Contraction layer. The proposed tensor decomposition model was evaluated on the VQA 2.0 dataset with Pythia as the baseline model. It achieved compression ratios from 77% to 91% with an accuracy drop of only 1% to 5%.</p>
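To give a rough sense of where such compression can come from, the sketch below counts the parameters of a dense Fully Connected layer and of the same layer stored in Tensor Train (TT-matrix) format. It is an illustration under stated assumptions, not the authors' implementation: the mode sizes, the TT ranks, the function names, and the reading of "compression ratio" as the fraction of parameters removed are all hypothetical choices made for this example.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Weights in a dense layer mapping prod(in_modes) to prod(out_modes) units."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Weights in a TT-matrix layer.

    Core k has shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).
    The boundary ranks must both equal 1.
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1:
        raise ValueError("need d input modes, d output modes and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d)
    )


def compression_ratio(original, compressed):
    """Fraction of parameters removed (assumed definition, not from the paper)."""
    return 1.0 - compressed / original


# Hypothetical 1024 x 1024 layer, factored as 4 * 8 * 8 * 4 on both sides.
in_modes = (4, 8, 8, 4)
out_modes = (4, 8, 8, 4)
ranks = (1, 8, 8, 8, 1)  # illustrative TT ranks

n_dense = dense_params(in_modes, out_modes)
n_tt = tt_params(in_modes, out_modes, ranks)
print(n_dense, n_tt, round(compression_ratio(n_dense, n_tt), 4))
```

The ratio for a single layer in this toy setting is much higher than the whole-model figures reported in the abstract, which is expected: a full model also contains layers that are not decomposed, and larger ranks are usually needed to limit the accuracy drop.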
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic