SMAC: An Interpretable Reasoning Network for Visual Question Answering

Minghan Wang, Hao Yang, Shiliang Sun, Yao Deng, Ying Qin
2019-01-01
Abstract: In the visual question answering (VQA) task, many successful works focus on building end-to-end predictive models but ignore the interpretability of the reasoning process, which is nevertheless very important for evaluating the trustworthiness of a model. The recent MAC-network (Hudson and Manning 2018) achieves state-of-the-art results on the VQA task, which demonstrates the effectiveness of differentiable reasoning models. However, for MAC, interpreting the reasoning process by visualizing the attention maps often fails to clearly show the logic of multi-step reasoning. In this paper, we propose SMAC (Symbolic MAC) to improve interpretability in the following respects. (1) Intent classification is introduced to make question understanding explainable. (2) We propose the Translate Unit (TU), which translates the reasoning process into a formalized query language for interpretation and provides explicit guidance to the reasoning cell during training. We further enlarge the feature space by incorporating image pixel features and object-specific features simultaneously, following the multi-view learning framework. Experiments demonstrate that SMAC achieves competitive performance on the large-scale, realistic GQA benchmark (Hudson and Manning 2019) and provides good interpretability evidence through its symbolic intermediate outcomes.