Conditional Kronecker Batch Normalization for Compositional Reasoning.

Cheng Shi, Chun Yuan, Jiayin Cai, Zhuobin Zheng, Yangyang Cheng, Zhihui Lin
2018-01-01
Abstract: Conditional Batch Normalization (CBN) has proved to be an effective tool for visual question answering. However, previous CBN approaches fuse linguistic information into image features via a simple affine transformation, and have therefore struggled with compositional reasoning and object counting in images. In this paper, we propose a novel CBN method based on the Kronecker transformation, termed Conditional Kronecker Batch Normalization (CKBN). The CKBN layer facilitates explicit and expressive learning of compositional reasoning and robust counting in the original images. In addition, we demonstrate that the Kronecker transformation in the CKBN layer is a generalization of the affine transformation used in prior CBN approaches. The Kronecker transformation can also accelerate the fusion of visual and linguistic information, and thus the convergence of the overall model. Experimental results show that our model significantly outperforms previous CBN methods (e.g. FiLM) in compositional reasoning, counting, and convergence speed on the CLEVR dataset.
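The abstract does not give the CKBN formulation, so the following is only a rough NumPy sketch of the general idea it describes, not the authors' layer. `affine_modulate` is the familiar FiLM-style per-channel scale and shift. `kron_modulate` is a hypothetical illustration in which the scale map is built as a Kronecker product of a per-channel vector and a spatial map; the function names, argument shapes, and this particular factorization are all assumptions made for the example. It shows one way a Kronecker-structured scale can reduce to the affine case: when the spatial factor is all ones, the two functions agree.

```python
import numpy as np


def affine_modulate(x, gamma, beta):
    """FiLM-style conditioning: per-channel scale and shift.

    x: normalized feature map of shape (C, H, W)
    gamma, beta: conditioning vectors of shape (C,)
    """
    return gamma[:, None, None] * x + beta[:, None, None]


def kron_modulate(x, g_channel, g_spatial, beta):
    """Hypothetical Kronecker-structured modulation (illustration only).

    g_channel: shape (C,), g_spatial: shape (H, W), beta: shape (C,)
    The scale map is kron(g_channel, g_spatial), so each channel c is
    scaled by g_channel[c] * g_spatial at every spatial position.
    """
    C, H, W = x.shape
    # np.kron of a (C, 1) column with an (H, W) map yields (C*H, W):
    # block c is g_channel[c] * g_spatial. Reshape to (C, H, W).
    scale = np.kron(g_channel.reshape(C, 1), g_spatial).reshape(C, H, W)
    return scale * x + beta[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 4, 5))
    gamma = rng.standard_normal(3)
    beta = rng.standard_normal(3)

    # With an all-ones spatial factor, the Kronecker form collapses
    # to the plain per-channel affine transformation.
    same = np.allclose(
        kron_modulate(x, gamma, np.ones((4, 5)), beta),
        affine_modulate(x, gamma, beta),
    )
    print("reduces to affine case:", same)
```

With a non-constant `g_spatial`, the same layer can scale different image locations differently, which is the kind of extra expressiveness a location-sensitive task such as counting might benefit from; whether CKBN factorizes its parameters this way is not stated in the abstract.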