Fusion that matters: convolutional fusion networks for visual recognition

Yu Liu,Yanming Guo,Theodoros Georgiou,Michael S. Lew
DOI: https://doi.org/10.1007/s11042-018-5691-4
2018-01-01
Abstract:In recent years, deep learning has been successfully applied to diverse multimedia research areas, with the aim of learning powerful and informative representations for a variety of visual recognition tasks. In this work, we propose convolutional fusion networks (CFN) to integrate multi-level deep features and fuse a richer visual representation. Despite recent advances in deep fusion networks, they still have limitations due to expensive parameters and weak fusion modules. Instead, CFN uses 1 × 1 convolutional layers and global average pooling to generate side branches with few parameters, and employs a locally-connected fusion module, which can learn adaptive weights for different side branches and form a better fused feature. Specifically, we introduce three key components of the proposed CFN, and discuss its differences from other deep models. Moreover, we propose fully convolutional fusion networks (FCFN) that are an extension of CFN for pixel-level classification applied to several tasks, such as semantic segmentation and edge detection. Our experiments demonstrate that CFN (and FCFN) can achieve promising performance by consistent improvements for both image-level and pixel-level classification tasks, compared to a plain CNN. We release our codes on https://github.com/yuLiu24/CFN . Also, we make a live demo ( goliath.liacs.nl ) using a CFN model trained on the ImageNet dataset.
What problem does this paper attempt to address?