AMFF-net: Adaptive Multi-Modal Feature Fusion Network for Image Classification
Wei Liu, Xiaobo Lu, Yun Wei
DOI: https://doi.org/10.1007/s11042-023-16217-9
IF: 2.577
2024-01-01
Multimedia Tools and Applications
Abstract: Convolutional neural networks (CNNs) have in recent years been applied to computer vision tasks such as image classification and recognition, object detection, and segmentation, owing to their excellent feature extraction capability and strong generalization ability. However, CNNs mainly represent the semantic information of images by aggregating local features. Some global features, such as histograms of oriented gradients, color information, and local binary pattern features, have been shown to be useful for image recognition. Nonetheless, some researchers simply concatenate these features, overlooking the differences between them, which fails to deliver the desired performance or even degrades results. To better integrate multi-modal features, this paper proposes a novel feature fusion module, named AMFF Network, which adaptively fuses CNNs’ local-global features and traditional global features. That is, the network dynamically combines the high-level semantic characteristics of objects with low-level detail and appearance features. The module is easy to embed in various architectures and generalizes effectively across datasets. Furthermore, we show that the AMFF module brings clear performance improvements to current state-of-the-art methods at some additional computational cost. Experiments on multiple benchmark datasets, including Fashion-MNIST, CIFAR10, CIFAR100, Tiny-Imagenet-200, and Market1501, demonstrate that the proposed AMFF-Net module yields significant improvements in image classification.
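The contrast the abstract draws between plain concatenation and adaptive fusion can be illustrated with a minimal sketch. This is not the AMFF module itself, whose internal design the abstract does not specify; it is a generic softmax-gated weighted sum, shown only to make the idea of input-dependent fusion weights concrete. The function names, the gate parameters, and the assumption that both feature vectors have already been projected to a common dimension are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(features):
    """Baseline: concatenate modality vectors, treating them all alike."""
    return [x for f in features for x in f]


def adaptive_fusion(features, gate_weights):
    """Hypothetical gated fusion (an assumption, not the paper's design).

    features:     one vector per modality, all of the same length
    gate_weights: one weight vector per modality; in a real network these
                  would be learned, here they are fixed illustrative values
    Returns the fused vector and the per-modality fusion weights.
    """
    # One scalar score per modality, computed from that modality's features.
    scores = [
        sum(w * x for w, x in zip(gw, f))
        for gw, f in zip(gate_weights, features)
    ]
    # Normalize the scores so the fusion weights are positive and sum to 1.
    alphas = softmax(scores)
    dim = len(features[0])
    fused = [
        sum(a * f[i] for a, f in zip(alphas, features))
        for i in range(dim)
    ]
    return fused, alphas


# Illustrative values only: a CNN feature and a handcrafted (e.g. HOG) feature.
cnn_feat = [0.9, 0.1, 0.4]
hog_feat = [0.2, 0.8, 0.5]
gates = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

fused, alphas = adaptive_fusion([cnn_feat, hog_feat], gates)
stacked = concat_fusion([cnn_feat, hog_feat])
```

In this sketch the fusion weights depend on the input features, so different images can weight the CNN and handcrafted features differently, whereas concatenation passes both through unchanged and leaves any weighting to later layers.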