COLAFormer: Communicating local–global features with linear computational complexity

Zhengwei Miao,Hui Luo,Meihui Li,Jianlin Zhang
DOI: https://doi.org/10.1016/j.patcog.2024.110870
IF: 8
2024-09-05
Pattern Recognition
Abstract:Local and sparse attention effectively reduce the high computational cost of global self-attention. However, they suffer from non-global dependency and coarse feature capturing, respectively. While some subsequent models employ effective interaction techniques for better classification performance, we observe that the computation and memory of these models overgrow as the resolution increases. Consequently, applying them to downstream tasks with large resolutions takes time and effort. In response to this concern, we propose an effective backbone network based on a novel attention mechanism called Concatenating glObal tokens in Local Attention (COLA) with a linear computational complexity. The implementation of COLA is straightforward, as it incorporates global information into the local attention in a concatenating manner. We introduce a learnable condensing feature (LCF) module to capture high-quality global information. LCF possesses the following properties: (1) performing a function similar to clustering, aggregating image patches into a smaller number of tokens based on similarity. (2) a constant number of aggregated tokens regardless of the image size, ensuring that it is a linear complexity operator. Based on COLA, we build COLAFormer, which achieves global dependency and fine-grained feature capturing with linear computational complexity and demonstrates impressive performance across various vision tasks. Specifically, our COLAFormer-S achieves 84.5% classification accuracy, surpassing other advanced models by 0.4% with similar or less resource consumption. Furthermore, our COLAFormer-S can achieve a better object detection performance while consuming only 1/4 of the resources compared to other state-of-the-art models. The code and models will be made publicly available.
computer science, artificial intelligence,engineering, electrical & electronic
What problem does this paper attempt to address?