TCAMixer: A lightweight Mixer based on a novel triple concepts attention mechanism for NLP

Xiaoyan Liu, Huanling Tang, Jie Zhao, Quansheng Dou, Mingyu Lu
DOI: https://doi.org/10.1016/j.engappai.2023.106471
IF: 8
2023-05-29
Engineering Applications of Artificial Intelligence
Abstract: The large size and high computing cost of large pre-trained models make them difficult to deploy and apply. This paper therefore presents a novel Triple Concepts Attention Mechanism and a lightweight TCAMixer model for text classification on edge devices. The TCAMixer also abstracts textual concepts in a human-like way, which counterparts such as pNLP-Mixer (a projection-based MLP-Mixer model for Natural Language Processing) and HyperMixer (a hypernetwork using dynamic token-mixing layers) do not. Experimental results on several public datasets show that the TCAMixer outperforms these counterparts by a significant margin, for example achieving 3% higher accuracy with a smaller model size of 0.177M. Additionally, on most test datasets the TCAMixer reaches 85% to 98.7% of the performance of large pre-trained models while occupying only 1/3000 to 1/2000 of their size.
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic; engineering, multidisciplinary
What problem does this paper attempt to address?