AC-MMOE: A Multi-gate Mixture-of-experts Model Based on Attention and Convolution

Keyao Li, Jungang Xu
DOI: https://doi.org/10.1016/j.procs.2023.08.156
2023-09-03
Procedia Computer Science
Abstract: Multi-task learning (MTL), an important branch of machine learning, has been successfully applied in many fields and has proved effective in practice. However, soft parameter sharing models, represented by multi-gate mixture-of-experts (MMOE), still have several disadvantages, including negative transfer, the seesaw phenomenon, and inadequate utilization of shared information. Although existing research has mitigated these issues, the improvements bring new problems of their own, such as high model complexity and difficulty in hyperparameter tuning. To address these issues, we propose a multi-gate mixture-of-experts model based on attention and convolution (AC-MMOE), which incorporates a multi-layer perceptron based attention module and a column convolution module. AC-MMOE applies attention for feature extraction and convolution to integrate the outputs of the shared substructures, which improves the model's feature extraction and information fusion abilities without significantly increasing training cost. We validate the performance of AC-MMOE on several MTL datasets. The experimental results show that our model achieves better results than the baselines on datasets with different task correlations.
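For orientation, the baseline that AC-MMOE extends can be sketched in a few lines. The block below is a minimal, illustrative forward pass of a plain MMOE layer, assuming linear experts with ReLU, one softmax gate per task, and a single linear tower per task. All function names, dimensions, and random weights are hypothetical. The attention and column convolution modules of AC-MMOE are not reproduced here, since the abstract does not specify them.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical initializer: uniform weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def mmoe_forward(x, expert_weights, gate_weights, tower_weights):
    """Plain MMOE: shared experts, one gate and one tower per task."""
    # Every expert sees the same input and produces a hidden vector.
    expert_outputs = [
        [max(0.0, h) for h in linear(w, x)] for w in expert_weights
    ]
    hidden_dim = len(expert_outputs[0])
    predictions, gate_distributions = [], []
    for gate_w, tower_w in zip(gate_weights, tower_weights):
        # The task-specific gate assigns one weight to each expert.
        gate = softmax(linear(gate_w, x))
        # Soft parameter sharing: gate-weighted sum of expert outputs.
        mixed = [
            sum(g * out[j] for g, out in zip(gate, expert_outputs))
            for j in range(hidden_dim)
        ]
        # The task tower maps the mixed representation to a scalar.
        predictions.append(sum(w * m for w, m in zip(tower_w, mixed)))
        gate_distributions.append(gate)
    return predictions, gate_distributions


rng = random.Random(0)
input_dim, hidden_dim, num_experts, num_tasks = 4, 3, 3, 2
expert_weights = [random_matrix(hidden_dim, input_dim, rng) for _ in range(num_experts)]
gate_weights = [random_matrix(num_experts, input_dim, rng) for _ in range(num_tasks)]
tower_weights = [[rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)] for _ in range(num_tasks)]

x = [0.5, -1.0, 0.25, 2.0]
predictions, gates = mmoe_forward(x, expert_weights, gate_weights, tower_weights)
```

Each task receives its own gate distribution over the same shared experts. The abstract's description suggests that AC-MMOE changes how features are extracted and how the expert outputs are integrated, rather than this overall layout.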