Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection

Zhiqiang Guo, Zhaoci Liu, Zhenhua Ling, Shijin Wang, Lingjing Jin, Yunxia Li
DOI: https://doi.org/10.18653/v1/2020.coling-main.542
2020-01-01
Abstract: Data scarcity is a persistent constraint on analyzing speech transcriptions for automatic Alzheimer’s disease (AD) detection, especially when the subjects are non-English speakers. To address this issue, this paper first proposes a contrastive learning method that obtains effective representations for text classification from monolingual BERT embeddings. Furthermore, a cross-lingual data augmentation method is designed by building autoencoders that learn text representations shared by both languages. Experiments on a Mandarin AD corpus show that the contrastive learning method achieves better detection accuracy than conventional CNN-based and BERT-based methods. Our cross-lingual data augmentation method also outperforms the other compared methods when an English AD corpus is used for augmentation. Finally, the proposed methods achieve a best detection accuracy of 81.6% on the Mandarin AD corpus.