Team BJUT-BJFU at BioCreative VII LitCovid Track: A Deep Learning based Method for Multi-label Topic Classification in COVID-19 Literature

Shuo Xu, Yuefu Zhang, Xin An
2021-01-01
Abstract:
Background: The rapid growth of COVID-19-related articles poses a significant challenge for manual curation and interpretation, so improving the accuracy of automated topic prediction for this literature is crucial. The LitCovid track in BioCreative VII was designed specifically to optimize multi-label annotation of COVID-19 literature.
Materials and Methods: The following fields are used for multi-label classification of COVID-19 literature: title, abstract, keywords, and journal name, along with the annotated labels. To benefit from powerful deep learning models, four models are involved in our submissions: FastText, TextRCNN, TextCNN, and Transformer. We combine the training and development sets into a single training set, which is then divided into ten disjoint subsets of nearly equal size and similar label distribution. The parameters of each model are then tuned with a 10-fold cross-validation procedure. Finally, the posterior probability of each instance's predicted labels is adjusted so that it follows the format of the provided sample.
Results: The best performance among our five runs is 85.56%, 78.47%, and 87.01% in terms of label-based micro-average F1, label-based macro-average F1, and instance-based F1, respectively. This outperforms the baseline method ML-Net.
Conclusions: This study proposes a deep learning based approach for multi-label classification of COVID-19 literature. The approach consists of three main components: preprocessing (dataset division, formatting), modeling (10-fold cross-validation, prediction), and post-processing (threshold adaptation and probability conversion). According to the results disclosed by the organizer, our method is superior to the baseline method, which indicates that it is valuable for multi-label classification problems.
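The three scores reported in the Results can be illustrated with a small sketch. It assumes one common definition of each measure: micro-average F1 pools true positives, false positives, and false negatives over all labels; macro-average F1 averages the per-label F1 values; instance-based F1 averages the F1 computed per article. The function name `multilabel_f1` and the label names in the toy data are illustrative and are not taken from the paper or from the official evaluation script.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true, y_pred, labels):
    """Return (micro F1, macro F1, instance-based F1).

    y_true and y_pred are lists of label sets, one set per article.
    """
    # Per-label [tp, fp, fn] counts
    counts = {label: [0, 0, 0] for label in labels}
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in truth:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in truth:
                counts[label][2] += 1

    # Micro: pool the counts over all labels, then compute one F1
    micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
    # Macro: compute F1 per label, then average
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Instance-based: compute F1 per article, then average
    instance = sum(
        f1(len(t & p), len(p - t), len(t - p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)
    return micro, macro, instance


# Toy data with illustrative label names
labels = ["Treatment", "Diagnosis", "Prevention"]
y_true = [{"Treatment", "Diagnosis"}, {"Prevention"}, {"Treatment"}]
y_pred = [{"Treatment"}, {"Prevention", "Diagnosis"}, {"Treatment"}]
micro, macro, instance = multilabel_f1(y_true, y_pred, labels)
```

Because macro-average F1 weights every label equally, a rarely used label that is predicted poorly lowers it more than it lowers the micro-average, which is consistent with the macro score being the lowest of the three reported values.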
Keywords—Multi-Label Learning; Deep Learning; Cross Validation; Label Relationship; Stratification Method
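The abstract does not spell out how the ten subsets with similar label distribution are built. As a rough illustration only, the sketch below uses a simplified greedy heuristic: instances carrying rare labels are placed first, and each instance goes to the fold that currently holds the fewest examples of its rarest label. The function name `stratified_folds` and the heuristic itself are assumptions for this example, not the authors' stratification method.

```python
from collections import Counter


def stratified_folds(label_sets, k=10):
    """Greedily split instance indices into k disjoint folds.

    label_sets is a list of label sets, one per instance. This is a
    simplified heuristic, not the procedure used in the paper.
    """
    label_freq = Counter(label for s in label_sets for label in s)

    def rarest(s):
        # Rarest label of an instance (ties broken by name); None if unlabeled
        return min(s, key=lambda label: (label_freq[label], label)) if s else None

    def priority(i):
        # Instances with rare labels are placed first; unlabeled ones last
        s = label_sets[i]
        return (label_freq[rarest(s)] if s else float("inf"), i)

    folds = [[] for _ in range(k)]
    fold_counts = [Counter() for _ in range(k)]
    for i in sorted(range(len(label_sets)), key=priority):
        key = rarest(label_sets[i])
        # Pick the fold with the fewest examples of this label, then the smallest
        target = min(range(k), key=lambda f: (fold_counts[f][key], len(folds[f]), f))
        folds[target].append(i)
        fold_counts[target].update(label_sets[i])
    return folds


# Toy data: 40 instances over three hypothetical labels
toy = [{"A"}] * 20 + [{"A", "B"}] * 10 + [{"C"}] * 10
folds = stratified_folds(toy, k=10)
```

For 10-fold cross-validation, each fold in turn would serve as the validation set while the remaining nine are used for training.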