A domain-knowledge based reconstruction framework for out-of-domain news title classification

Shi Yuan, Ningning Liu, Bo Sun, Chen Zhao
DOI: https://doi.org/10.1016/j.eswa.2023.121483
IF: 8.5
2023-09-21
Expert Systems with Applications
Abstract: News title classification is widely used to organize large volumes of news. Traditional classification methods, however, usually depend on a pre-defined label set, while many news articles belong to unseen labels rather than pre-defined ones. To address this challenge, this paper develops a domain-knowledge based reconstruction framework that classifies both out-of-domain titles (titles with unseen labels) and in-domain titles (titles with pre-defined labels). Specifically, out-of-domain titles are identified through the reconstruction difference of an autoencoder, while each in-domain title is assigned the pre-defined label whose category explanation text has the highest cosine similarity with the title. These category explanation texts serve as domain knowledge for in-domain aspect classification; they are intended to make the reconstruction framework more stable and to provide extra information for short titles. Furthermore, the method uses a BERT model to generate pre-trained embeddings for the input texts. Experiments on a BRI (Belt and Road Initiative) news dataset show that the method achieves F1-scores of 0.735 and 0.838 for out-of-domain title identification and in-domain aspect classification, respectively. Moreover, titles from four countries are used to illustrate the method's practical value in cross-border business analysis.
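The two-stage decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a rank-k linear autoencoder (PCA via SVD) stands in for the paper's autoencoder, hand-written 4-dimensional vectors stand in for BERT embeddings, and the function names, the labels "economy" and "politics", and the threshold value are all hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a rank-k linear autoencoder (PCA) on in-domain embeddings X.

    Assumption: this replaces the paper's autoencoder for illustration only.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions; the top k form the bottleneck.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def reconstruction_error(x, mean, components):
    """Distance between an embedding and its encode/decode reconstruction."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return float(np.linalg.norm(centered - reconstructed))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_title(x, mean, components, threshold, label_embeddings):
    """Flag a title as out-of-domain, or return its most similar label."""
    # Stage 1: a large reconstruction error suggests an unseen label.
    if reconstruction_error(x, mean, components) > threshold:
        return "out-of-domain"
    # Stage 2: pick the label whose explanation embedding is closest.
    return max(label_embeddings, key=lambda name: cosine(x, label_embeddings[name]))

# Toy stand-ins for in-domain title embeddings (hypothetical data).
train = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
])
mean, components = fit_linear_autoencoder(train, k=2)

# Toy stand-ins for embeddings of category explanation texts (hypothetical labels).
label_embeddings = {
    "economy": np.array([1.0, 0.0, 0.0, 0.0]),
    "politics": np.array([0.0, 1.0, 0.0, 0.0]),
}

in_domain_title = np.array([0.9, 0.1, 0.0, 0.0])
unseen_title = np.array([0.0, 0.0, 1.0, 0.0])
THRESHOLD = 0.5  # hypothetical; in practice it would be tuned on held-out data

print(classify_title(in_domain_title, mean, components, THRESHOLD, label_embeddings))
print(classify_title(unseen_title, mean, components, THRESHOLD, label_embeddings))
```

In this toy setup the training embeddings lie in a two-dimensional subspace, so a title pointing outside that subspace reconstructs poorly and is flagged, while a title inside it falls through to the cosine-similarity step.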
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science