Constructing Multiple Domain Taxonomy for Text Processing Tasks

Yihong Zhang,Yongrui Qin,Longkun Guo
DOI: https://doi.org/10.1007/978-3-319-98812-2_46
2018-01-01
Abstract:In recent years large volumes of short text data can be easily collected from platforms such as microblogs and product review sites. Very often the obtained short text data contains several domains, which poses many challenges in effective multi-domain text processing because it is challenging to distinguish among the multiple domains in the text data. The concept of multiple domain taxonomy (MDT) has shown promising performance in processing multi-domain text data. However, MDT has to be constructed manually, which requires much expert knowledge about the relevant domains and is time consuming. To address such issues, in this paper, we introduce a semi-automatic method to construct an MDT that only requires a small amount of manual input, in combination of an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data. We show that the iteratively-constructed MDT using our semi-automatic method can achieve higher accuracy than existing methods in domain classification, where the accuracy can be improved by up to 11%.
What problem does this paper attempt to address?