TopicMine: User-Guided Topic Mining by Category-Oriented Embedding
Yu Meng, Jiaxin Huang, Zihan Wang, Chenyu Fan, Guangyuan Wang, Chao Zhang, Jingbo Shang, Lance Kaplan, Jiawei Han
2019-01-01
Abstract: With an ever-increasing volume of text data from news reports, social media, scholarly literature, and medical records, it is increasingly necessary to distill knowledge from text by categories that reflect users’ interests. For example, given a general news corpus, one user may want to organize articles by country, whereas another may want to browse them by theme. In either case, the user’s interest can be described by a set of category names. In this project, we develop a framework, TopicMine, which takes user-provided category names as guidance and mines category-representative phrases to form coherent topics. Specifically, TopicMine first uses a phrase mining tool to extract quality phrases from the text corpus, and then learns an embedding space that best separates the categories specified by the user. Finally, category-representative phrases are retrieved by considering both topic relevance and semantic generality. The mined topics, each identified by its category-representative phrases, facilitate effective and efficient understanding, organizing, searching, and summarizing of textual content based on users’ needs.
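
The final retrieval step can be illustrated with a minimal Python sketch. It is not TopicMine's actual method: the abstract does not give the scoring formula, so the sketch assumes that topic relevance is the cosine similarity between a phrase embedding and a category-name embedding, and that semantic generality can be approximated by normalized log corpus frequency. The function names, the toy 2-D vectors, and the frequency counts are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_phrases(category_vecs, phrase_vecs, phrase_freq, top_k=2):
    """Assign each phrase to its most similar category, then rank by score.

    Assumption: score = relevance * generality, where relevance is cosine
    similarity and generality is log frequency scaled to [0, 1]. This is an
    illustrative stand-in, not the formula used by TopicMine.
    """
    max_log = max(math.log1p(f) for f in phrase_freq.values())
    scored = {name: [] for name in category_vecs}
    for phrase, vec in phrase_vecs.items():
        sims = {name: cosine(vec, cvec) for name, cvec in category_vecs.items()}
        best = max(sims, key=sims.get)  # most relevant category
        generality = math.log1p(phrase_freq[phrase]) / max_log
        scored[best].append((phrase, sims[best] * generality))
    return {
        name: [p for p, _ in sorted(items, key=lambda x: -x[1])[:top_k]]
        for name, items in scored.items()
    }


# Hypothetical toy data: 2-D embeddings and corpus frequencies.
category_vecs = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
phrase_vecs = {
    "world cup": [0.9, 0.1],
    "penalty kick": [0.8, 0.2],
    "offside trap": [0.95, 0.05],
    "general election": [0.1, 0.9],
    "ballot recount": [0.2, 0.8],
}
phrase_freq = {
    "world cup": 500,
    "penalty kick": 120,
    "offside trap": 5,
    "general election": 400,
    "ballot recount": 30,
}

topics = rank_phrases(category_vecs, phrase_vecs, phrase_freq)
print(topics)
```

In this toy setup, "offside trap" is the phrase closest to the "sports" vector, but its low frequency pushes it below the more general phrases. This shows why a retrieval score might combine relevance with generality rather than rely on similarity alone.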