Deep Learning Based Topic Identification and Categorization: Mining Diabetes-Related Topics on Chinese Health Websites.

Xinhuan Chen, Yong Zhang, Jennifer Xu, Chunxiao Xing, Hsinchun Chen
DOI: https://doi.org/10.1007/978-3-319-32025-0_30
2016-01-01
Abstract: As millions of people are diagnosed with diabetes every year, the demand for information about diabetes continues to increase. China is one of the countries with a large population of diabetes patients, and many Chinese health websites provide diabetes-related news and articles. However, because most of these online articles are uncategorized or lack a clear topic or theme, users often cannot find topics of interest effectively and efficiently. Health topic identification and categorization on Chinese websites cannot be addressed simply by applying, in a straightforward manner, the existing approaches and methods that have been used for English documents. To address this problem and meet users' need for diabetes-related information, we propose a deep learning based framework to identify and categorize diabetes-related topics in online Chinese articles. Our experiments on datasets of over 19,000 online articles showed that the framework categorized diabetes-related topics with higher effectiveness and accuracy than most of the state-of-the-art benchmark approaches.
What problem does this paper attempt to address?