Research on Keyword Extraction Algorithm for Chinese Text Based on Document Topic Structure and Semantics

Kunhui Lin, Chuchu Gao, Xiaoli Wang, Ming Qiu
DOI: https://doi.org/10.1109/iccse.2018.8468861
2018-01-01
Abstract: Keywords summarize the content of an article and reflect its topic, which helps people find resources. However, most current text resources do not provide keywords. Manually assigned keywords are highly accurate but often strongly subjective, and assigning them requires time to read and understand the text, so manual tagging cannot keep pace with today's rapid growth of information resources. Keyword extraction technology establishes a unified standard and uses the computer's fast processing power to extract keywords automatically, greatly reducing labor, time consumption, and the influence of subjectivity. In this paper, we propose an improved algorithm for extracting more effective keywords. We first find the optimal paragraphing through continuous text segmentation and construct the topic hierarchy of the document based on the vector space model. We then develop an algorithm based on this topic hierarchy to extract the most significant keywords. We further improve the algorithm by incorporating the semantic similarity between Chinese words, combining statistical methods with semantics to improve keyword extraction.
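The abstract names three ingredients: segmenting continuous text into topical units, scoring terms with the vector space model over that topic structure, and blending in word-level semantic similarity. The following is a minimal illustrative sketch of how such ingredients can fit together. It is not the authors' algorithm, and several simplifications are assumptions. Input sentences are assumed to be already word-segmented (Chinese word segmentation is done beforehand). Segmentation here is a greedy cut wherever adjacent sentences have low cosine similarity, which yields a flat segmentation rather than a hierarchy. The "segment spread" weight stands in for a topic-hierarchy score. The similarity table is a toy stand-in for a real Chinese lexical similarity resource. All names (`segment`, `statistical_scores`, `extract_keywords`, `toy_similarity`, `threshold`, `alpha`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable

Sentence = list[str]  # one sentence, assumed already word-segmented


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (vector space model)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences: list[Sentence], threshold: float) -> list[Counter]:
    """Hypothetical greedy segmentation: start a new segment whenever the
    cosine similarity of two adjacent sentences falls below `threshold`."""
    if not sentences:
        return []
    segments = [Counter(sentences[0])]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(Counter(prev), Counter(cur)) < threshold:
            segments.append(Counter(cur))
        else:
            segments[-1].update(cur)
    return segments


def statistical_scores(segments: list[Counter]) -> dict[str, float]:
    """Hypothetical topic-structure weight: log term frequency times the
    fraction of segments (topical units) in which the term occurs."""
    total = Counter()
    for seg in segments:
        total.update(seg)
    n = len(segments)
    return {
        term: math.log(1 + tf) * sum(1 for seg in segments if term in seg) / n
        for term, tf in total.items()
    }


def extract_keywords(sentences: list[Sentence],
                     similarity: Callable[[str, str], float],
                     k: int = 3, threshold: float = 0.5,
                     alpha: float = 0.7) -> list[str]:
    """Blend the statistical score with semantic support from other terms."""
    stat = statistical_scores(segment(sentences, threshold))
    final = {}
    for w, s in stat.items():
        others = [v for v in stat if v != w]
        # Semantic support: average similarity-weighted score of the other terms.
        support = (sum(similarity(w, v) * stat[v] for v in others) / len(others)
                   if others else 0.0)
        final[w] = alpha * s + (1 - alpha) * support
    # Highest score first; ties broken by the term string for determinism.
    return sorted(final, key=lambda w: (-final[w], w))[:k]


# Toy similarity table (hypothetical values, not from any real lexicon).
TOY_SIMILARITY = {
    frozenset({"关键词", "主题"}): 0.6,
    frozenset({"抽取", "算法"}): 0.4,
    frozenset({"语义", "相似度"}): 0.5,
}


def toy_similarity(a: str, b: str) -> float:
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


SENTENCES = [
    ["关键词", "抽取", "文本"],
    ["关键词", "抽取", "算法"],
    ["主题", "结构", "文本"],
    ["语义", "相似度", "关键词"],
]

if __name__ == "__main__":
    print(extract_keywords(SENTENCES, toy_similarity))  # → ['关键词', '文本', '抽取']
```

In this toy run, the first two sentences merge into one segment and the last two each start a new one. Terms that recur across segments ("关键词", "文本") therefore outrank terms confined to a single segment. The `alpha` parameter trades the statistical score against semantic support: with `alpha=1.0` the semantic table is ignored. Lowering it lets a term gain weight by being similar to high-scoring terms, as "主题" does here through its similarity to "关键词".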
What problem does this paper attempt to address?