A Cluster-based Approach on Mining Text Preference

LIU Yuan-chao, WANG Xiao-long, LIU Bing-quan, ZHONG Bin-bin
DOI: https://doi.org/10.3969/j.issn.1001-3695.2005.12.006
2005-01-01
Abstract: Mining hidden user text preferences and concept vectors from training documents is a key technology in NLP applications such as text information filtering and multi-document summarization. Training documents often cover several topics, and this paper introduces an approach based on cluster analysis to handle that case. The basic idea is to first cluster the training documents by topic and then analyze what the documents within each topic have in common. After feature weight modification and feature reduction, a concept vector is formed for each topic. Experimental results show that the approach represents user text preferences more precisely and is not sensitive to the relevance threshold. A user preference profile can be mined by combining the approach with the Rocchio algorithm.
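The pipeline in the abstract (cluster the training documents, find what each topic's documents share, reweight features, reduce features, form one concept vector per topic) can be sketched in Python. This is a minimal sketch under assumptions the abstract does not state: documents arrive pre-tokenised, single-pass clustering with a cosine-similarity threshold stands in for the unspecified cluster analysis, "commonness" is approximated by the fraction of a cluster's documents containing a term, and feature reduction keeps the `top_k` highest-weighted terms. All function names and parameters are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """L2-normalised term-frequency vector as a sparse dict (term -> weight)."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def single_pass_cluster(vectors, threshold):
    """Assumed clustering step: put each document in the most similar existing
    cluster if similarity >= threshold, otherwise open a new cluster.
    Returns a list of clusters, each a list of document indices."""
    clusters, centroids = [], []
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c, cen in enumerate(centroids):
            s = cosine(v, cen)
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append([i])
            centroids.append(dict(v))
        else:
            clusters[best].append(i)
            centroids[best] = centroid([vectors[j] for j in clusters[best]])
    return clusters


def concept_vector(vectors, top_k):
    """Hypothetical weighting: centroid weight times the share of the cluster's
    documents that contain the term (commonness), then keep the top_k terms."""
    cen = centroid(vectors)
    df = Counter(t for v in vectors for t in v)
    weighted = {t: w * df[t] / len(vectors) for t, w in cen.items()}
    top = sorted(weighted.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return dict(top)


def build_profile(docs, threshold=0.3, top_k=3):
    """One concept vector per topic cluster of the training documents."""
    vectors = [tf_vector(d) for d in docs]
    clusters = single_pass_cluster(vectors, threshold)
    return [concept_vector([vectors[i] for i in c], top_k) for c in clusters]


def score(profile, tokens):
    """Relevance of a new document: best cosine match over all topic vectors."""
    v = tf_vector(tokens)
    return max(cosine(v, cv) for cv in profile)


if __name__ == "__main__":
    docs = [
        "stock market shares trading market".split(),
        "market shares stock prices".split(),
        "football match goal team".split(),
        "team goal football league".split(),
    ]
    profile = build_profile(docs)
    print(profile)
    print(score(profile, "stock market crash".split()))
```

A filtering system would accept a new document when `score` exceeds its relevance threshold. Keeping one vector per topic, rather than averaging all training documents into a single vector, stops unrelated topics from diluting each other's term weights. The abstract does not say how the concept vectors are combined with the Rocchio algorithm, so that step is left out of this sketch.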