A Semi-Structured Document Model for Text Mining

Yang Jianwu, Chen Xiaoou
DOI: https://doi.org/10.1007/bf02948828
2002-01-01
Abstract: A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can be fully utilized. To take advantage of the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM), in which a vector represents a document and the vector's elements are determined by terms, document structure, and neighboring documents. For brevity and clarity, text mining based on SLVM is described through the K-means procedure, in terms of its two steps: calculating document similarity and calculating cluster centers. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model: the F value increases from 0.65–0.73 to 0.82–0.86.
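The two K-means steps named in the abstract, calculating document similarity and calculating cluster centers, can be illustrated with a minimal sketch. This sketch uses plain term-frequency vectors, i.e. the conventional vector space model that serves as the baseline, not SLVM itself: the abstract does not give the SLVM weighting for structure and links. The function names (`tf_vector`, `cosine`, `center`, `kmeans_step`), the cosine measure, and the whitespace tokenization are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (conventional VSM, not SLVM)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Step 1: document similarity as the cosine of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def center(vectors):
    """Step 2: cluster center as the component-wise mean of member vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds the weights term by term
    n = len(vectors)
    return {t: w / n for t, w in total.items()}


def kmeans_step(vectors, centers):
    """One K-means iteration: assign to the most similar center, then re-center."""
    clusters = [[] for _ in centers]
    for vec in vectors:
        best = max(range(len(centers)), key=lambda i: cosine(vec, centers[i]))
        clusters[best].append(vec)
    # An empty cluster keeps its previous center (an assumption of this sketch).
    new_centers = [center(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters, new_centers
```

In a model such as SLVM, only the construction of the vectors would change (elements would also reflect document structure and neighboring documents); the assignment and re-centering loop would presumably stay the same.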