A Comparative Study on Feature Window Selection in Text Filtering

Hu Quan, Xie Fang, Liu Xiaoguang
DOI: https://doi.org/10.1109/IFITA.2009.189
2009-01-01
Abstract: Text representation is a preliminary step in text filtering, and the vector space model (VSM) is the most commonly used representation method in this field. However, the document feature set produced by VSM usually has very high dimensionality, and as a result the distribution of feature values tends to be highly skewed. This paper presents several new mechanisms to mitigate these problems. With these mechanisms, document features are extracted from smaller feature windows, such as sentences, graphs and blocks, rather than from the full text, and related texts are finally evaluated by local similarity. The feature windows are obtained by analyzing the linguistic structure of the documents. As a result, the approach can have a marked effect on the precision of text filtering.
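The idea of scoring a document by its feature windows rather than its full text can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how windows are delimited or how local similarity is computed, so this sketch assumes sentence windows split on end punctuation, raw term-frequency vectors, cosine similarity, and the maximum window score as the local similarity. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_windows(text):
    """Split text into sentence windows on '.', '!' or '?' (a crude heuristic)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def global_similarity(profile, document):
    """Full-text baseline: cosine between the profile and the whole document."""
    return cosine(Counter(tokenize(profile)), Counter(tokenize(document)))


def local_similarity(profile, document):
    """Assumed local score: best cosine between the profile and any one window."""
    profile_vec = Counter(tokenize(profile))
    scores = [
        cosine(profile_vec, Counter(tokenize(window)))
        for window in sentence_windows(document)
    ]
    return max(scores, default=0.0)


if __name__ == "__main__":
    profile = "text filtering"
    document = (
        "The weather was pleasant today. "
        "We study text filtering. "
        "Lunch was served at noon."
    )
    print("full text:", global_similarity(profile, document))
    print("local    :", local_similarity(profile, document))
```

In this toy document only one sentence is on topic, so the window-level score is higher than the full-text score, which is diluted by the unrelated sentences. Other window types or aggregation rules (for example, averaging the top few windows) would be equally plausible readings of the abstract.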