An Approach Based on Domain Knowledge to Text Categorization

ZHU Jing-bo, CHEN Wen-liang
DOI: https://doi.org/10.3321/j.issn:1005-3026.2005.08.006
2005-01-01
Abstract: A knowledge-based text categorization method is proposed that takes domain features as textual features to improve text representation and treats text categorization as an aggregation computation procedure. A feature re-selection and re-weighting technique is proposed for the text indexing procedure. To learn feature aggregation functions automatically from a labeled training collection, a learning method based on mutual information is employed. Comparative experimental results show that, overall, the text categorization method based on domain knowledge outperforms the conventional naive Bayes classifier based on the bag-of-words model, and that using domain knowledge improves the effectiveness of classifying similar or antithetical topics.
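The abstract does not give the formula used for the mutual-information-based learning step. As a rough illustration only, the sketch below computes the standard mutual information (in bits) between the presence of a feature in a document and the document's membership in a category, estimated from a labeled collection. The function name `mutual_information`, the toy documents, and the labels are all assumptions made for this example and are not taken from the paper; the paper's aggregation functions may use a different variant.

```python
import math
from collections import Counter


def mutual_information(docs, labels, feature, category):
    """Mutual information (bits) between 'feature occurs in the document'
    and 'document belongs to category', estimated from raw counts.

    docs   : list of sets of features (hypothetical toy representation)
    labels : list of category names, one per document
    """
    n = len(docs)
    # Count documents in each of the four (has_feature, in_category) cells.
    counts = Counter()
    for features, label in zip(docs, labels):
        counts[(feature in features, label == category)] += 1

    mi = 0.0
    for (f, c), joint in counts.items():
        # Only cells with at least one document are stored, so joint > 0.
        p_joint = joint / n
        p_f = sum(v for (ff, _), v in counts.items() if ff == f) / n
        p_c = sum(v for (_, cc), v in counts.items() if cc == c) / n
        mi += p_joint * math.log2(p_joint / (p_f * p_c))
    return mi


# Toy labeled collection (illustrative data, not from the paper).
docs = [
    {"goal", "match", "team"},
    {"goal", "league"},
    {"stock", "market"},
    {"market", "bank"},
]
labels = ["sports", "sports", "finance", "finance"]

mi_goal = mutual_information(docs, labels, "goal", "sports")
mi_team = mutual_information(docs, labels, "team", "sports")
mi_none = mutual_information(docs, labels, "weather", "sports")
```

In this toy collection, "goal" occurs in exactly the sports documents, so it scores higher than "team", which occurs in only one of them, while a feature that never occurs carries no information about the category. A score of this kind could serve as a feature weight or as a selection criterion, under the assumptions stated above.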