Improved N-gram Model Based on Ontology for Web Page Classification

LIU Jin-hong, LU Yu-liang
DOI: https://doi.org/10.3969/j.issn.1000-7024.2007.13.059
2007-01-01
Abstract: Text classification plays an increasingly important role in web information retrieval. Instead of relying on traditional classification models, this paper applies a domain ontology to N-gram models to classify Chinese text. Domain concepts and valid word chains are used as index terms, together with the associated weight computation and smoothing methods. Experimental results show that the approach greatly reduces the feature dimensionality and outperforms the traditional N-gram classification model.
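The abstract does not give the model's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a word-level bigram model with add-one (Laplace) smoothing, and the `CONCEPT_OF` table, the class labels, and the English toy tokens are all hypothetical stand-ins for a real domain ontology and segmented Chinese text. Mapping terms to concepts before counting is what shrinks the feature space: several surface words collapse into one index term.

```python
import math
from collections import Counter, defaultdict

# Hypothetical ontology lookup: surface term -> domain concept (illustrative only).
CONCEPT_OF = {
    "striker": "football", "goalkeeper": "football",
    "stock": "finance", "bond": "finance",
}

def to_concepts(tokens):
    """Replace each token by its domain concept when the lookup knows it."""
    return [CONCEPT_OF.get(t, t) for t in tokens]

def bigrams(items):
    """Return adjacent pairs, with a start marker before the first item."""
    padded = ["<s>"] + list(items)
    return list(zip(padded, padded[1:]))

class BigramClassifier:
    """Per-class bigram model with add-one smoothing (an assumed choice)."""

    def __init__(self):
        self.bigram_counts = defaultdict(Counter)   # label -> (a, b) counts
        self.context_counts = defaultdict(Counter)  # label -> counts of a
        self.vocab = set()

    def train(self, label, tokens):
        items = to_concepts(tokens)
        self.vocab.update(items)
        for a, b in bigrams(items):
            self.bigram_counts[label][(a, b)] += 1
            self.context_counts[label][a] += 1

    def log_score(self, label, tokens):
        v = len(self.vocab) + 1  # one extra slot for unseen items
        total = 0.0
        for a, b in bigrams(to_concepts(tokens)):
            num = self.bigram_counts[label][(a, b)] + 1
            den = self.context_counts[label][a] + v
            total += math.log(num / den)
        return total

    def classify(self, tokens):
        return max(self.bigram_counts, key=lambda lab: self.log_score(lab, tokens))

clf = BigramClassifier()
clf.train("sports", ["striker", "beats", "goalkeeper"])
clf.train("finance", ["stock", "beats", "bond"])
print(clf.classify(["goalkeeper", "beats", "striker"]))
```

In this toy run the test sequence never appears verbatim in the training data, yet its concept-level bigrams match those of the "sports" example, which illustrates why concept-based index terms can help with sparse data.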