A Comparison Study: Web Pages Categorization with Bayesian Classifiers

Zengmei Fu, Chuanliang Chen, Yunchao Gong, Rongfang Bie
DOI: https://doi.org/10.1109/HPCC.2008.80
2008-01-01
Abstract: In recent years, web mining has become a hotspot of data mining with the development of the Internet. Web page classification is one of the essential techniques for web mining, since identifying the web pages that belong to a class of interest is often the first step in mining the web. The high-dimensional text vocabulary space is one of the main challenges in classifying web pages. In this paper, we study the capabilities of Bayesian classifiers for web page categorization. Several feature selection techniques, such as Chi-Squared, Information Gain, and Gain Ratio, are used to select relevant words in web pages. Results on a benchmark dataset show that Aggregating One-Dependence Estimators (AODE) and Hidden Naive Bayes (HNB) are both more competitive than the other traditional methods.
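To make the pipeline described in the abstract concrete, the following is a minimal sketch of chi-squared feature selection followed by a Bayesian classifier. It is an illustration under stated assumptions, not the authors' implementation: the toy corpus, the function names (`chi2_scores`, `select_top_k`, `train_nb`, `predict`), and the cutoff of five terms are all hypothetical, and a plain multinomial naive Bayes with Laplace smoothing stands in for AODE and HNB, which are not implemented here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (not from the paper): (tokens, label) pairs.
DOCS = [
    ("football match goal team".split(), "sports"),
    ("team wins league match".split(), "sports"),
    ("goal scored in final match".split(), "sports"),
    ("stock market shares fall".split(), "finance"),
    ("bank raises interest rates".split(), "finance"),
    ("market shares rise in trading".split(), "finance"),
]

def chi2_scores(docs):
    """Chi-squared score of each term against the class labels,
    based on whether the term is present or absent in each document."""
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    term_class = defaultdict(Counter)  # term -> label -> docs containing term
    for tokens, label in docs:
        for term in set(tokens):
            term_class[term][label] += 1
    scores = {}
    for term, per_class in term_class.items():
        present = sum(per_class.values())
        score = 0.0
        for label, n_c in class_counts.items():
            # Two cells per class: term present, term absent.
            cells = ((per_class[label], present),
                     (n_c - per_class[label], n - present))
            for observed, row_total in cells:
                expected = row_total * n_c / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])

def train_nb(docs, vocab):
    """Multinomial naive Bayes with Laplace smoothing, restricted to vocab.
    Assumption: this stands in for the AODE/HNB classifiers in the paper."""
    class_docs = Counter(label for _, label in docs)
    term_counts = {label: Counter() for label in class_docs}
    for tokens, label in docs:
        term_counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for label, counts in term_counts.items():
        total = sum(counts.values()) + len(vocab)
        log_prior = math.log(class_docs[label] / len(docs))
        log_like = {t: math.log((counts[t] + 1) / total) for t in vocab}
        model[label] = (log_prior, log_like)
    return model

def predict(model, tokens):
    """Return the label with the highest log posterior; unknown terms are ignored."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)

if __name__ == "__main__":
    vocab = select_top_k(chi2_scores(DOCS), 5)
    model = train_nb(DOCS, vocab)
    print(sorted(vocab))
    print(predict(model, "team scored a goal".split()))
```

Feature selection runs before training, so the classifier only ever sees the reduced vocabulary. This is how a selection step can address the high-dimensional vocabulary space that the abstract identifies as a main challenge. Information Gain or Gain Ratio could replace `chi2_scores` without changing the rest of the sketch.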