Semi-supervised text classification with information retrieval techniques

Jia Zhiyang, Gao Wei, Wang Yonggang
2012-01-01
Abstract: The method assumes that search results are related both to the keywords of the query and to a certain text category. Accordingly, queries are constructed from feature words extracted from the initial sample set and sent to a search engine, and web pages are downloaded from the results the search engine returns. The downloaded pages are processed by eliminating duplicated content, reducing noise, and extracting the text content. After their categories are predicted, these samples are added to the sample set. Finally, a Naive Bayes text classifier is retrained on the enlarged sample set, and its classification performance is evaluated experimentally. The experimental results show that the precision of the semi-supervised text classification method with information retrieval techniques is significantly better than that of a classifier built from the small sample set alone.
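The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how feature words are selected or which search engine is used, so the sketch picks the most frequent words per category and replaces the search engine with a hypothetical in-memory `fake_search` over a toy `CORPUS`. All names (`NaiveBayes`, `top_feature_words`, `expand_and_retrain`, `fake_search`) are invented for the example, and the noise reduction and text extraction steps are omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Small multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def log_scores(self, doc):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (total_words + len(self.vocab))
                    )
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


def top_feature_words(docs, labels, k=2):
    """Assumed feature selection: the k most frequent words per category."""
    per_class = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        per_class[label].update(tokenize(doc))
    return {c: [w for w, _ in cnt.most_common(k)] for c, cnt in per_class.items()}


def expand_and_retrain(docs, labels, search, k=2):
    """Query with feature words, label the results, retrain on the larger set."""
    clf = NaiveBayes().fit(docs, labels)
    seen = set(docs)
    new_docs, new_labels = list(docs), list(labels)
    for words in top_feature_words(docs, labels, k).values():
        for page in search(" ".join(words)):
            if page in seen:  # exact-duplicate elimination only
                continue
            seen.add(page)
            new_docs.append(page)
            new_labels.append(clf.predict(page))  # category predicted by the seed model
    return NaiveBayes().fit(new_docs, new_labels), new_docs, new_labels


# Hypothetical stand-in for a search engine: returns toy "pages" sharing a query term.
CORPUS = [
    "football match goal striker",
    "election vote parliament minister",
    "football match goal striker",  # duplicate page
    "goal keeper penalty match",
]


def fake_search(query):
    terms = set(query.split())
    return [page for page in CORPUS if terms & set(page.split())]


seed_docs = ["football goal match", "vote election minister"]
seed_labels = ["sports", "politics"]
model, all_docs, all_labels = expand_and_retrain(seed_docs, seed_labels, fake_search)
print(len(all_docs), model.predict("penalty keeper"))
```

In this toy run the seed classifier has never seen the words "penalty" or "keeper", while the retrained one has learned them from a retrieved page. A real system would also need a confidence threshold or similar filter before adding predicted samples, since mislabeled pages would otherwise be learned as if they were correct.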