The Implementation of Automatic Classification System Based on New Subject Extraction & Noise Reduction Method

CAI Wei, WANG Ying-lin, YIN Zhong-hang
IF: 8.1
2009-01-01
Information Sciences
Abstract: Drawing on many years of research and development on automatic text classification systems, the authors present a new classifier scheme that uses hybrid algorithms. Exploiting the characteristics of Internet news, we present a new method that extracts the subject of a news item by string matching without a thesaurus, as a supplement to the "thesaurus + match" mode. Noise reduction is one of the hardest problems in improving classification accuracy; while developing this system we identified multi-category noise, and we present an algorithm that removes it based on frequency statistics. We developed an Internet news automatic classification system using all of the methods above and tested it on a corpus of 100,000 Internet news items provided by ChinaInforBank. The classification results are good and close to a practical level.
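The abstract does not spell out which frequency statistic is used to detect multi-category noise, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes that documents are already segmented into tokens, that a term counts as multi-category noise when it occurs in several categories without being concentrated in any one of them, and that the function names and the thresholds `min_categories` and `max_share` are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def multi_category_noise(docs, min_categories=3, max_share=0.5):
    """Return terms flagged as multi-category noise.

    docs: iterable of (category, tokens) pairs, tokens already segmented.
    A term is flagged when it appears in at least `min_categories`
    categories and no single category holds more than `max_share` of its
    total occurrences. Both thresholds are illustrative assumptions.
    """
    # Count how often each term occurs within each category.
    per_category = defaultdict(Counter)
    for category, tokens in docs:
        per_category[category].update(tokens)

    # Count how often each term occurs across the whole corpus.
    totals = Counter()
    for counts in per_category.values():
        totals.update(counts)

    noise = set()
    for term, total in totals.items():
        in_category = [c[term] for c in per_category.values() if c[term] > 0]
        top_share = max(in_category) / total
        if len(in_category) >= min_categories and top_share <= max_share:
            noise.add(term)
    return noise


def strip_noise(tokens, noise):
    """Drop flagged terms from a token list before classification."""
    return [t for t in tokens if t not in noise]


# Toy corpus (invented for illustration, not from ChinaInforBank).
docs = [
    ("sports", ["match", "goal", "report"]),
    ("finance", ["stock", "report", "bank"]),
    ("tech", ["chip", "report", "software"]),
]
noise = multi_category_noise(docs)
cleaned = strip_noise(["goal", "report"], noise)
```

In this toy corpus the term "report" is spread evenly over three categories and is flagged, while category-specific terms such as "goal" are kept. A real system would tune both thresholds on held-out data.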