A method of Web news extraction based on decision tree

HU Jun-kun, WANG Hao, YANG Jing
DOI: https://doi.org/10.3969/j.issn.1003-5060.2009.06.002
2009-01-01
Abstract: This paper proposes a general method for extracting Web news from Chinese news websites. Using feature vector extraction and a decision tree learning algorithm, a decision tree model of the text node is built and filed according to the website it comes from, and the resulting models are collected into a model base. When the URL of a text node is input, the website that the web page comes from is looked up by that URL, and the matching model is selected. If no matching model can be found, a general model is used instead. Experiments show that the method achieves good results.
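The model selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two features (text length and link density), the hand-written rules standing in for learned decision trees, the host name `news.example.com`, and all function names are assumptions made for illustration only.

```python
from typing import Callable, Dict, Sequence
from urllib.parse import urlparse

# A "model" here is any callable that maps a text-node feature vector
# to True (news content) or False (non-content). In the paper these
# would be learned decision trees; the rules below are hypothetical.
Model = Callable[[Sequence[float]], bool]


def general_model(features: Sequence[float]) -> bool:
    """Hypothetical fallback model used when no site-specific model exists."""
    text_len, link_density = features
    return text_len >= 50 and link_density < 0.3


def example_site_model(features: Sequence[float]) -> bool:
    """Hypothetical model for one specific website."""
    text_len, link_density = features
    return text_len >= 20 and link_density < 0.5


# Model base: one model per website, keyed by host name (assumed key).
MODEL_BASE: Dict[str, Model] = {
    "news.example.com": example_site_model,
}


def select_model(url: str,
                 model_base: Dict[str, Model] = MODEL_BASE,
                 fallback: Model = general_model) -> Model:
    """Pick the model for the website in `url`, else the general model."""
    host = (urlparse(url).hostname or "").lower()
    return model_base.get(host, fallback)


def is_news_text(url: str, features: Sequence[float]) -> bool:
    """Classify one text node using the model chosen for its page URL."""
    return select_model(url)(features)
```

Keying the model base by host name keeps the lookup to a single dictionary access, and the fallback argument makes the "general model" behavior explicit rather than a special case inside the classifier.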