Method of Collecting and Analyzing News Pages on Internet

WU Ding-ming, ZHAO Dong-yan
DOI: https://doi.org/10.3321/j.issn:1002-8331.2007.36.053
2007-01-01
Computer Engineering and Applications Journal
Abstract: This paper presents a method for collecting news web pages. The method downloads the entry page of a specified website, identifies the characteristics of the pages that the entry page links to, filters out irrelevant content, and extracts all news-related hyperlinks from the entry page. Taking into account title style, pictures, and news dates, it analyzes multi-level hyperlinks and ranks them with the NewsPageRank algorithm. Test results show that the method works for the majority of news websites and is practical.
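The abstract does not give the details of NewsPageRank, so the following is only a minimal sketch of the general pipeline it describes: parse an entry page, keep same-site hyperlinks, and rank them by simple news-likeness cues. Every name here (`LinkCollector`, `score_link`, `rank_news_links`) and the scoring rule itself (a date in the URL, a title-length anchor text) are illustrative assumptions, not the authors' algorithm; picture cues and multi-level link analysis are left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Assumption: news URLs often embed a publication date such as 2007-01-01.
DATE_IN_URL = re.compile(r"(19|20)\d{2}[-/_]?\d{2}[-/_]?\d{2}")


def score_link(url, text):
    """Hypothetical heuristic score; NOT the paper's NewsPageRank."""
    score = 0.0
    if DATE_IN_URL.search(url):
        score += 1.0
    if len(text) >= 8:  # assumption: headlines are longer than menu labels
        score += 1.0
    return score


def rank_news_links(entry_url, html):
    """Return (score, url, text) tuples for same-site links, best first."""
    parser = LinkCollector(entry_url)
    parser.feed(html)
    host = urlparse(entry_url).netloc
    scored = [
        (score_link(url, text), url, text)
        for url, text in parser.links
        if urlparse(url).netloc == host  # drop off-site links such as ads
    ]
    scored = [item for item in scored if item[0] > 0]  # filter irrelevant links
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

In practice the HTML string would come from downloading the entry page (for example with `urllib.request.urlopen`); the function takes the HTML as an argument so the sketch can be tried without network access.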