Structured processing method of distributed network information

Chang Pengfei, Wu Sai, Chen Ke, Shou Lidan, Chen Gang
2015-01-01
Abstract: The invention discloses a structured processing method for distributed network information. The method comprises the following steps: configuring a network information acquisition task and saving the webpages of interest to the user, by category, as target webpages; acquiring the network information by having multiple map/reduce processes cooperatively fetch the webpages, performing structured processing on them, and saving the results in the Hadoop Distributed File System (HDFS); performing structural clustering on the processed webpages using tree edit distance; and performing structured extraction on the clustered webpage information and saving the results in a database. Because a distributed architecture is adopted, very large volumes of network data can be processed using the computing and storage capacity of a cluster of inexpensive computers; the webpages are effectively classified; and the network information is extracted and saved in structured form, which facilitates further analysis.
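The clustering step can be illustrated with a minimal sketch. The abstract does not say which tree edit distance variant or clustering procedure the invention uses, so everything below is an assumption made for illustration: webpages are reduced to trees of tag names (the hypothetical `Node` class), distance is computed with a simple top-down (Selkow-style) edit distance in which inserting or deleting a child costs the size of its subtree, and pages are grouped by a greedy pass that compares each page with the first member of every existing cluster. The threshold of 0.3 is arbitrary.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parsed DOM element: a tag and its children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def ted(a: Node, b: Node) -> int:
    """Top-down tree edit distance (assumed variant, not taken from the patent).

    Roots are always matched (cost 1 if their tags differ); the child lists are
    aligned like strings, where dropping or adding a child costs its subtree size.
    """
    m, n = len(a.children), len(b.children)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),   # delete subtree from a
                d[i][j - 1] + size(b.children[j - 1]),   # insert subtree from b
                d[i - 1][j - 1] + ted(a.children[i - 1], b.children[j - 1]),
            )
    return (0 if a.tag == b.tag else 1) + d[m][n]


def cluster(pages: list[Node], threshold: float) -> list[list[int]]:
    """Greedy clustering: join the first cluster whose representative is close enough."""
    clusters: list[list[int]] = []
    for i, page in enumerate(pages):
        for members in clusters:
            rep = pages[members[0]]
            if ted(page, rep) / max(size(page), size(rep)) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Three toy pages: two article-like layouts and one list-like layout.
article_a = Node("html", [Node("body", [Node("h1"), Node("p"), Node("p")])])
article_b = Node("html", [Node("body", [Node("h1"), Node("p"), Node("p"), Node("p")])])
listing = Node("html", [Node("body", [Node("ul", [Node("li"), Node("li"), Node("li")])])])

groups = cluster([article_a, article_b, listing], threshold=0.3)
print(groups)
```

With these toy pages the two article layouts differ by a single extra paragraph and fall into one cluster, while the list layout forms its own. Dividing by the larger tree size keeps the threshold comparable across pages of different sizes; a production system would more likely use a full Zhang-Shasha distance and run the pairwise comparisons inside the map/reduce jobs, but the abstract gives no detail on either point.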