Design of analysis system for documents based on web crawler

Jingtao Shang, Jianjun Lin, Yan Qin, Bo Li, Mengmeng Wu
DOI: https://doi.org/10.1109/CompComm.2016.7924710
2016-01-01
Abstract: This paper studies information extraction and data mining technologies. It presents a document analysis system that uses web crawlers to analyze and process specified blocks of text. For specific needs, the system can simulate the working process of a search engine. By intercepting the data flow, it extracts and filters documents automatically, which improves the automation of document information extraction.
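The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: take a fetched page, extract the text of one specified block, and filter it by keywords. The class `BlockTextExtractor`, the function `extract_and_filter`, the element id `content`, and the keyword list are all hypothetical names chosen for illustration, not the authors' design. The sample page is inlined so the example runs offline.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class BlockTextExtractor(HTMLParser):
    """Collect text found inside elements whose id equals target_id (assumed selector)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target block; 0 means outside
        self.chunks = []    # non-empty text fragments found inside the block

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_and_filter(html, target_id, keywords):
    """Return text fragments from the target block that mention any keyword."""
    parser = BlockTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    lowered = [k.lower() for k in keywords]
    return [c for c in parser.chunks if any(k in c.lower() for k in lowered)]


# Hypothetical page content standing in for a crawled document.
SAMPLE = """
<html><body>
<div id="nav">Home | Crawler news</div>
<div id="content">
  <p>Web crawler design notes.</p>
  <p>Unrelated advertisement.</p>
  <p>Data mining of extracted documents.</p>
</div>
</body></html>
"""

result = extract_and_filter(SAMPLE, "content", ["crawler", "mining"])
print(result)
```

In a live setting the HTML string could come from a call such as `urllib.request.urlopen(url).read().decode()`. Restricting extraction to one block keeps navigation text out of the result even when it contains a keyword, as the `nav` element in the sample does.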