A Vision-Based Approach for Deep Web Form Extraction

Jiachen Pu, Jin Liu, Jin Wang
DOI: https://doi.org/10.1007/978-981-10-5041-1_111
2017-01-01
Abstract: The World Wide Web is a large source of information whose data resides in either the Surface Web or the Deep Web. Compared with the Surface Web, the Deep Web contains a greater amount of higher-quality structured data, but this data is difficult to use directly. Existing methods for Deep Web form extraction fall broadly into HTML-based, vision-based, ontology-based, machine-learning (ML)-based, and natural-language-processing (NLP)-based categories, among others. This paper combines the DOM tree with a convolutional neural network (CNN) to locate forms in a Web page. The proposed vision-based method, VBF, identifies forms by acquiring a page's HTML code and screenshot, building the DOM tree, running the neural network, and then recognizing, matching, and generating the forms.
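The abstract does not say how the CNN's output is reconciled with the DOM tree during the "matching" step. As an illustration only, the Python sketch below assumes that (a) each candidate DOM node, such as a `<form>`, has a bounding box taken from the rendered screenshot, and (b) the vision model emits form regions as boxes with confidence scores. It then pairs them greedily by intersection-over-union (IoU). IoU matching is a swapped-in, generic technique, not necessarily what VBF does, and the names `DomCandidate`, `VisualDetection`, `match_forms`, and the 0.5 thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical axis-aligned box: (left, top, right, bottom) in screenshot pixels.
Box = Tuple[float, float, float, float]


@dataclass
class DomCandidate:
    """A DOM node (e.g. a <form>) with its rendered bounding box (assumed available)."""
    xpath: str
    box: Box


@dataclass
class VisualDetection:
    """A region a vision model (e.g. a CNN) flagged as a form, with a confidence score."""
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_forms(dom: List[DomCandidate], detections: List[VisualDetection],
                min_iou: float = 0.5, min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Greedily pair each confident detection with the best-overlapping unused DOM node.

    Detections are visited from highest to lowest score; a DOM node is matched at
    most once, and only if its IoU with the detection is at least `min_iou`.
    Returns (xpath, iou) pairs for the accepted matches.
    """
    used = set()
    matches = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.score < min_score:
            continue  # ignore low-confidence visual predictions
        best: Optional[int] = None
        best_iou = min_iou
        for i, cand in enumerate(dom):
            if i in used:
                continue
            overlap = iou(det.box, cand.box)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            used.add(best)
            matches.append((dom[best].xpath, best_iou))
    return matches
```

For example, a detection at `(12, 8, 205, 112)` overlapping a DOM form at `(10, 10, 210, 110)` is paired with that node, while a detection that overlaps no DOM candidate is discarded. Requiring agreement between the two views is one plausible way to let the DOM supply structure and the screenshot supply visual evidence.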