Deep web self-adapting crawling method based on minimum searchable mode

Chaohui Wu,Qinghua Zheng,Xiao Chang,Liu Jun,Lu Jiang
2012-01-01
Abstract:The invention discloses a Deep Web self-adapting crawling method based on a minimum enquiry pattern. Aiming at the problem of the existing Deep Web crawling method that the crawling efficiency is low due to data isolated island, the invention firstly provides a conception of a minimum enquire pattern MEP and then provides an MEP generating algorithm and the self-adapting crawling method based on the MEP. The invention can cause an enquiry interface to be popularized to a minimum enquiry pattern set from a single textbox, a once enquiry is commonly determined by one MEP and keyword vector matched with the MEP, and a next enquiry with optimal expectation can be produced by a self-adapting way until enquiry stop conditions are satisfied. By using the minimum enquiry pattern, not only the form filling accuracy ratio is improved, but also the characteristics of all patterns can be fully utilized to select keywords so as to overcome the data isolated island problem better.
What problem does this paper attempt to address?