Automatic Classification of Deep Web Sources Based on Search Interface Schemas

ZHAO Peng-peng, GAO Ling, CUI Zhi-ming
DOI: https://doi.org/10.3969/j.issn.1000-7180.2006.10.017
2006-01-01
Abstract: Web search engines work well for finding crawlable pages, but not for finding datasets hidden behind Web search forms. On this deep Web, many sources are structured: they provide structured query interfaces and return structured results. Organizing such structured sources into a domain hierarchy that users can browse to find these valuable resources is one of the critical steps toward the large-scale integration of heterogeneous deep Web sources. We propose an automatic classification method for structured deep Web sources based on the features available on their search interfaces. Our experimental results indicate that this approach achieves good classification results.
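The abstract does not say which interface features or which classifier the authors use, so the following is only an illustrative sketch of the general idea: read the terms that appear in a search form (labels and field names) and assign the source to the domain whose vocabulary overlaps most. The domain vocabularies, the `FormTermExtractor` class, and the `classify_interface` function are all hypothetical names introduced here, not taken from the paper.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical domain vocabularies (an assumption; the paper's actual
# feature set is not given in the abstract).
DOMAIN_TERMS = {
    "books": {"title", "author", "isbn", "publisher"},
    "airfares": {"departure", "destination", "passengers", "airline"},
    "jobs": {"keywords", "salary", "location", "company"},
}


class FormTermExtractor(HTMLParser):
    """Collects words from visible text and field names inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0
        self.terms = Counter()

    def _add(self, text):
        # Lowercase and keep alphabetic tokens only.
        self.terms.update(re.findall(r"[a-z]+", text.lower()))

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_depth += 1
        elif self.form_depth and tag in ("input", "select", "textarea"):
            for key, value in attrs:
                if key in ("name", "placeholder") and value:
                    self._add(value)

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth:
            self.form_depth -= 1

    def handle_data(self, data):
        if self.form_depth:
            self._add(data)


def classify_interface(html):
    """Return the best-matching domain name, or None if no term matches."""
    parser = FormTermExtractor()
    parser.feed(html)
    scores = {
        domain: sum(parser.terms[term] for term in vocabulary)
        for domain, vocabulary in DOMAIN_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


sample_form = (
    '<form><label>Title</label><input name="title">'
    '<label>Author</label><input name="author">'
    '<input name="isbn"></form>'
)
print(classify_interface(sample_form))
```

A simple term-overlap score is used here only because it is easy to follow; a real system would more likely train a statistical classifier on labeled interfaces.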