Effective Schema Extraction of Query Interfaces on the Deep Web

Bao-hua Qiang, Jian-qing Xi, Ling Chen
DOI: https://doi.org/10.1109/FSKD.2008.135
2008-01-01
Abstract: The Deep Web is becoming a very important information resource. Unlike traditional Web information retrieval, the contents of the Deep Web are accessible only through source query interfaces. However, for any domain of interest, users may need to access many query interfaces to get the desired information, which is time-consuming and motivates building an integrated query interface over the sources. The first important task towards this goal is schema extraction from the source query interfaces. In this paper, we present a novel pre-clustering algorithm with proper grouping patterns to obtain a partial clustering of attributes. Our approach avoids producing incorrect subsets when grouping attributes. Experimental results show that our approach is highly effective for schema extraction of source query interfaces on the Deep Web.