Understanding the Search Interfaces of the Deep Web Based on Domain Model
XiaoJie Yuan, HuiBin Zhang, ZongYun Yang, YanLong Wen
DOI: https://doi.org/10.1109/icis.2009.32
2009-01-01
Abstract: The Web has recently been deepened rapidly by the many searchable databases now online. These databases are accessed through form-based search interfaces that let users specify query conditions. When integrating Web databases, the first challenge is to understand the search interface. Such an interface can be viewed as an interface schema with multiple attributes. However, each interface is created autonomously and its schema is not declared in the HTML, which makes schema extraction a challenging task. In this paper, we propose a novel approach to automatically extracting the logical attributes from search interfaces. First, we define a domain model for each deep Web domain as a global schema that guides the extraction of the interface schema; second, we group the labels that belong to the same attribute of an interface and produce a label tree of the interface; third, we find the related label of each element, if it has one; last, we merge the results of the previous two steps to complete the schema extraction. The last three steps all rely on the domain model. Our experiments show the promise of this approach: it achieves above 96.87% accuracy in extracting query attributes.
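The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The domain model contents, the flat `(kind, value)` form representation, the exact-match keyword lookup, and the "nearest preceding label" heuristic are all assumptions made for illustration, and the label tree is simplified to flat groups of labels per attribute.

```python
from collections import defaultdict

# Step 1 (assumed shape): a hypothetical domain model for a "books" domain,
# mapping each global attribute to label texts known to denote it.
DOMAIN_MODEL = {
    "title": {"title", "book title"},
    "author": {"author", "first name", "last name"},
}

def attribute_of(label, model):
    """Return the domain-model attribute a label text belongs to, or None."""
    text = label.strip().lower().rstrip(":")
    for attribute, keywords in model.items():
        if text in keywords:
            return attribute
    return None

def group_labels(items, model):
    """Step 2: group the labels that map to the same attribute."""
    groups = defaultdict(list)
    for kind, value in items:
        if kind == "label":
            attribute = attribute_of(value, model)
            if attribute is not None:
                groups[attribute].append(value)
    return dict(groups)

def relate_elements(items):
    """Step 3: relate each element to the nearest preceding label, if any.
    This heuristic is an assumption; the paper's rule may differ."""
    related = {}
    current_label = None
    for kind, value in items:
        if kind == "label":
            current_label = value
        else:
            related[value] = current_label
    return related

def extract_schema(items, model):
    """Step 4: merge label groups and element-label relations."""
    schema = {
        attribute: {"labels": labels, "elements": []}
        for attribute, labels in group_labels(items, model).items()
    }
    for element, label in relate_elements(items).items():
        if label is None:
            continue  # element has no related label
        attribute = attribute_of(label, model)
        if attribute in schema:
            schema[attribute]["elements"].append(element)
    return schema

# A hypothetical search form, flattened into document order.
FORM = [
    ("element", "keywords"),  # no label precedes this element
    ("label", "Title:"), ("element", "title_box"),
    ("label", "Author"),
    ("label", "First name"), ("element", "fn"),
    ("label", "Last name"), ("element", "ln"),
]

schema = extract_schema(FORM, DOMAIN_MODEL)
```

Here `schema` groups the three author-related labels and their two input elements under the single attribute `author`, while the unlabeled `keywords` element is left out. A real interface would need layout-aware label matching rather than document order alone.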