Exploring the Deep Web: Associativity Search over Schematic Metadata
Govind Kabra, Zhen Zhang, Kevin Chen-Chuan Chang, Lipyeow Lim, Min Wang, Yuan-Chi Chang
2006-01-01
Abstract: The Web has rapidly deepened with the prevalence of online databases. As sources proliferate, there are often useful, alternative, and related sources for our needs, yet we lack an effective facility to explore this “deep Web.” For “ad-hoc users” and “system integrators” alike, enabling access to and integration of the multitude of sources often requires answering semantic association questions: How do sources relate to each other? What “vocabularies” do they speak? Such semantic associativity is often revealed holistically through co-occurrence analysis of “schematic metadata,” which describes the nature of data at sources. We observe two interesting phenomena through the syntactic associativity of sources and their schematic metadata. The first, occurrence localities, suggests syntactic associativity as a useful notion for discovering semantic associativity; the second, fuzzy boundaries, suggests a query-driven, rank-based mechanism as its realization. We thus propose to build an associativity search facility for systematic exploration of deep Web sources. In its realization, we combine occurrence analysis and link analysis by abstracting the occurrence of metadata in sources as links in a graph, which effectively transforms associativity of entities into connectivity of nodes. To quantify associativity, we propose a wave propagation model; to compute it efficiently, we develop spatial and temporal optimization strategies. We validate the usefulness and efficiency of the approach on a real-world dataset of 30,000 sources. The experiments show that syntactic associativity is not only useful for semantic discovery but also practical as an online search mechanism.
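
The idea of treating metadata occurrences as graph links can be illustrated with a minimal sketch. The abstract does not specify the wave propagation model, so the code below substitutes a generic damped spreading-activation scheme as a stand-in; it is not the authors' method. The toy sources, the attribute names, and the functions `build_graph` and `propagate` (with their `steps` and `decay` parameters) are all hypothetical.

```python
from collections import defaultdict

def build_graph(sources):
    """Build an undirected bipartite graph: a source links to each attribute it contains.

    `sources` maps a source name to an iterable of attribute names (toy input).
    """
    graph = defaultdict(set)
    for source, attrs in sources.items():
        for attr in attrs:
            s, a = ("source", source), ("attr", attr)
            graph[s].add(a)
            graph[a].add(s)
    return graph

def propagate(graph, query, steps=4, decay=0.5):
    """Generic damped spreading activation (an assumption, not the paper's model).

    At each step every active node splits its energy evenly among its
    neighbors, scaled by `decay`. A node's score is the total energy it
    receives over all steps. Returns (node, score) pairs, best first.
    """
    wave = {query: 1.0}
    scores = defaultdict(float)
    for _ in range(steps):
        next_wave = defaultdict(float)
        for node, energy in wave.items():
            neighbors = graph.get(node, ())
            if not neighbors:
                continue
            share = decay * energy / len(neighbors)
            for nb in neighbors:
                next_wave[nb] += share
        for node, energy in next_wave.items():
            scores[node] += energy
        wave = next_wave
    scores.pop(query, None)  # do not rank the query against itself
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Hypothetical toy data: two book sources and two car sources.
sources = {
    "books-a": ["title", "author", "isbn"],
    "books-b": ["title", "author", "publisher"],
    "cars-a": ["make", "model", "price"],
    "cars-b": ["make", "model", "year"],
}
graph = build_graph(sources)
ranked = propagate(graph, ("attr", "author"))
for node, score in ranked:
    print(node, round(score, 4))
```

In this toy setting, querying on the attribute `author` reaches only nodes connected to it through shared sources, so the car sources and their attributes receive no score at all, while attributes shared by both book sources accumulate more energy than attributes that appear in only one.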