Attributes extraction of Deep Web query interface based on DOM

Shi Long, Qiang Baohua, He Qian, Wu Chunming, Chen Chao
DOI: https://doi.org/10.3969/j.issn.1673-808X.2012.06.010
2012-01-01
Abstract: Query interface schema extraction is a prerequisite for Deep Web data integration. A query interface schema generally consists of a set of domain-related attributes, and each attribute is formed either by a single form element or by a combination of multiple elements. Current research on attribute extraction mostly targets single-element attributes, and work on multi-element attributes is scarce. To address multi-element attribute extraction, this paper proposes a DOM-based method for extracting query interface schemas. The method parses a query interface into a DOM tree and extracts the form elements from the corresponding DOM nodes. It then applies two-phase clustering to the form elements, mines the combination relationships among them, and combines related elements into attributes. The method performs well on both single-element and multi-element attribute extraction, and experimental results show that it is effective.
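The abstract does not say how the elements are clustered, so the Python sketch below only illustrates the general pipeline it names: parse the interface into a tag tree, collect form controls from its nodes, then group controls into single- or multi-element attributes. It is a sketch under stated assumptions, not the authors' algorithm. It uses the standard-library `html.parser` instead of a full DOM, and both grouping phases are hypothetical stand-ins for the paper's two-phase clustering: phase one groups consecutive controls that share a nearest container node (`tr`, `div`, `p`, `li`, `fieldset`), and phase two splits a container wherever a control's preceding text ends in a colon. All names (`FormElementCollector`, `extract_attributes`, `CONTAINER_TAGS`, `SAMPLE`) are illustrative.

```python
from html.parser import HTMLParser
from itertools import groupby

# Hypothetical heuristics for illustration; not taken from the paper.
FORM_TAGS = {"input", "select", "textarea"}
SKIP_INPUT_TYPES = {"hidden", "submit", "reset", "button", "image"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
CONTAINER_TAGS = {"tr", "div", "p", "li", "fieldset"}  # assumed grouping nodes


class FormElementCollector(HTMLParser):
    """Records each visible form control with its nearest container node
    and the text that precedes it in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []         # currently open (tag, node_id) pairs
        self.next_id = 0
        self.pending_text = []  # text seen since the previous control
        self.elements = []

    def handle_starttag(self, tag, attrs):
        node_id = self.next_id
        self.next_id += 1
        if tag not in VOID_TAGS:
            self.stack.append((tag, node_id))
        if tag not in FORM_TAGS:
            return
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "text").lower() in SKIP_INPUT_TYPES:
            return  # buttons and hidden fields are not query attributes
        container = next((nid for t, nid in reversed(self.stack)
                          if t in CONTAINER_TAGS), None)
        self.elements.append({
            "name": a.get("name") or "",
            "container": container,
            "text": " ".join(self.pending_text),
        })
        self.pending_text = []

    def handle_endtag(self, tag):
        # Close back to the most recent matching open tag; ignore strays.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Option text inside <select> lists values, not labels.
        if any(t in ("select", "textarea") for t, _ in self.stack):
            return
        if data.strip():
            self.pending_text.append(data.strip())


def extract_attributes(html):
    collector = FormElementCollector()
    collector.feed(html)
    collector.close()
    attributes = []
    # Phase 1 (assumed): consecutive controls sharing a container node.
    for _, group in groupby(collector.elements, key=lambda e: e["container"]):
        current = None
        # Phase 2 (assumed): preceding text ending in ":" starts a new
        # attribute; other controls join the current one.
        for el in group:
            if current is None or el["text"].endswith(":"):
                current = {"label": el["text"].rstrip(":").strip(), "elements": []}
                attributes.append(current)
            current["elements"].append(el["name"])
    return attributes


SAMPLE = """
<form><table>
  <tr><td>Title:</td><td><input type="text" name="title"></td></tr>
  <tr><td>Price:</td><td><input type="text" name="price_min"> to
      <input type="text" name="price_max"></td></tr>
  <tr><td>Format:</td><td><select name="format">
      <option>Hardcover</option><option>Paperback</option></select></td></tr>
</table><input type="submit" value="Search"></form>
"""

if __name__ == "__main__":
    for attr in extract_attributes(SAMPLE):
        print(attr["label"], attr["elements"])
```

On the sample form, the two price boxes joined by "to" come out as one multi-element attribute ("Price"), while the title box and the format menu each form a single-element attribute. The container and colon rules are deliberately crude; a faithful version would need a real DOM and the clustering criteria described in the paper.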