Extract Knowledge from Semi-Structured Websites for Search Task Simplification

Yingqin Gu, Jun Yan, Hongyan Liu, Jun He, Lei Ji, Ning Liu, Zheng Chen
DOI: https://doi.org/10.1145/2063576.2063847
2011-01-01
Abstract: Simplifying the key tasks of search engine users by directly returning structured knowledge in response to their queries is attracting much attention from both industry and academia. A bottleneck of this challenging problem is how to automatically extract structured knowledge from noisy and complex websites at Web scale. In this paper, we propose an unsupervised automatic wrapper induction algorithm named Scalable Knowledge Extractor from webSites (SKES). SKES induces the wrapper in a divide-and-conquer manner, i.e., it divides the general wrapper into several sub-wrappers, each of which is learned from the data independently. Moreover, by employing techniques such as a tag path representation of Web pages, SKES is shown by the experimental results to be efficient and noise-tolerant. Furthermore, based on the automatically extracted knowledge, we built a prototype that serves structured knowledge to end users to simplify their key search tasks. Feedback on the prototype was very positive.
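To make the tag path idea mentioned in the abstract concrete, the following is a minimal sketch of one common way to represent a page by tag paths: each text node is keyed by the chain of tags from the root down to it, so repeated fields on a page tend to share a path while stray text lands on other paths. This is an illustration only, not the SKES algorithm; the names `TagPathCollector`, `tag_paths`, and `VOID_TAGS` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never receive a closing tag, so they are kept off the stack.
# (Assumption: this short list is sufficient for the illustration.)
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Groups the text nodes of a page by their root-to-node tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open tags, outermost first
        self.paths = defaultdict(list)   # tag path -> text snippets found there

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers
        # so that unbalanced markup does not corrupt the stack.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.paths["/".join(self.stack)].append(text)


def tag_paths(html):
    """Return a dict mapping each tag path to the list of texts under it."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.paths)
```

Under this representation, list items generated from the same template share a single path (for example `html/body/ul/li`), which is the kind of regularity a wrapper induction method can exploit. How SKES itself builds and uses tag paths is not specified in the abstract.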