Bootstrapping Information Extraction Via Conceptualization
Jiaqing Liang, Suo Feng, Chenhao Xie, Yanghua Xiao, Jindong Chen, Seung-Won Hwang
DOI: https://doi.org/10.1109/icde51399.2021.00012
2021-01-01
Abstract: Bootstrapping uses existing knowledge to find patterns and extract new knowledge from free text, from which more patterns can then be found. Because it is minimally supervised, domain-independent, and language-independent, it has been widely adopted in real-world applications. However, as iterations proceed, semantic drift may occur: the extraction shifts from the target class to other classes and produces errors, which propagate through succeeding iterations and significantly hurt performance. Existing solutions simply discard bad patterns, sacrificing recall to ensure high precision. We argue instead that most of these patterns and instances can be kept as long as they are applied selectively, guided by prior knowledge. In this paper, we propose a pattern-based extraction framework with three distinguishing features: (1) it uses conceptual taxonomies to guide extraction and reduce semantic drift; (2) it uses knowledge from existing triples to improve precision; (3) it integrates all patterns into a generalized pattern set with quantified confidence measures. The proposed solution is applied to enrich two real-world knowledge bases and achieves higher precision and recall than existing solutions.
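
The idea of keeping a drifting pattern but applying it selectively can be illustrated with a toy sketch. This is not the authors' framework: the corpus, the `TAXONOMY` dictionary, the string-based patterns, and every function name below are hypothetical, chosen only to show how a concept check on subject and object can block a drifted extraction without discarding the pattern that produced it.

```python
import re
from collections import defaultdict

# Hypothetical toy taxonomy (entity -> concepts); an assumption for illustration.
TAXONOMY = {
    "Paris": {"city"}, "Berlin": {"city"}, "Tokyo": {"city"},
    "France": {"country"}, "Germany": {"country"}, "Japan": {"country"},
    "Alice": {"person"},
}

# Hypothetical toy corpus.
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Tokyo is the capital of Japan.",
    "Paris is located in France.",
    "Alice is located in Berlin.",
]


def find_patterns(corpus, pairs):
    """Collect the text between subject and object for every known pair."""
    patterns = defaultdict(int)
    for sentence in corpus:
        for subj, obj in pairs:
            match = re.search(re.escape(subj) + r"(.+?)" + re.escape(obj), sentence)
            if match:
                patterns[match.group(1)] += 1
    return dict(patterns)


def apply_pattern(corpus, pattern):
    """Return every (subject, object) pair that the pattern matches."""
    regex = re.compile(r"(\w+)" + re.escape(pattern) + r"(\w+)")
    return {m.groups() for sentence in corpus for m in regex.finditer(sentence)}


def type_ok(pair, subj_concept, obj_concept):
    """Check both arguments against the expected concepts in the taxonomy."""
    subj, obj = pair
    return (subj_concept in TAXONOMY.get(subj, set())
            and obj_concept in TAXONOMY.get(obj, set()))


def bootstrap(corpus, seeds, subj_concept, obj_concept,
              iterations=2, use_taxonomy=True):
    """Alternate between pattern discovery and extraction for a few rounds."""
    known = set(seeds)
    for _ in range(iterations):
        for pattern in find_patterns(corpus, known):
            for pair in apply_pattern(corpus, pattern):
                # Keep the pattern, but accept only type-consistent pairs.
                if not use_taxonomy or type_ok(pair, subj_concept, obj_concept):
                    known.add(pair)
    return known


seeds = {("Paris", "France")}
guided = bootstrap(CORPUS, seeds, "city", "country")
unguided = bootstrap(CORPUS, seeds, "city", "country", use_taxonomy=False)
print(sorted(guided))
print(sorted(unguided - guided))
```

In this toy setting the pattern " is located in " is learned from the seed and kept in both runs. Only the unguided run accepts the pair (Alice, Berlin), which is the kind of drift the taxonomy check is meant to stop. Confidence scoring of patterns and the use of existing triples, both mentioned in the abstract, are not modeled here.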