Which is better? Taxonomy induction with learning the optimal structure via contrastive learning

Yuan Meng, Songlin Zhai, Zhihua Chai, Yuxin Zhang, Tianxing Wu, Guilin Qi, Wei Song
DOI: https://doi.org/10.1016/j.knosys.2024.112405
IF: 8.139
2024-09-05
Knowledge-Based Systems
Abstract: A taxonomy is a hierarchically structured knowledge graph that serves as infrastructure for downstream applications such as recommender systems, web search, and question answering. Automated induction from text corpora has yielded notable taxonomies such as CN-Probase, CN-DBpedia, and Zhishi.schema. Despite these efforts, existing taxonomies still face two critical issues that result in sub-optimal hierarchical structures. First, commonly observed taxonomies exhibit a coarse-grained, "flat" structure that stems from a lack of diversity in both nodes and edges, a limitation that originates primarily from biased and homogeneous data distributions. Second, the semantic granularity among "siblings" within these taxonomies is inconsistent, which makes it difficult to identify hierarchical relations accurately and comprehensively. To address these issues, this study introduces a novel taxonomy induction framework composed of three components. Initially, we establish a seed schema by using statistical information from external data sources as distant supervision to append nodes and edges carrying "generic semantics", thereby rectifying biased data distributions. Subsequently, a clustering algorithm groups the nodes by similarity, and the hierarchical relations among these nodes are refined. Building on this seed schema, we propose a fine-grained contrastive learning method in the expansion module that makes stronger use of taxonomic structure and thereby improves the precision of query-anchor matching. Finally, we scrutinize the hierarchical relations between each query and its siblings to ensure the integrity of the constructed taxonomy. Extensive experiments on real-world datasets validate the efficacy of the proposed framework for constructing well-structured taxonomies.
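The abstract does not specify the contrastive objective used in the expansion module. As a rough illustration of the general idea behind contrastive query-anchor matching, the sketch below uses a standard InfoNCE-style loss, which is an assumption and not necessarily the authors' formulation. The function names, the toy three-dimensional embeddings, and the temperature value are all hypothetical. The loss is low when a query concept is most similar to its true anchor (for example, its parent node) and high when a competing node scores higher.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive_anchor, negative_anchors, temperature=0.1):
    """InfoNCE-style loss for one query (an assumed, generic objective).

    Returns -log( exp(s_pos) / sum_i exp(s_i) ), where each s is a
    temperature-scaled cosine similarity between the query and a candidate.
    """
    pos = cosine(query, positive_anchor) / temperature
    negs = [cosine(query, n) / temperature for n in negative_anchors]
    logits = [pos] + negs
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - pos


# Toy embeddings (hypothetical): the query lies close to its true parent.
query = [0.9, 0.1, 0.0]
true_parent = [1.0, 0.0, 0.0]
other_nodes = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Correct pairing: the true parent is treated as the positive anchor.
loss_good = info_nce_loss(query, true_parent, other_nodes)
# Incorrect pairing: an unrelated node is treated as the positive anchor.
loss_bad = info_nce_loss(query, other_nodes[1], [true_parent, other_nodes[0]])
```

In this toy setup `loss_good` is much smaller than `loss_bad`, which is the signal a contrastive objective of this kind would use to pull a query toward its correct position in the taxonomy and push it away from competing nodes such as siblings.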
computer science, artificial intelligence