Abstract: A taxonomy is a hierarchically structured knowledge graph that underpins various downstream applications, including recommender systems, web search, and question answering. Research on automated taxonomy induction from text corpora has yielded notable taxonomies such as CN-Probase, CN-DBpedia, and Zhishi.schema. Despite these efforts, existing taxonomies still face two critical issues that result in sub-optimal hierarchical structures. On the one hand, commonly observed taxonomies exhibit a coarse-grained, "flat" structure, stemming from a lack of diversity in both nodes and edges; this limitation primarily originates from a biased and homogeneous data distribution. On the other hand, the semantic granularity among "siblings" within these taxonomies is inconsistent, which makes it difficult to identify hierarchical relations accurately and comprehensively. To address these issues, this study introduces a novel taxonomy induction framework composed of three components. First, we establish a seed schema by leveraging statistical information from external data sources as distant supervision to append nodes and edges containing "generic semantics", thereby rectifying biased data distributions. Next, a clustering algorithm groups the nodes by similarity, after which the hierarchical relations among these nodes are refined. Building on this seed schema, we propose a fine-grained contrastive learning method in the expansion module that makes fuller use of taxonomic structure and consequently improves the precision of query-anchor matching. Finally, we examine the hierarchical relations between each query and its siblings to ensure the integrity of the constructed taxonomy. Extensive experiments on real-world datasets validate the efficacy of the proposed framework for constructing well-structured taxonomies.

Which is better? Taxonomy induction with learning the optimal structure via contrastive learning

Taxonomy Induction from Chinese Encyclopedias by Combinatorial Optimization

Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples

Enquire One's Parent and Child Before Decision: Fully Exploit Hierarchical Structure for Self-Supervised Taxonomy Expansion

Taxonomy Induction and Taxonomy-based Recommendations for Online Courses

Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning

Taxes Are All You Need: Integration of Taxonomical Hierarchy Relationships into the Contrastive Loss

An Integrated System for Building Enterprise Taxonomies

HCL4QC: Incorporating Hierarchical Category Structures into Contrastive Learning for E-commerce Query Classification

Improving Knowledge Graph Completion with Structure-Aware Supervised Contrastive Learning

A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion

Learning What You Need from What You Did: Product Taxonomy Expansion with User Behaviors Supervision

TagRec: Automated Tagging of Questions with Hierarchical Learning Taxonomy

CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts

GANTEE: Generative Adversarial Network for Taxonomy Enterance Evaluation

GANTEE: Generative Adversatial Network for Taxonomy Entering Evaluation

CN-Probase: A Data-Driven Approach for Large-Scale Chinese Taxonomy Construction

Enhancing Recommendation with Automated Tag Taxonomy Construction in Hyperbolic Space

Hierarchical Topology Isomorphism Expertise Embedded Graph Contrastive Learning

Automatic Taxonomy Construction from Keywords

TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision