Efficient large-scale biomedical ontology matching with anchor-based biomedical ontology partitioning and compact geometric semantic genetic programming

Xingsi Xue, Donglei Sun, Achyut Shankar, Wattana Viriyasitavat, Patrick Siarry
DOI: https://doi.org/10.1016/j.jii.2024.100637
IF: 11.718
2024-05-27
Journal of Industrial Information Integration
Abstract: Biomedical ontologies offer a structured framework for modeling biomedical knowledge in a machine-readable format. However, the heterogeneity inherent in biomedical ontologies hinders communication between them. Biomedical Ontology Matching (BOM) can address this issue by identifying equivalent concepts across biomedical ontologies. Recently, matching techniques based on Evolutionary Algorithms (EAs) have proven effective at finding high-quality matching results. However, due to the vast number of entities and the intricate relationships between them, it is difficult for traditional EAs to solve the BOM problem efficiently. To tackle this challenge, this paper proposes an efficient BOM method to automatically match large-scale biomedical ontologies. First, a novel anchor-based biomedical ontology partitioning method is developed to transform the large-scale BOM problem into several small-scale matching tasks, reducing the search space of the matching phase. Second, a new Compact Geometric Semantic Genetic Programming (CGSGP) is proposed to efficiently construct high-level similarity features for BOM, which can significantly reduce the computational complexity. Lastly, a new fitness function composed of an approximated evaluation metric and the Dominance Improvement Ratio (DIR) is introduced, which can overcome the solution's bias improvement and enable the simultaneous matching of multiple pairs of sub-ontologies without requiring the standard alignment. The experiments verify our approach's performance on the Ontology Alignment Evaluation Initiative (OAEI)'s Anatomy, Large Biomed, and Disease and Phenotype datasets. The experimental results show that our method can efficiently determine high-quality BOM results across different test cases and significantly outperforms state-of-the-art BOM techniques.
Computer Science, Interdisciplinary Applications; Engineering, Industrial