Abstract: Feature selection in Knowledge Graphs (KGs) is increasingly utilized in diverse domains, including biomedical research, Natural Language Processing (NLP), and personalized recommendation systems. This paper delves into the methodologies for feature selection within KGs, emphasizing their roles in enhancing machine learning (ML) model efficacy, hypothesis generation, and interpretability. Through this comprehensive review, we aim to catalyze further innovation in feature selection for KGs, paving the way for more insightful, efficient, and interpretable analytical models across various domains. Our exploration reveals the critical importance of scalability, accuracy, and interpretability in feature selection techniques, advocating for the integration of domain knowledge to refine the selection process. We highlight the burgeoning potential of multi-objective optimization and interdisciplinary collaboration in advancing KG feature selection, underscoring the transformative impact of such methodologies on precision medicine, among other fields. The paper concludes by charting future directions, including the development of scalable, dynamic feature selection algorithms and the integration of explainable AI principles to foster transparency and trust in KG-driven models.

What problem does this paper attempt to address?

This paper addresses the problem of feature selection in Knowledge Graphs (KGs). Specifically, it explores how graph data structures and knowledge graphs can support effective feature selection in fields such as biomedical research, natural language processing, and personalized recommendation systems. The paper emphasizes the role of these methods in improving the effectiveness of machine learning models, hypothesis generation, and interpretability. Through a comprehensive review, the authors aim to promote further innovation in feature selection for knowledge graphs and to enable more in-depth, efficient, and interpretable analytical models across domains.

The paper points out that in the era of big data, feature selection becomes particularly important as the size and complexity of data sets increase. The goal of feature selection is to choose the most relevant subset of a large number of input variables in order to counter the "curse of dimensionality", reduce overfitting, and improve computational efficiency. A streamlined feature set also improves model interpretability in key areas such as healthcare and finance, and enhances the model's ability to generalize to new data. The paper therefore pays special attention to how nodes or entities in a knowledge graph can be selected for hypothesis generation and further research, for example, inferring a new subset of genes related to a specific disease from a graph containing genes and diseases.

The paper also discusses the challenges of combining feature selection with knowledge graphs, including scalability, the integrity of knowledge graphs, and the need to adapt to different fields. To address these challenges, the authors propose a variety of innovative methods, such as embedding-based feature selection and the application of graph neural networks. These methods can effectively manage and analyze the high-dimensional space inherent in knowledge graphs, thereby promoting a more detailed and comprehensive analysis of the data.
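To make the gene and disease example concrete, the following is a minimal sketch of node selection on a toy knowledge graph. It is not the paper's method: it assumes a simple guilt-by-association heuristic, in which candidate genes are ranked by how many genes already linked to the disease they interact with. The triples, the relation names (`associated_with`, `interacts_with`), and the function `select_candidate_genes` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples.
# All entity and relation names are hypothetical.
TRIPLES = [
    ("GENE_A", "associated_with", "DISEASE_X"),
    ("GENE_B", "associated_with", "DISEASE_X"),
    ("GENE_C", "interacts_with", "GENE_A"),
    ("GENE_C", "interacts_with", "GENE_B"),
    ("GENE_D", "interacts_with", "GENE_A"),
    ("GENE_E", "interacts_with", "GENE_F"),
    ("GENE_F", "associated_with", "DISEASE_Y"),
]


def select_candidate_genes(triples, disease, top_k=2):
    """Rank genes not yet linked to `disease` by the number of
    known disease genes they interact with (guilt by association)."""
    # Genes already associated with the disease in the graph.
    known = {h for h, r, t in triples if r == "associated_with" and t == disease}

    # Undirected gene-gene interaction neighborhoods.
    neighbors = defaultdict(set)
    for h, r, t in triples:
        if r == "interacts_with":
            neighbors[h].add(t)
            neighbors[t].add(h)

    # Score each candidate by its overlap with the known disease genes.
    scores = {
        gene: len(nbrs & known)
        for gene, nbrs in neighbors.items()
        if gene not in known
    }

    # Keep candidates with at least one link; sort by score, then by name.
    ranked = sorted(
        ((g, s) for g, s in scores.items() if s > 0),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:top_k]


print(select_candidate_genes(TRIPLES, "DISEASE_X"))
# -> [('GENE_C', 2), ('GENE_D', 1)]
```

A neighbor-count score is only one possible selection criterion. The embedding-based and graph neural network approaches mentioned above would replace this score with a learned one, while the overall shape (score the candidate nodes, then keep a small subset) stays the same.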

A review of feature selection strategies utilizing graph data structures and knowledge graphs
