Multi-label classification with XGBoost for metabolic pathway prediction

Hyunwhan Joe, Hong-Gee Kim
DOI: https://doi.org/10.1186/s12859-024-05666-0
IF: 3.307
2024-02-03
BMC Bioinformatics
Abstract: Metabolic pathway prediction is one possible approach to address the problem in systems biology of reconstructing an organism's metabolic network from its genome sequence. Recent studies on machine learning-based pathway prediction have concluded that these approaches perform similarly to the most widely used method, PathoLogic, which is rule-based. One issue is that previous studies evaluated PathoLogic without taxonomic pruning, an omission that decreases its performance.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper primarily addresses the following issues:

1. **Evaluating the PathoLogic Algorithm**:
   - Updates the evaluation results from previous studies, demonstrating that PathoLogic with taxonomic pruning performs better than previous machine learning methods, and points out that those methods need further improvement to compete with it.
   - Emphasizes that previous studies evaluated PathoLogic without taxonomic pruning, which may have led to an underestimation of its performance.
2. **Proposing a New Multi-Label Classification Method**:
   - Proposes mlXGPR, an XGBoost-based metabolic pathway prediction method built on a multi-label classification framework.
   - Uses classifier chains to improve model performance, choosing the order of the chain so that it exploits the correlations between labels.
3. **Validating Model Performance**:
   - Evaluates mlXGPR on single-organism and multi-organism benchmark datasets, showing that on the single-organism benchmarks mlXGPR outperforms other methods, including PathoLogic with taxonomic pruning, in terms of Hamming loss, accuracy, and F1 score.

Through these contributions, the paper demonstrates significant improvements in metabolic pathway prediction using machine learning methods, in some cases even surpassing PathoLogic with taxonomic pruning.
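
To make the evaluation metrics named above concrete, here is a minimal pure-Python sketch of Hamming loss, accuracy, and F1 for multi-label predictions. It is an illustration, not the paper's evaluation code. The function names and the toy label matrices are hypothetical, and the specific variants used here (subset accuracy and micro-averaged F1) are assumptions, since the summary does not say which definitions the authors apply. Each row stands for one organism and each column for one pathway (1 = pathway present).

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (organism, pathway) slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of organisms whose full pathway set is predicted exactly.

    Assumption: other "accuracy" definitions exist for multi-label data.
    """
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from counts pooled over all organisms and pathways."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1   # pathway present and predicted
            elif p:
                fp += 1   # pathway predicted but absent
            elif t:
                fn += 1   # pathway present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 organisms x 4 pathways.
y_true = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]

print("Hamming loss:   ", hamming_loss(y_true, y_pred))
print("Subset accuracy:", subset_accuracy(y_true, y_pred))
print("Micro F1:       ", micro_f1(y_true, y_pred))
```

Lower Hamming loss is better, while higher accuracy and F1 are better. In the toy data, two of the twelve label slots are wrong (one missed pathway and one spurious pathway), so only one organism is matched exactly even though most individual labels are correct.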