Analyzing Feature Importance for Metabolomics Using Genetic Programming.

Ting Hu, Karoliina Oksanen, Weidong Zhang, Edward Randell, Andrew Furey, Guangju Zhai
DOI: https://doi.org/10.1007/978-3-319-77553-1_5
2018-01-01
Abstract: The emerging and fast-developing field of metabolomics examines the abundance of small-molecule metabolites in body fluids to study the cellular processes related to how the human body responds to genetic and environmental perturbations. Given the complexity of metabolism, metabolites and the cellular processes they represent can correlate and synergistically contribute to a phenotypic status. Genetic programming (GP) provides advanced analytical instruments for investigating the multifactorial causes of metabolic diseases. In this article, we analyzed a population-based metabolomics dataset on osteoarthritis (OA) and developed a Linear GP (LGP) algorithm to search for classification models that can best predict the disease outcome, as well as to identify the most important metabolic markers associated with the disease. The LGP algorithm was able to evolve prediction models with high accuracy, especially with a more focused search using a reduced feature set that includes only potentially relevant metabolites. We also identified a set of key metabolic markers that may improve our understanding of the biochemistry and pathogenesis of the disease.