Random forest machine-learning algorithm classifies white- and brown-rot fungi according to the number of Carbohydrate-Active enZyme genes

Natsuki Hasegawa, Masashi Sugiyama, Kiyohiko Igarashi
DOI: https://doi.org/10.1101/2024.03.15.585254
2024-03-17
Abstract: Wood-rotting fungi play an important role in the global carbon cycle because they are the only known organisms that digest wood, the largest carbon stock in nature. In the present study, we used linear discriminant analysis and random forest (RF) machine learning algorithms to predict white- or brown-rot decay modes from the numbers of genes encoding Carbohydrate-Active enZymes (CAZymes) with over 98% accuracy. Unlike other algorithms, RF identified specific genes involved in cellulose and lignin degradation, including auxiliary activities (AA) family 9 lytic polysaccharide monooxygenases, glycoside hydrolase family 7 cellobiohydrolases, and AA family 2 peroxidases, as critical factors. This study sheds light on the complex interplay between genetic information and decay modes and underscores the potential of RF for comparative genomics studies of wood-rotting fungi.
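The general idea in the abstract, classifying decay mode from per-family CAZyme gene counts and then reading feature importances off the fitted forest, can be sketched as follows. This is a minimal illustration assuming scikit-learn and NumPy, not the authors' pipeline. The gene counts are synthetic and the Poisson means are invented for demonstration. The three extra families (GH5, GH16, CE4) are hypothetical filler features, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Feature columns: the three families named in the abstract plus
# three filler families chosen only for illustration (assumption).
families = ["AA9", "GH7", "AA2", "GH5", "GH16", "CE4"]

def synthetic_counts(n, white_rot):
    """Draw made-up gene counts per genome; NOT real data."""
    if white_rot:
        means = np.array([15.0, 4.0, 8.0, 20.0, 25.0, 10.0])
    else:
        means = np.array([3.0, 0.2, 0.2, 20.0, 25.0, 10.0])
    return rng.poisson(means, size=(n, len(families)))

# 40 synthetic "white-rot" genomes (label 1) and 40 "brown-rot" (label 0).
X = np.vstack([synthetic_counts(40, True), synthetic_counts(40, False)])
y = np.array([1] * 40 + [0] * 40)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated accuracy on the synthetic data.
scores = cross_val_score(clf, X, y, cv=5)

# Fit on everything, then rank families by impurity-based importance.
clf.fit(X, y)
ranked = sorted(
    zip(families, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"mean CV accuracy: {scores.mean():.3f}")
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Because only the first three columns differ between the two synthetic classes, the forest should rank them above the filler families. On real genomes, the counts would come from a CAZyme annotation of each assembly rather than from a random generator.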
Bioinformatics