Random Forest for Genomic Prediction

Osval Antonio Montesinos López, Abelardo Montesinos López, Jose Crossa
DOI: https://doi.org/10.1007/978-3-030-89010-0_15
2022-01-01
Abstract: We give a detailed description of random forest and illustrate its use with data from plant breeding and genomic selection. We explain the motivations for using random forest in genomic-enabled prediction, then describe the process of building decision trees, which are the key component of random forest models. We give (1) the random forest algorithm, (2) the main hyperparameters that need to be tuned, and (3) the different splitting rules that are key for implementing random forest models for continuous, binary, categorical, and count response variables. In addition, many examples are provided for training random forest models with different types of response variables using plant breeding data. The random forest algorithm for multivariate outcomes is also provided, and its most popular splitting rules are explained, with examples illustrating its implementation even with mixed outcomes (continuous, binary, and categorical). Final comments about the pros and cons of random forest are provided.