Evaluation of Random Forest Based Methods for Controlling Confounding Factors

梁君雅,赵杨,段巍巍,何杰宇,魏永越,陈峰
DOI: https://doi.org/10.3969/j.issn.1002-3674.2022.06.010
2022-01-01
Abstract: Objective: Confounding factors often impair the ability of random forests to screen out risk variables related to the dependent variable in high-dimensional omics research, so controlling for them is important. Methods: Through simulation experiments and a case analysis, we compared four methods for controlling confounding factors when screening for risk variables associated with the research outcome: Random Forest (RF); Ranger; Ranger (weighted), in which each confounding factor is given a weight of 100%; and Residual & Ranger, in which the dependent variable and the independent variables are first corrected for the confounder effect and the corrected values are then used as the new dependent and independent variables in a Ranger analysis. The study uses the proportion of risk factors in the variable importance ranking as the evaluation index. Results: Comparing the distribution of the risk factor's rank across the four methods in extensive simulations, we find that the latter two methods increase the proportion of the risk factor ranking first in the variable importance measure. The GWAS data analysis shows that the risk factor moves up in the ranking after correcting for the confounders with the two proposed methods. Conclusion: Adjusting for confounding factors is necessary when screening for risk factors related to the research outcome. The Residual & Ranger method corrects for confounders better than Ranger (weighted), while RF and Ranger can hardly correct for confounding effects at all.
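The residual-correction step behind the Residual & Ranger method can be illustrated with a minimal sketch. The abstract does not say which regression model is used to remove the confounder effect, so the example below assumes a single confounder and ordinary least squares. The function names (`residualize`, `correlation`), the toy data, and the effect sizes are all hypothetical. The forest step itself is not shown; the residuals would be passed to the random forest in place of the raw variables.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def residualize(values, confounder):
    """Residuals of a simple least-squares fit of `values` on one confounder.

    Assumption: a linear confounder effect; the abstract does not specify the model.
    """
    mc, mv = mean(confounder), mean(values)
    sxx = sum((c - mc) ** 2 for c in confounder)
    sxy = sum((c - mc) * (v - mv) for c, v in zip(confounder, values))
    slope = sxy / sxx
    intercept = mv - slope * mc
    return [v - (intercept + slope * c) for c, v in zip(confounder, values)]


def correlation(a, b):
    """Pearson correlation, used here only to inspect the toy data."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Hypothetical toy data: one confounder, one true risk variable, and one
# variable that is related to the outcome only through the confounder.
rng = random.Random(0)
n = 2000
confounder = [rng.gauss(0, 1) for _ in range(n)]
x_spurious = [c + rng.gauss(0, 1) for c in confounder]
x_risk = [rng.gauss(0, 1) for _ in range(n)]
outcome = [2 * c + 0.5 * r + rng.gauss(0, 1) for c, r in zip(confounder, x_risk)]

# Correct the dependent variable and every independent variable for the confounder.
y_res = residualize(outcome, confounder)
x_spurious_res = residualize(x_spurious, confounder)
x_risk_res = residualize(x_risk, confounder)

print("spurious vs outcome, raw:      ", round(correlation(x_spurious, outcome), 3))
print("spurious vs outcome, residuals:", round(correlation(x_spurious_res, y_res), 3))
print("risk vs outcome, residuals:    ", round(correlation(x_risk_res, y_res), 3))
```

In this toy setting the spurious variable is strongly associated with the raw outcome, but the association largely disappears once both sides are residualized, while the true risk variable keeps its signal. A forest fitted on the residuals therefore has less reason to rank the spurious variable highly.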