Identification of genetic features associated with fine particulate matter (PM2.5) modulated DNA damage using improved random forest analysis.

Dongfang You, Na Qin, Mingzhi Zhang, Juncheng Dai, Mulong Du, Yongyue Wei, Ruyang Zhang, Zhibin Hu, David C Christiani, Yang Zhao, Feng Chen
DOI: https://doi.org/10.1016/j.gene.2020.144570
IF: 3.913
2020-01-01
Gene
Abstract: Recent studies have found multiple single nucleotide variants (SNVs) associated with DNA damage. However, previous association analyses may have overlooked potential interaction effects between SNVs. Therefore, we used an improved random forest (RF) analysis to identify SNVs related to personal DNA damage in an exon-focused genome-wide association study (GWAS). A total of 301 subjects from three independent centers (Zhuhai, Wuhan, and Tianjin) were retained for analysis. An improved RF procedure was used to systematically screen key SNVs associated with DNA damage. Furthermore, we used genetic risk scores (GRSs) and mediation analysis to investigate the integrative effect and potential mechanism of these genetic variants on DNA damage. In addition, gene set enrichment analysis was conducted with Data-driven Expression Prioritized Integration for Complex Traits (DEPICT) to identify the pathways enriched for key SNVs. Finally, a set of 24 SNVs with the lowest mean square error (MSE) was identified by the improved RF analysis. Both weighted and unweighted GRSs were associated with increased DNA damage levels (Pweight < 0.001 and Punweight < 0.001). Gene set enrichment analysis indicated that these loci were significantly enriched in several biological features associated with DNA damage. These findings suggest a role for SNVs in modifying DNA damage levels, and suggest that the improved RF analysis can effectively identify SNVs associated with DNA damage levels.
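The abstract does not state how the weighted and unweighted GRSs were constructed. As a minimal sketch of the general idea, assuming the common convention in which an unweighted score counts risk alleles across SNVs and a weighted score multiplies each count by a per-SNV effect size, the two scores could be computed as follows. The function names, genotype coding (0, 1, or 2 risk alleles), and weights are hypothetical illustrations, not values from the paper.

```python
from typing import Sequence


def unweighted_grs(genotypes: Sequence[int]) -> int:
    """Unweighted score: total number of risk alleles across SNVs.

    Assumes each genotype is coded as 0, 1, or 2 risk alleles.
    """
    return sum(genotypes)


def weighted_grs(genotypes: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted score: risk-allele counts scaled by per-SNV effect sizes.

    The weights are assumed to come from some association model;
    the source does not say which one was used.
    """
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    return sum(g * w for g, w in zip(genotypes, weights))


# Hypothetical subject with three SNVs and made-up weights.
example_genotypes = [0, 1, 2]
example_weights = [0.5, 0.25, 1.0]

example_unweighted = unweighted_grs(example_genotypes)
example_weighted = weighted_grs(example_genotypes, example_weights)
```

In a real analysis each score would be computed per subject over the selected SNVs and then tested for association with the DNA damage measurement.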