Leveraging Transcriptomics-Based Approaches to Enhance Genomic Prediction: Integrating SNPs and gene-networks for Cotton Fibre Quality Improvement

Nima Khalilisamani, Zitong Li, Filomena A. Pettolino, Philippe Moncuquet, Antonio Reverter, Colleen P. MacMillan
DOI: https://doi.org/10.1101/2024.02.14.580398
2024-02-19
Abstract: The cotton genome contains ∼80K protein-coding genes, making precision breeding for complex traits a challenge. This study tested biology-informed approaches to improve genomic prediction (GP) accuracy for cotton fibre traits to help accelerate precision breeding of valuable traits. The study’s foundational approach was the use of RNA-seq data from key time points during fibre development, namely fibre cells undergoing primary, transition, and secondary wall development. The test approaches included using a range of summary statistics from RNA-seq analysis such as gene Differential Expression (DE). The three test approaches included DE genes overall, target pairwise DE lists informed by gene functional annotation, and finally, gene-network clusters created based on Partial Correlation and Information Theory (PCIT) as the prior information in Bayesian GP models. The most promising improvements in GP accuracy were at the level of a ∼5% increase, obtained by using PCIT-based gene-network clusters (the network neighbours of key genes) as the prior knowledge, for the cotton fibre traits Elongation and Strength. These results indicate that there is scope to help improve precision breeding of target traits by incorporating biology-based inference into GP models, and point to specific approaches to achieve this.
Genomics
What problem does this paper attempt to address?
The paper aims to improve the accuracy of genomic prediction (GP) for cotton fibre quality traits, in order to accelerate selection for valuable traits in cotton breeding. Specifically, it integrates transcriptomic information, derived from RNA-seq data collected at key stages of fibre development (primary, transition, and secondary wall development), into Bayesian GP models as prior information. Three approaches are tested: differentially expressed (DE) genes overall, targeted pairwise DE lists informed by gene functional annotation, and gene-network clusters built with Partial Correlation and Information Theory (PCIT). The most promising result was an increase in GP accuracy of about 5% for fibre elongation and strength when PCIT-based gene-network clusters, the network neighbours of key genes, were used as prior knowledge. These results indicate that incorporating biology-based inference into GP models can help improve precision breeding for target traits.
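To make the idea of "network neighbours of key genes" more concrete, the following is a minimal Python sketch of a PCIT-style filter on a toy three-gene correlation matrix. It is not the authors' pipeline: the gene names, the correlation values, and the functions `partial_corr` and `filter_edges` are hypothetical, and the trio rule is written as it is commonly described for PCIT (an edge is dropped when it is weaker than both other edges in a trio after scaling by the average ratio of partial to direct correlation). Treat it as an illustration under those assumptions.

```python
import itertools
import numpy as np

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def filter_edges(R):
    """Boolean adjacency matrix after a PCIT-style trio filter (assumed rule).

    R is a symmetric gene-by-gene correlation matrix (hypothetical input).
    """
    n = R.shape[0]
    keep = ~np.eye(n, dtype=bool)  # start fully connected, no self-loops
    for x, y, z in itertools.combinations(range(n), 3):
        rxy, rxz, ryz = R[x, y], R[x, z], R[y, z]
        mags = [abs(rxy), abs(rxz), abs(ryz)]
        if min(mags) == 0 or max(mags) >= 1:
            continue  # ratios or partial correlations undefined; skip trio
        # Tolerance: mean ratio of partial to direct correlation in the trio.
        eps = (partial_corr(rxy, rxz, ryz) / rxy
               + partial_corr(rxz, rxy, ryz) / rxz
               + partial_corr(ryz, rxy, rxz) / ryz) / 3
        trio = (((x, y, rxy), (rxz, ryz)),
                ((x, z, rxz), (rxy, ryz)),
                ((y, z, ryz), (rxy, rxz)))
        for (i, j, r), others in trio:
            # Drop an edge that is weaker than both scaled companion edges.
            if all(abs(r) < abs(eps * o) for o in others):
                keep[i, j] = keep[j, i] = False
    return keep

# Toy example: A and C are correlated only through B (0.25 = 0.5 * 0.5).
genes = ["A", "B", "C"]
R = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 1.00]])
keep = filter_edges(R)

# Network neighbours of the (hypothetical) key gene "B".
neighbours = [genes[j] for j in np.flatnonzero(keep[genes.index("B")])]
print(neighbours)
```

In this toy case the indirect A-C edge is removed while A-B and B-C survive, so the neighbours of B are A and C. In a GP setting, SNPs in or near such neighbour genes could then be grouped to receive their own prior in a Bayesian model; how that grouping is done in the study is not specified in the abstract.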