A method for mining condition-specific co-expressed genes in Camellia sinensis based on k-means clustering

Xinghai Zheng, Peng Ken Lim, Marek Mutwil, Yuefei Wang
DOI: https://doi.org/10.1186/s12870-024-05086-5
IF: 5.26
2024-05-10
BMC Plant Biology
Abstract: As one of the world's most important beverage crops, tea plants (Camellia sinensis) are renowned for their unique flavors and numerous beneficial secondary metabolites, attracting researchers to investigate the formation of tea quality. With the increasing availability of transcriptome data on tea plants in public databases, conducting large-scale co-expression analyses has become feasible to meet the demand for functional characterization of tea plant genes. However, as the multidimensional noise increases, larger-scale co-expression analyses are not always effective. Analyzing a subset of samples generated by effectively downsampling and reorganizing the global sample set often leads to more accurate results in co-expression analysis. Meanwhile, global-based co-expression analyses are more likely to overlook condition-specific gene interactions, which may be more important and worthy of exploration and research.
plant sciences
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to improve the accuracy of gene co-expression network analysis in tea plants (Camellia sinensis) by enhancing the methods for co-expression analysis of large-scale transcriptome data. Specifically, the paper addresses the following issues:

1. **Sample Noise Reduction and Classification**:
   - Using the k-means clustering method to classify tea plant samples to reduce the impact of multidimensional noise and improve the accuracy of co-expression analysis.
2. **Condition-Specific Gene Interactions**:
   - Exploring gene interaction relationships formed under specific conditions, which may be overlooked in global analysis but are more important for biological processes under specific conditions.
3. **Introduction of Correlation Difference Value (CDV)**:
   - Proposing a new metric, the Correlation Difference Value (CDV), to evaluate the specificity of genes under specific conditions. By introducing CDV, genes highly expressed under specific conditions can be better identified.
4. **Identification of Key Genes**:
   - Using the above methods, the paper identifies a series of transcription factor-encoding genes highly expressed under continuous low-temperature treatment and determines gene pairs involved in the tea plant's antioxidant defense system.
5. **Functional Enrichment Analysis**:
   - Conducting functional enrichment analysis on co-expression modules to further validate the functional annotation of condition-specific genes.

### Conclusion

The paper improves the accuracy of co-expression analysis through sample noise reduction and reorganization. Condition-specific modules can more accurately capture gene interaction relationships under specific conditions. With the introduction of CDV, the specificity of genes under specific conditions can be evaluated, providing more information for selecting key genes.
This study emphasizes the importance of considering condition specificity in co-expression analysis and provides new insights into understanding the regulatory mechanisms of tea plants under continuous low-temperature conditions.
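
The workflow summarized above (cluster the samples, then compare co-expression within a cluster against co-expression over all samples) can be illustrated with a small sketch. This is not the authors' implementation. The summary does not give the CDV formula, so the sketch assumes CDV is the Pearson correlation within one sample cluster minus the Pearson correlation over the global sample set. The gene names, the toy expression values, and the deterministic farthest-first initialization of k-means are likewise illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def sq_dist(u, v):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(samples, k, n_iter=100):
    """Lloyd's k-means over samples; returns one cluster label per sample.

    Assumption: centers are seeded by farthest-first traversal so the
    toy example is deterministic. Real analyses usually use random or
    k-means++ seeding with several restarts.
    """
    centers = [list(samples[0])]
    while len(centers) < k:
        farthest = max(samples, key=lambda s: min(sq_dist(s, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(s, centers[c])) for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cdv(expr, labels, cluster, gene_a, gene_b):
    """Assumed CDV: within-cluster correlation minus global correlation."""
    idx = [i for i, lab in enumerate(labels) if lab == cluster]
    local_r = pearson([expr[gene_a][i] for i in idx], [expr[gene_b][i] for i in idx])
    global_r = pearson(expr[gene_a], expr[gene_b])
    return local_r - global_r


# Hypothetical expression matrix: gene -> values across 8 samples.
# Samples 0-3 mimic a control condition, samples 4-7 a cold treatment.
expr = {
    "gene_a": [1, 2, 3, 4, 1, 2, 3, 4],
    "gene_b": [4, 2, 3, 1, 2, 4, 6, 8],
    "cold_marker": [0, 0, 0, 0, 10, 10, 10, 10],
}

samples = list(zip(*expr.values()))  # one expression vector per sample
labels = kmeans(samples, k=2)
cold_cluster = labels[4]
print("cluster labels:", labels)
print("CDV in cold cluster:", round(cdv(expr, labels, cold_cluster, "gene_a", "gene_b"), 3))
```

In this toy data the two genes are perfectly correlated within the cold-treated samples but only weakly correlated across all samples, so the assumed CDV is strongly positive for that cluster and negative for the control cluster. A global analysis alone would rank this gene pair low, which is the kind of condition-specific interaction the summary describes.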