Application of machine learning for identification of heterotic groups in sunflower through combined approach of phenotyping, genotyping and protein profiling

Danish Ibrar, Shahbaz Khan, Mudassar Raza, Muhammad Nawaz, Zuhair Hasnain, Muhammad Kashif, Afroz Rais, Safia Gul, Rafiq Ahmad, Abdel-Rhman Z. Gaafar
DOI: https://doi.org/10.1038/s41598-024-58049-z
IF: 4.6
2024-03-29
Scientific Reports
Abstract: Application of machine learning in plant breeding is a recent concept that has to be optimized for precise utilization in the breeding programs of high-yielding crop plants. Identification and efficient utilization of heterotic grouping patterns, aided by machine learning approaches, is of utmost importance in hybrid cultivar breeding, as it can save the time and resources required to breed a new plant hybrid or variety. In the present study, 109 genotypes of sunflower were investigated at the morphological, biochemical (SDS-PAGE) and molecular (microsatellite (SSR) markers) levels for heterotic grouping. All three datasets were combined, scaled, and subjected to unsupervised machine learning algorithms, i.e., hierarchical clustering, K-means clustering and a hybrid clustering algorithm (hierarchical + K-means), to assess the efficiency and resolution power of these algorithms in practical plant breeding for heterotic group identification. Following the application of the unsupervised clustering approach, two major groups were identified in the studied sunflower germplasm, and further classification revealed six smaller classes in each major group through the hierarchical and hybrid clustering approaches. Due to the high resolution obtained in hierarchical clustering, the classification achieved through this algorithm was further used for selection of potential parents. One genotype from each smaller group was selected based on maximum seed yield potential and hybridized in a line × tester mating design, producing 36 F1 cross combinations. These F1s, along with their parents, were studied in open field conditions to validate the efficacy of the identified heterotic groups in the sunflower genetic material under study. Data for 11 agronomic and qualitative traits were recorded. These 36 F1 combinations were tested for their combining ability (general/specific), heterosis, genotypic and phenotypic correlation, and path analysis. Results suggested that F1 hybrids performed better than their respective parents for all the traits under investigation. Findings of the study validated the use of machine learning approaches in practical plant breeding; however, more accurate and robust clustering algorithms need to be developed to handle the data noisiness of open field experiments.
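The clustering workflow described in the abstract (combine, scale, then cluster with a hierarchical + K-means hybrid) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration: the toy data, the helper names, the centroid linkage, and the reading of "hybrid" as using the hierarchical result to seed K-means. The abstract does not state the linkage, distance measure, or software the authors used.

```python
import math

def zscore(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has sd 0; fall back to 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def centroid(points):
    return [sum(c) / len(c) for c in zip(*points)]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hierarchical(points, k):
    """Agglomerative clustering: merge the two clusters with the closest
    centroids until k clusters remain (centroid linkage is an assumption)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([points[p] for p in clusters[i]])
                cj = centroid([points[p] for p in clusters[j]])
                d = sqdist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

def kmeans(points, centers, iters=100):
    """Lloyd's algorithm started from the supplied centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels

def hybrid_cluster(rows, k):
    """Scale, cluster hierarchically, then refine with K-means seeded
    by the hierarchical cluster centroids."""
    scaled = zscore(rows)
    groups = hierarchical(scaled, k)
    seeds = [centroid([scaled[i] for i in g]) for g in groups]
    return kmeans(scaled, seeds)

# Hypothetical combined data: two trait measurements plus one 0/1 marker score.
rows = [
    [1.0, 10.0, 0], [1.2, 11.0, 0], [0.9, 9.5, 0],
    [5.0, 30.0, 1], [5.2, 31.0, 1], [4.8, 29.0, 1],
]
labels = hybrid_cluster(rows, k=2)
print(labels)
```

Seeding K-means from a hierarchical solution removes the dependence on random starting centers, which is one plausible motivation for a hybrid approach. On real data with 109 genotypes, a library implementation would be preferable to this quadratic toy loop.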
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses the problem of identifying heterotic groups in sunflower using machine learning methods, with the goal of optimizing plant breeding programs. Specifically, the study aims to:

1. **Identify heterotic group patterns**: Use machine learning methods (hierarchical clustering, K-means clustering, and a hybrid clustering algorithm) to jointly analyze morphological, biochemical (SDS-PAGE protein profiling), and molecular (SSR marker) data of sunflower and identify effective heterotic groups.
2. **Improve breeding efficiency**: Save time and resources and accelerate the breeding of new hybrid varieties by identifying and efficiently using heterotic group patterns.
3. **Validate the effectiveness of machine learning applications**: Apply machine learning methods in practical breeding and verify their effectiveness and resolution in identifying heterotic groups.
4. **Select parental materials**: Based on the clustering results, select parents with high yield potential for hybridization experiments and evaluate the performance of the resulting F1 hybrids.

In summary, the study aims to optimize the sunflower breeding process and improve crop yield and genetic gain through the integration of multi-dimensional data and the application of machine learning algorithms.
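The parent-selection and F1 evaluation step can be sketched as follows. This is a minimal illustration, not the authors' analysis: the parent names and yield figures are hypothetical, and the split into six lines and six testers is an assumption inferred from the 36 crosses reported in the abstract. The heterosis formulas are the standard mid-parent and better-parent definitions, with "better" taken to mean the higher value, as is appropriate for seed yield.

```python
from itertools import product

# Hypothetical parent names; one parent per sub-cluster is assumed.
lines = [f"L{i}" for i in range(1, 7)]
testers = [f"T{i}" for i in range(1, 7)]

# A line x tester design crosses every line with every tester.
crosses = list(product(lines, testers))
print(len(crosses))  # 6 lines x 6 testers = 36 crosses

def mid_parent_heterosis(f1, p1, p2):
    """Percent deviation of the F1 from the mean of its two parents."""
    mid_parent = (p1 + p2) / 2
    return (f1 - mid_parent) / mid_parent * 100

def better_parent_heterosis(f1, p1, p2):
    """Percent deviation of the F1 from its higher-valued parent."""
    better_parent = max(p1, p2)
    return (f1 - better_parent) / better_parent * 100

# Hypothetical seed yields (arbitrary units) for one cross and its parents.
f1_yield, line_yield, tester_yield = 60.0, 40.0, 50.0
mph = mid_parent_heterosis(f1_yield, line_yield, tester_yield)
bph = better_parent_heterosis(f1_yield, line_yield, tester_yield)
print(f"mid-parent: {mph:.1f}%, better-parent: {bph:.1f}%")
```

Positive values for both measures indicate that the hybrid outperforms its parents, which is the pattern the abstract reports for the traits studied.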