Machine Learning-Aided Ultra-Low-Density Single Nucleotide Polymorphism Panel Helps to Identify the Tharparkar Cattle Breed: Lessons for Digital Transformation in Livestock Genomics

Harshit Kumar, Manjit Panigrahi, Dongwon Seo, Sunghyun Cho, Bharat Bhushan, Triveni Dutt
DOI: https://doi.org/10.1089/omi.2024.0153
Abstract: Cattle breed identification is crucial for livestock research and sustainable food systems, and advances in genomics and artificial intelligence present new opportunities to address these challenges. This study investigates the identification of the Tharparkar cattle breed using genomics tools combined with machine learning (ML) techniques. By leveraging data from the Bovine SNP 50K chip, we developed a breed-specific panel of single nucleotide polymorphisms (SNPs) for Tharparkar cattle and integrated data from seven other Indian cattle populations to enhance panel robustness. Genome-wide association studies (GWAS) and principal component analysis were employed to identify 500 SNPs, which were then refined using ML models (AdaBoost, bagging tree, gradient boosting machines, and random forest) to determine the minimal number of SNPs needed for accurate breed identification. Panels of 23 and 48 SNPs achieved accuracy rates of 95.2-98.4%. Importantly, the identified SNPs were associated with key productive and adaptive traits, thus attesting to the value and potential of digital transformation in livestock genomics. The ML-aided ultra-low-density SNP panel approach reported here not only facilitates breed identification but also contributes to preserving genetic diversity and guiding future breeding programs.
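The general idea of shrinking a SNP set to a small breed-identification panel can be illustrated with a toy sketch. This is not the authors' pipeline: it runs on simulated genotypes, replaces GWAS with a simple ranking by between-group difference in mean genotype, and replaces the ensemble models named in the abstract with a nearest-centroid classifier so that it needs only the Python standard library. All names (`simulate`, `rank_snps`, `panel_accuracy`), the sample sizes, and the allele frequencies are assumptions made for illustration.

```python
import random

def simulate(n_per_group, n_snps, informative, rng):
    """Simulated genotypes coded 0/1/2; label 1 = target breed, 0 = others."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = []
            for j in range(n_snps):
                # Assumed frequencies: informative SNPs differ between groups.
                if j in informative:
                    p = 0.9 if label == 1 else 0.1
                else:
                    p = 0.5
                # Two independent allele draws give a 0/1/2 genotype.
                row.append((rng.random() < p) + (rng.random() < p))
            X.append(row)
            y.append(label)
    return X, y

def rank_snps(X, y):
    """Rank SNP indices by |mean genotype difference| (a stand-in for GWAS)."""
    n_snps = len(X[0])
    def mean(label, j):
        vals = [row[j] for row, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)
    scores = [abs(mean(1, j) - mean(0, j)) for j in range(n_snps)]
    return sorted(range(n_snps), key=lambda j: scores[j], reverse=True)

def centroid(X, y, label, panel):
    """Mean genotype of one group, restricted to the SNPs in the panel."""
    rows = [row for row, lab in zip(X, y) if lab == label]
    return [sum(r[j] for r in rows) / len(rows) for j in panel]

def panel_accuracy(X_tr, y_tr, X_te, y_te, panel):
    """Held-out accuracy of a nearest-centroid classifier on the panel."""
    c0 = centroid(X_tr, y_tr, 0, panel)
    c1 = centroid(X_tr, y_tr, 1, panel)
    hits = 0
    for row, lab in zip(X_te, y_te):
        d0 = sum((row[j] - c) ** 2 for j, c in zip(panel, c0))
        d1 = sum((row[j] - c) ** 2 for j, c in zip(panel, c1))
        hits += (1 if d1 < d0 else 0) == lab
    return hits / len(y_te)

rng = random.Random(0)
informative = {3, 17, 42, 88, 150}  # hypothetical breed-informative SNPs
X_train, y_train = simulate(60, 200, informative, rng)
X_test, y_test = simulate(30, 200, informative, rng)

ranked = rank_snps(X_train, y_train)
for k in (1, 5, 20):
    acc = panel_accuracy(X_train, y_train, X_test, y_test, ranked[:k])
    print(f"top {k:2d} SNPs: held-out accuracy = {acc:.3f}")
```

Comparing accuracy across panel sizes in this way mirrors the question the abstract poses, namely how few SNPs are needed before accuracy stops improving. A real analysis would use observed genotypes and properly cross-validated models.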