Detection of mastitis-causing pathogen by sequencing different regions of the 16S rRNA gene and machine learning
L. Clemente
Abstract: The correct identification of mastitis-causing pathogens is a key factor in the successful management of dairy farms. Techniques such as culture-based methods, qPCR, and 16S rRNA sequencing have been used to detect important microorganisms in raw bovine milk samples. However, due to costs, some challenges remain. Machine learning methods have been shown to be an attractive alternative, as they can integrate different sources of data for a variety of purposes. Recent studies focusing on the detection of clinical and subclinical mastitis highlight the potential of applying machine learning methods to the management of mastitis in dairy farms. In this work, we evaluate the performance of three machine learning methods to detect the most abundant mastitis-causing pathogen in individual raw bovine milk samples, integrating data from milk composition and 16S rRNA sequencing. We show the potential for the identification of Escherichia coli and Staphylococcus aureus. For abundance greater than 3% in individual samples, accuracies of 100% and 86% were achieved, respectively. These results show that machine learning methods can detect not only subclinical and clinical mastitis but also some mastitis-causing pathogens. Moreover, to maximize the information obtained from 16S rRNA sequencing, we evaluate in silico the genetic diversity of different regions of the 16S rRNA gene and validate the results by Illumina sequencing. We show that, for microorganisms associated with bovine mastitis, the V2-V3 region detects them at higher prevalence and higher relative abundance. We hope that this work can contribute to better management of dairy farms as well as the development of new tools for the control of bovine mastitis.
Abstract: Modern microbiome studies rely on the correct identification of microbial communities and their impact on different life phenomena.
Historically, conventional techniques, such as classical Gram staining, were used for the identification of cultured bacteria from clinical, food, and environmental origins. However, cultured bacterial communities represent only a fraction of the real diversity. This limitation was partially overcome with the advent of next-generation sequencing (NGS) technologies and the 16S rRNA marker gene, but challenges remain. The choice of a reference database for classification can be tricky. Larger databases can make it more difficult to assign taxonomy at the genus and species level, as the likelihood of ambiguous assignment increases, whereas smaller databases may not contain a sufficient representation of species. In an attempt to overcome these limitations, dedicated reference databases have been constructed for specific niches, such as HITdb (Human Intestinal 16S rRNA database) and DAIRYdb. We hypothesize that, when only a few species or genera need to be detected, targeting the region of the 16S rRNA marker gene that maximizes genetic diversity for these taxa would lead to a better taxonomic assignment. Here, we evaluate the genetic diversity of the V2-V3, V4, and V5-V6 regions for the most common genera associated with mastitis in bovine milk, and contrast the results of the lowest and highest genetic diversity regions by Illumina sequencing. We show that our approach increases the number of sequences assigned at the species level.
Abstract: The correct identification of mastitis-causing pathogens is an important step in the management of dairy farms. However, due to costs, techniques such as qPCR and 16S rRNA sequencing remain unfeasible for some herds. Machine learning methods have been shown to be an attractive alternative. Recent studies focusing on the detection of clinical and subclinical mastitis highlight the potential of applying machine learning methods to the management of mastitis in dairy farms.
Few studies have applied machine learning models to the detection of mastitis-causing pathogens. In this work, we evaluate the performance of three machine learning methods to detect the most abundant mastitis-causing pathogen in individual raw bovine milk samples, integrating data from milk composition and 16S rRNA sequencing. For abundance greater than 3% in individual samples, accuracies of 100% for Escherichia coli and 86% for Staphylococcus aureus were achieved. These results show that machine learning methods can detect not only subclinical and clinical mastitis but also some mastitis-causing pathogens.
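To make the evaluation criterion concrete, the following is a minimal sketch of how samples might be labeled by their most abundant pathogen above the 3% relative-abundance threshold, and how a per-pathogen accuracy could then be computed. It is an illustration under stated assumptions, not the procedure used in this work: the function names, the example data, and the reading of "accuracy" as the fraction of samples of a given pathogen that are correctly identified are all hypothetical.

```python
from collections import defaultdict

# Assumption: threshold taken from the abstract's "abundance greater than 3%".
ABUNDANCE_THRESHOLD = 0.03


def dominant_pathogen(abundances, threshold=ABUNDANCE_THRESHOLD):
    """Return the most abundant pathogen if it exceeds the threshold, else None.

    `abundances` maps a pathogen name to its relative abundance (0..1)
    in one milk sample, e.g. as derived from 16S rRNA read counts.
    """
    if not abundances:
        return None
    name, value = max(abundances.items(), key=lambda item: item[1])
    return name if value > threshold else None


def per_pathogen_accuracy(true_labels, predicted_labels):
    """Fraction of samples of each true pathogen that were predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth is None:  # no pathogen above the threshold: skip the sample
            continue
        total[truth] += 1
        if predicted == truth:
            correct[truth] += 1
    return {name: correct[name] / total[name] for name in total}


# Made-up illustrative data, not results from the study.
samples = [
    {"Escherichia coli": 0.12, "Staphylococcus aureus": 0.01},
    {"Escherichia coli": 0.02, "Staphylococcus aureus": 0.08},
    {"Escherichia coli": 0.01, "Staphylococcus aureus": 0.02},
    {"Escherichia coli": 0.00, "Staphylococcus aureus": 0.40},
]
truth = [dominant_pathogen(s) for s in samples]
# Hypothetical classifier output for the same four samples.
predicted = ["Escherichia coli", "Escherichia coli", None, "Staphylococcus aureus"]
accuracy = per_pathogen_accuracy(truth, predicted)
```

In this sketch the third sample has no pathogen above the threshold and is therefore excluded from the per-pathogen figures. In practice the predicted labels would come from a trained classifier that uses milk composition features rather than from a hand-written list.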
Computer Science, Agricultural and Food Sciences