Genome Sequence Classification for Animal Diagnostics with Graph Representations and Deep Neural Networks

Sai Narayanan, Akhilesh Ramachandran, Sathyanarayanan N. Aakur, Arunkumar Bagavathi
DOI: https://doi.org/10.48550/arXiv.2007.12791
2020-07-25
Abstract: Bovine Respiratory Disease Complex (BRDC) is a complex respiratory disease in cattle with multiple etiologies, including bacterial and viral. It is estimated that mortality, morbidity, therapy, and quarantine resulting from BRDC account for significant losses in the cattle industry. Early detection and management of BRDC are crucial in mitigating economic losses. Current animal disease diagnostics rely on traditional tests such as bacterial culture, serology, and Polymerase Chain Reaction (PCR) tests. Even though these tests are validated for several diseases, their main challenge is their limited ability to detect the presence of multiple pathogens simultaneously. Advances in data analytics and machine learning, and their application to metagenome sequencing, are setting trends across several applications. In this work, we demonstrate a machine learning approach to identify pathogen signatures present in bovine metagenome sequences using k-mer-based network embedding followed by a deep learning-based classification task. With experiments conducted on two different simulated datasets, we show that network-based machine learning approaches can detect pathogen signatures with up to 89.7% accuracy. We will make the data available publicly upon request to tackle this important problem in a difficult domain.
Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
This paper addresses the problem of quickly and accurately identifying the pathogens that cause bovine respiratory disease complex (BRDC) from the metagenomic sequences of cattle. Traditional detection methods such as bacterial culture, serological tests, and polymerase chain reaction (PCR) tests are validated for a variety of diseases, but their main limitation is a restricted ability to detect multiple pathogens simultaneously. These methods also usually take a long time to identify pathogens, which hinders the rapid diagnosis and management of BRDC needed to reduce economic losses. The study therefore proposes a machine-learning-based method that uses k-mer network embedding followed by deep-learning classification to identify pathogen signatures in bovine metagenomic sequences. In experiments on two different simulated datasets, the network-based machine-learning method detects pathogen signatures with an accuracy of up to 89.7%. The method not only improves detection efficiency but can also detect known and newly emerging pathogens in a single test, which marks important progress in animal disease diagnostics.
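To make the idea of a k-mer network concrete, here is a minimal sketch of one common way to turn sequencing reads into a weighted k-mer graph. It is an illustration only: the abstract does not specify how the authors build their graph, so the construction (an edge between consecutive overlapping k-mers, weighted by co-occurrence count), the function names `kmers` and `kmer_graph`, and the toy reads are all assumptions. The embedding and deep-learning classification steps are not shown.

```python
from collections import Counter


def kmers(seq, k):
    """Return the overlapping k-mers of seq, in order of appearance."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_graph(reads, k):
    """Build a weighted directed k-mer graph from a list of reads.

    Assumed construction (not taken from the paper): an edge (a, b)
    means k-mer b directly follows k-mer a in some read, and its
    weight counts how many times that happens across all reads.
    """
    edges = Counter()
    for read in reads:
        ks = kmers(read, k)
        # Pair each k-mer with its immediate successor in the read.
        edges.update(zip(ks, ks[1:]))
    return edges


# Toy reads, invented for illustration.
reads = ["ACGTAC", "CGTACG"]
graph = kmer_graph(reads, k=3)

for (src, dst), weight in sorted(graph.items()):
    print(f"{src} -> {dst} (weight {weight})")
```

In a pipeline of the kind the summary describes, a graph like this would typically be passed to a node-embedding method, and the resulting vectors would serve as input features for a classifier.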