Entourage: all-in-one sequence analysis software for genome assembly, virus detection, virus discovery, and intrasample variation profiling

Worakorn Phumiphanjarphak, Pakorn Aiewsakun
DOI: https://doi.org/10.1186/s12859-024-05846-y
IF: 3.307
2024-06-27
BMC Bioinformatics
Abstract: Pan-virus detection, and virome investigation in general, can be challenging, mainly due to the lack of universally conserved genetic elements in viruses. Metagenomic next-generation sequencing can offer a promising solution to this problem by providing an unbiased overview of the microbial community, enabling detection of any viruses without prior target selection. However, a major challenge in utilising metagenomic next-generation sequencing for virome investigation is that data analysis can be highly complex, involving numerous data processing steps.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses a data-analysis bottleneck in virome research. Because viruses lack universally conserved genetic elements, traditional detection methods (such as PCR and Sanger sequencing) have limited sensitivity for virus discovery: they can detect only the viruses they were designed for, or very close relatives. Metagenomic next-generation sequencing (mNGS) allows unbiased detection of any virus, but its data analysis is complex and involves many processing steps, which is a major obstacle to using mNGS for virome investigation.

To address this challenge, the authors developed Entourage, an all-in-one bioinformatics tool for virus sequence detection, virus discovery, and intra-sample genetic variation analysis. Entourage covers the workflow from read cleaning and sequence assembly through to virus sequence search, while aiming to keep results high-quality and interpretable. Its main functions are:

1. **Read Assembly Module**: Assembles raw paired-end short reads de novo with MEGAHIT, with read cleaning by fastp and optional subtraction of reads from specified organisms.
2. **Target Detection Module**: Searches a predefined virus collection with BLASTN, based on nucleotide sequence similarity, and assigns taxa according to the best BLASTN match.
3. **Discovery Module**: Uses an amino acid similarity-based search to detect virus sequences without prior knowledge of the target. It can identify sequences of known viruses as well as novel viruses that may be only distantly related to reference sequences.
4. **Intra-sample Variation Analysis Module**: Calculates sequence variation of known viruses within a sample and generates easy-to-analyze tables and publication-ready charts.

Through these functions, Entourage aims to simplify virome analysis and make virus detection and discovery more efficient, while keeping results reliable and accurate.
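
The best-match assignment described for the Target Detection Module can be illustrated with a minimal Python sketch. It is not Entourage's own code: it assumes the BLASTN hits are available in the standard 12-column tabular format (`-outfmt 6`), that "best match" means the hit with the highest bit score, and that a lookup table from reference sequence ID to taxon exists. The function name `best_hits`, the `taxon_of` mapping, and all IDs and taxon labels in the example are hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(blast_tsv, taxon_of):
    """Keep the highest-bit-score hit per query and attach a taxon label.

    blast_tsv: file-like object with tab-separated BLAST hits (outfmt 6).
    taxon_of:  dict mapping subject sequence ID -> taxon (hypothetical table).
    Returns {query_id: (subject_id, taxon, bitscore)}.
    """
    best = {}
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        query = row["qseqid"]
        score = float(row["bitscore"])
        # Replace the stored hit only when this one scores strictly higher.
        if query not in best or score > best[query][2]:
            subject = row["sseqid"]
            taxon = taxon_of.get(subject, "unclassified")
            best[query] = (subject, taxon, score)
    return best


# Made-up hits for two contigs; every value here is illustrative only.
example_hits = io.StringIO(
    "contig_1\tref_A\t98.5\t1200\t18\t0\t1\t1200\t301\t1500\t0.0\t2100\n"
    "contig_1\tref_B\t91.0\t1100\t99\t0\t1\t1100\t10\t1109\t0.0\t1500\n"
    "contig_2\tref_C\t88.2\t400\t47\t0\t5\t404\t50\t449\t1e-80\t310\n"
)
example_taxa = {"ref_A": "taxon_A", "ref_B": "taxon_B", "ref_C": "taxon_C"}

for contig, (subject, taxon, score) in best_hits(example_hits, example_taxa).items():
    print(contig, subject, taxon, score)
```

Ranking by bit score is only one reasonable reading of "best match"; ranking by e-value or percent identity would change only the comparison inside the loop.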