Breaking free from references: a consensus-based approach for community profiling with long amplicon nanopore data

Willem Stock, Coralie Rousseau, Glen Dierickx, Sofie D'hondt, Luz Amadei Martinez, Simon M Dittami, Luna van der Loos, Olivier De Clerck
DOI: https://doi.org/10.1101/2024.07.04.602031
2024-07-07
Abstract: Third-generation sequencing platforms, such as Oxford Nanopore Technology (ONT), have made it possible to characterise communities through the sequencing of long amplicons. Whilst this theoretically allows for an increased taxonomic resolution compared to short-read sequencing platforms such as Illumina, the high error rate still makes it difficult to accurately identify the community members present within a sample. Here we present and validate CONCOMPRA, a tool that allows the detection of closely related strains within a community by drafting consensus sequences and mapping reads to them. We show that CONCOMPRA outperforms several other tools for profiling bacterial communities using full-length 16S rRNA gene sequencing. Since CONCOMPRA does not rely on a sequence database for profiling communities, it is applicable to systems and amplicons for which a reference framework is poorly developed. Our validation test shows that the amplification of long PCR products is likely to produce chimeric byproducts that inflate alpha diversity and skew community structure, stressing the importance of chimera detection. CONCOMPRA is available on GitHub (https://github.com/willem-stock/CONCOMPRA).
Bioinformatics