Abstract

High‐throughput sequencing has become an increasingly central component of microbiome research. The development of de Bruijn graph‐based methods for assembling high‐throughput sequencing data has been an important part of the broader adoption of sequencing in biological studies. Recent advances in the construction and representation of de Bruijn graphs have led to new approaches that use the de Bruijn graph data structure to support a range of biological analyses. One application of these methods has been in alternative approaches to sequence assembly. Examples include gene‐targeted assembly, where only gene sequences are assembled out of larger metagenomes, and differential assembly, where the sequences that are differentially present between two samples are assembled. de Bruijn graphs have also been applied in comparative genomics. There they can represent large collections of genomes or metagenomes, and structural features in the graphs can be used to identify variants, indels, and homologous regions. These de Bruijn graph‐based representations of sequencing data have even begun to be applied to whole sequencing databases for large‐scale searches and experiment discovery. de Bruijn graphs have played a central role in how high‐throughput sequencing data are analyzed. The rapid development of new tools that rely on these data structures suggests that they will continue to play an important role in biology.

Highlights

de Bruijn graph‐based sequence assembly approaches have been an essential part of the broad application of sequencing methods, especially in microbiome research.

de Bruijn graphs can efficiently represent sequencing data in a highly scalable format that can be extended and modified to address different research questions.

de Bruijn graph‐based analysis methods have been developed for comparative genomics, for the identification of genetic variants, and for large‐scale searches of unassembled sequencing data.

The de Bruijn graph data structure will continue to be a central component of sequence assembly and analysis approaches in the future.
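The abstract describes several uses of de Bruijn graphs built from short k-mers. The toy Python sketch below illustrates four of them, and all function names in it are hypothetical rather than taken from any published tool:

- building a graph whose nodes are (k-1)-mers and walking an unbranched path to spell a contig;
- flagging branching nodes where a variant "bubble" can begin;
- collecting k-mers present in one sample but absent from another, as a much-simplified stand-in for differential assembly;
- scoring a query by k-mer containment, as a simplified stand-in for searching unassembled data.

It assumes error-free reads on the forward strand only. Practical tools also merge reverse complements, filter low-count k-mers, and use compacted or probabilistic graph encodings.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping k-mers of seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dbg(reads, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers.

    Each distinct k-mer becomes an edge from its (k-1)-prefix to its
    (k-1)-suffix, so reads that overlap share nodes and edges.
    """
    graph = defaultdict(set)
    for read in reads:
        for km in kmers(read, k):
            graph[km[:-1]].add(km[1:])
    return graph


def spell_path(graph, start, max_steps=1000):
    """Greedily extend from start while a node has exactly one successor,
    appending the last base of each new node. This is a toy contig walk:
    unlike a true unitig it does not check in-degree."""
    contig, node, seen = start, start, {start}
    for _ in range(max_steps):
        succ = graph.get(node, set())
        if len(succ) != 1:
            break  # dead end or branch point: stop extending
        node = next(iter(succ))
        if node in seen:
            break  # stop on a cycle instead of looping forever
        seen.add(node)
        contig += node[-1]
    return contig


def branching_nodes(graph):
    """Nodes with more than one successor. A single-base difference between
    two otherwise identical sequences opens a bubble at such a node."""
    return {node for node, succ in graph.items() if len(succ) > 1}


def differential_kmers(sample_a, sample_b, k):
    """k-mers found in sample_a's reads but absent from sample_b's reads."""
    kmers_b = {km for r in sample_b for km in kmers(r, k)}
    return {km for r in sample_a for km in kmers(r, k)} - kmers_b


def containment(query, reads, k):
    """Fraction of the query's distinct k-mers present in the read set."""
    q = set(kmers(query, k))
    indexed = {km for r in reads for km in kmers(r, k)}
    return len(q & indexed) / len(q) if q else 0.0


if __name__ == "__main__":
    reads = ["ATGGCGTGCA", "GCGTGCAATC"]
    graph = build_dbg(reads, k=4)
    print(spell_path(graph, "ATG"))  # ATGGCGTGCAATC

    # Two sequences differing by one base (G vs C) branch at node "AC".
    snp_graph = build_dbg(["AACGTT", "AACCTT"], k=3)
    print(sorted(branching_nodes(snp_graph)))  # ['AC']

    # k-mers in the second read that the first read lacks.
    print(sorted(differential_kmers(reads, ["ATGGCGTGCA"], k=4)))
    # ['AATC', 'CAAT', 'GCAA']

    print(containment("GTGCAATT", reads, k=4))  # 0.8
```

In the demo, the two overlapping reads merge into one contig because they share the k-mers of their overlap. The two variant sequences diverge at `AC` and rejoin at `TT`, which is the bubble pattern that graph-based variant callers look for.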

Applications of de Bruijn graphs in microbiome research
