Abstract:
Background: Most multiple sequence alignment and string-graph alignment algorithms focus on global alignment, yet many applications call for semi-global and local string-graph alignment. For long reads, filling out large dynamic programming tables requires enormous amounts of memory and runtime. Effective algorithms exist for finding the backbone of an alignment, and thus defining a band around it, such as the longest common subsequence with k-mer matches (LCSk++), but they do not work with graphs. This study adapts LCSk++ to graph structures, focusing on Partial Order Alignment (POA) graphs. POA graphs are directed acyclic graphs that represent multiple sequence alignments and capture the relationships between the aligned sequences. State-of-the-art methods such as abPOA and SPOA improve POA but primarily target global alignment, which limits them in local and semi-global banding scenarios. Our approach addresses these limitations by extending LCSk++ to handle the complexities of graph-based alignment.
Results: Our extended LCSk++ algorithm combines dynamic programming with graph traversal to detect conserved regions within POA graphs, which together form what we term the LCSk++ backbone. This backbone enables precise banding of the POA matrix for local and semi-global alignment, substantially improving the construction of consensus sequences. Compared with unbanded semi-global POA, our method reduces memory usage by up to 98% and runtime by up to 37-fold, particularly for long sequences. It maintains high alignment scores and remains effective across a range of string lengths and datasets, including synthetic and PacBio HiFi reads. Parallel processing improves runtime further, achieving speedups of up to 150-fold on conventional PCs.
Conclusion: The extended LCSk++ algorithm for graph structures offers a substantial advancement in sequence alignment technology. It reduces memory consumption and runtime without compromising alignment quality, providing a robust solution for local and semi-global alignment in POA graphs. This method extends the utility of POA in key applications such as multiple sequence alignment for phylogeny construction and graph-based reference alignment.
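To make the idea of a graph backbone concrete, here is a minimal Python sketch of LCSk++-style chaining on a POA-like DAG. It is an illustration under stated assumptions, not the authors' implementation: the names `graph_lcsk_pp`, `kmer_paths` and `descendants` are hypothetical, each node is assumed to carry a single base, and node ids are assumed to be in topological order (every edge goes from a smaller id to a larger one). A k-mer match is a k-node path whose labels equal a k-length substring of the read. A match either starts a chain (score k), jumps from an earlier match whose last node is a proper ancestor of its first node and whose read interval ends before it starts (score + k), or continues an earlier match by one edge and one read base (score + 1), which carries the LCSk++ recurrence over from strings to graph paths. For readability the sketch enumerates k-node paths by brute force and compares every pair of matches (quadratic in the number of matches) instead of using the faster sweep of the published string algorithm, so it is only suitable for toy graphs.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Path = Tuple[int, ...]
Match = Tuple[Path, int]  # (k-node path in the graph, start offset in the read)


def kmer_paths(labels: Sequence[str], succ: Sequence[Sequence[int]], k: int) -> List[Path]:
    """Enumerate every path of exactly k nodes (brute force; fine for toy graphs)."""
    paths: List[Path] = [(v,) for v in range(len(labels))]
    for _ in range(k - 1):
        paths = [p + (w,) for p in paths for w in succ[p[-1]]]
    return paths


def descendants(succ: Sequence[Sequence[int]]) -> List[Set[int]]:
    """reach[u] = nodes reachable from u via one or more edges (u itself excluded).

    Assumes topological ids: every edge u -> w has u < w.
    """
    reach: List[Set[int]] = [set() for _ in succ]
    for u in reversed(range(len(succ))):
        for w in succ[u]:
            reach[u] |= {w} | reach[w]
    return reach


def graph_lcsk_pp(labels: Sequence[str], succ: Sequence[Sequence[int]],
                  read: str, k: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (score, backbone); the backbone lists matched (node, read_pos) pairs."""
    reach = descendants(succ)
    matches: List[Match] = []
    for path in kmer_paths(labels, succ, k):
        word = "".join(labels[v] for v in path)
        matches += [(path, j) for j in range(len(read) - k + 1) if read[j:j + k] == word]
    # Any valid predecessor starts at a smaller node id, so sorting by start node
    # guarantees predecessors are scored before the matches that build on them.
    matches.sort(key=lambda m: (m[0][0], m[1]))

    dp: Dict[Match, int] = {}
    parent: Dict[Match, Optional[Tuple[Match, bool]]] = {}  # (previous match, continuation?)
    for idx, m in enumerate(matches):
        path, j = m
        dp[m], parent[m] = k, None  # start a new chain with this k-mer
        for p in matches[:idx]:
            p_path, pj = p
            # Jump: p ends at a proper ancestor of m's first node and before j in the read.
            if path[0] in reach[p_path[-1]] and pj + k <= j and dp[p] + k > dp[m]:
                dp[m], parent[m] = dp[p] + k, (p, False)
            # Continuation: m is p shifted by one edge in the graph and one base in the read.
            if (pj + 1 == j and p_path[1:] == path[:-1]
                    and path[-1] in succ[p_path[-1]] and dp[p] + 1 > dp[m]):
                dp[m], parent[m] = dp[p] + 1, (p, True)

    if not matches:
        return 0, []
    best = max(matches, key=lambda x: dp[x])
    score = dp[best]
    backbone: List[Tuple[int, int]] = []
    cur: Optional[Match] = best
    while cur is not None:  # trace back, collecting pairs from the end
        path, j = cur
        link = parent[cur]
        if link is not None and link[1]:
            backbone.append((path[-1], j + k - 1))  # continuation adds one new pair
        else:
            backbone.extend(reversed([(v, j + t) for t, v in enumerate(path)]))  # all k pairs
        cur = link[0] if link is not None else None
    backbone.reverse()
    return score, backbone


if __name__ == "__main__":
    # Toy POA graph for ACGTTGCA and ACGATGCA: node 2 (G) branches to 3 (T) or 4 (A).
    labels = list("ACGTATGCA")
    succ = [[1], [2], [3, 4], [5], [5], [6], [7], [8], []]
    print(graph_lcsk_pp(labels, succ, "ACGxxxTGCA", k=3))
    # -> (7, [(0, 0), (1, 1), (2, 2), (5, 6), (6, 7), (7, 8), (8, 9)])
```

In the usage example, `ACG` is anchored at the start of the read, the `xxx` stretch is skipped by a jump, and `TGCA` is chained along the graph by a continuation. A band for the POA matrix could then be placed around these (node, read position) anchors; how wide that band should be is a separate design choice that this sketch does not model.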

LCSKPOA: Enabling banded semi-global partial order alignments via efficient and accurate backbone generation through extended lcsk++