Abstract: A classical measure of string comparison is given by the longest common subsequence (LCS) problem on a pair of strings. We consider its generalisation, called the semi-local LCS problem. The semi-local LCS problem asks for the LCS scores for each of the input strings against every substring of the other input string, and for every prefix of each input string against every suffix of the other input string. Such a comparison pattern provides a much more detailed picture of string similarity than a single LCS score; it also arises naturally in many string-related problems. In fact, the semi-local LCS problem turns out to be fundamental for string comparison, providing a powerful and flexible alternative to classical dynamic programming. It is especially useful when the input to a string comparison problem may not be available all at once, for example in comparison of dynamically changing strings, comparison of compressed strings, and parallel string comparison. The same approach can also be applied to permutation strings, providing efficient solutions for local versions of the longest increasing subsequence (LIS) problem, and for the problem of computing a maximum clique in a circle graph. Furthermore, the semi-local LCS problem turns out to have surprising connections with a few seemingly unrelated fields, such as computational geometry and algebra of semigroups. This work is devoted to exploring the structure of the semi-local LCS problem, its efficient solutions, and its applications in string comparison and other related areas, including computational molecular biology.
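
To make the problem definition above concrete, the following is a minimal brute-force sketch in Python. It is not the efficient method developed in this work; it simply recomputes an ordinary LCS for every pair the definition asks for. The function names `lcs` and `semi_local_lcs`, the dictionary keys, and the half-open index convention are assumptions introduced here for illustration.

```python
def lcs(a: str, b: str) -> int:
    """Length of the longest common subsequence, by classical dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def semi_local_lcs(a: str, b: str) -> dict:
    """Brute-force semi-local LCS scores (illustrative only, not efficient).

    Index pairs use Python's half-open slicing convention, which is an
    assumption of this sketch rather than notation taken from the text.
    """
    m, n = len(a), len(b)
    return {
        # whole a against every substring b[i:j]
        "string_substring": {
            (i, j): lcs(a, b[i:j]) for i in range(n + 1) for j in range(i, n + 1)
        },
        # every substring a[i:j] against whole b
        "substring_string": {
            (i, j): lcs(a[i:j], b) for i in range(m + 1) for j in range(i, m + 1)
        },
        # every prefix a[:k] against every suffix b[l:]
        "prefix_suffix": {
            (k, l): lcs(a[:k], b[l:]) for k in range(m + 1) for l in range(n + 1)
        },
        # every suffix a[k:] against every prefix b[:l]
        "suffix_prefix": {
            (k, l): lcs(a[k:], b[:l]) for k in range(m + 1) for l in range(n + 1)
        },
    }


scores = semi_local_lcs("abc", "bcab")
print(scores["string_substring"][(0, 4)])  # score of "abc" against the whole of "bcab"
```

The entry `string_substring[(0, n)]` coincides with the ordinary global LCS score, which shows how the single classical score sits inside the larger semi-local picture.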

On an alternative sequence comparison statistic of Steele

Multiple Alignment-Free Sequence Comparison

Stochastic Comparisons of Spacings of Record Values from One or Two Sample Sequences

The Chvátal-Sankoff problem: Understanding random string comparison through stochastic processes

Finite Width Model Sequence Comparison

A statistical physics perspective on alignment-independent protein sequence comparison

On-Line Selection of Alternating Subsequences from a Random Sample

Sequential Selection of a Monotone Subsequence from a Random Permutation

Similarity of symbolic sequences

Stein's method for comparison of univariate distributions

DNA sequence comparison by a novel probabilistic method

Large and Small Deviations for Statistical Sequence Matching

Stochastic Comparisons on Sample Extremes of Dependent and Heterogenous Observations

Protein Sequence Comparison Based on K-string Dictionary

Semi-local string comparison: algorithmic techniques and applications

Worst-case vs Average-case Design for Estimation from Fixed Pairwise Comparisons

Asymptotic Stochastic Comparison of Random Processes

Stochastic Comparisons of Order Statistics from Scaled and Interdependent Random Variables

A test against trend in random sequences

An efficient Z-score algorithm for assessing sequence alignments

Alignment-Free Sequence Comparison Based on Next Generation Sequencing Reads: Extended Abstract