A probabilistic analysis of a pattern matching problem

Mikhail J. Atallah, Philippe Jacquet, Wojciech Szpankowski
DOI: https://doi.org/10.1002/rsa.3240040206
1993-01-01
Random Structures and Algorithms
Abstract: The study and comparison of strings of symbols from a finite or an infinite alphabet is relevant to various areas of science, notably molecular biology, speech recognition, and computer science. In particular, the problem of finding the minimum "distance" between two strings (in general, two blocks of data) is of practical importance. In this article we investigate the (string) pattern matching problem in a probabilistic framework; namely, it is assumed that both strings are independent sequences of i.i.d. symbols. Given a text string a of length n and a pattern string b of length m, let M_{m,n} be the maximum number of matches between b and all m-substrings of a. Our main probabilistic result shows that for a wide range of input parameters in probability (pr.) provided m, n → ∞ such that log n = o(m), where P is the probability of a match between any two symbols of these strings, and T is the probability of a match between two positions in the text string and a given position of the pattern string. We also prove that M_{m,n}/m → P almost surely (a.s.) for log n = o(m). © 1993 John Wiley & Sons, Inc.
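To make the quantity in the abstract concrete, the following is a minimal brute-force sketch of the maximum number of matches between a pattern and all m-substrings of a text, together with a small simulation under the i.i.d. model. It is an illustration only, not the authors' method: the function names, the four-letter alphabet, the uniform symbol probabilities, and the sizes n and m are all assumptions chosen for the example. The sketch treats the text as non-cyclic and assumes both strings draw symbols from the same distribution, in which case the match probability P is the sum of the squared symbol probabilities.

```python
import random


def max_matches(text, pattern):
    """Maximum number of positionwise matches between `pattern` and any
    substring of `text` that has the same length as `pattern`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        raise ValueError("need 0 < len(pattern) <= len(text)")
    best = 0
    for start in range(n - m + 1):
        window = text[start:start + m]
        # Count positions where the window and the pattern agree.
        matches = sum(1 for x, y in zip(window, pattern) if x == y)
        best = max(best, matches)
    return best


def match_probability(symbol_probs):
    """Probability that two independent symbols drawn from the same
    distribution are equal (assumption: identical distributions)."""
    return sum(p * p for p in symbol_probs)


def random_string(length, alphabet, symbol_probs, rng):
    """Draw an i.i.d. string of the given length."""
    return "".join(rng.choices(alphabet, weights=symbol_probs, k=length))


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed for repeatability
    alphabet = "ACGT"                 # hypothetical alphabet
    probs = [0.25, 0.25, 0.25, 0.25]  # hypothetical uniform distribution
    n, m = 2000, 200                  # hypothetical sizes

    text = random_string(n, alphabet, probs, rng)
    pattern = random_string(m, alphabet, probs, rng)

    observed = max_matches(text, pattern)
    print("M_{m,n}     =", observed)
    print("M_{m,n} / m =", observed / m)
    print("P           =", match_probability(probs))
```

The brute-force scan takes on the order of n·m symbol comparisons, which is enough for small experiments. For finite m the ratio M_{m,n}/m typically sits somewhat above P, since it is a maximum over many windows; the almost-sure convergence stated in the abstract concerns the limit with log n = o(m).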
mathematics, applied; computer science, software engineering