Pattern matching with wildcards using words of shorter length

Meng Zhang, Yi Zhang, Liang Hu
DOI: https://doi.org/10.1016/j.ipl.2010.09.011
Impact factor (IF): 0.851
2010-01-01
Information Processing Letters
Abstract: The problem of pattern matching with wildcards is to find all occurrences of a pattern of length $m$ in a text of length $n$ over a finite alphabet $\Sigma$, where both the text and the pattern may contain wildcards. Based on the prime number encoding scheme (Chaim Linhart, Ron Shamir, Faster pattern matching with character classes using prime number encoding, J. Comput. Syst. Sci. 75 (3) (2009) 155-162), we present a new integer encoding and an efficient algorithm for this problem based on fast Fourier transforms (FFTs). The algorithm searches for the pattern in the text in $O(n \log m)$ time by computing a single convolution. For matching with wildcards, our encoding uses fewer prime numbers and shorter code words than the prime number encoding. We use at most $2\lg|\Sigma|$ prime numbers to encode the symbols, whereas the prime number encoding requires $|\Sigma|$ primes; this number drops to $1.5\lg|\Sigma|$ when $|\Sigma| > 40$. The code word used in the algorithm is at most $2\lceil \lg|\Sigma| \rceil \lceil \lg(5m) \rceil$ bits, whereas in the prime number encoding it is at least $|\Sigma| \lg m$ bits. We also show that the word length can be reduced further by increasing the number of convolutions computed.
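A minimal sketch of FFT-based matching with wildcards, assuming NumPy and a hypothetical wildcard symbol `?`. This is not the paper's prime-based integer encoding. Instead it gives each symbol a $k$-bit binary code with $k = \lceil \lg|\Sigma| \rceil$ and splits every position into $2k$ components of 0/1 values, so that the dot product of a pattern position with a text position counts the differing bits (a wildcard on either side is the zero vector). Summing the $2k$ resulting correlations gives, at each alignment, a count that is zero exactly when every position matches. This corresponds to the "more convolutions, shorter words" side of the trade-off mentioned in the abstract, whereas the paper computes a single convolution over integer code words. The names `bit_vectors` and `wildcard_matches` are illustrative. Each FFT here spans the whole text, so the cost is $O(\lg|\Sigma| \cdot n \log n)$; reaching an $O(n \log m)$-style bound would need the usual split of the text into overlapping blocks of length about $2m$.

```python
import numpy as np

WILDCARD = "?"  # hypothetical wildcard symbol for this sketch


def bit_vectors(s, alphabet, k, is_pattern):
    """Encode s as 2k rows of 0/1 values, one column per position.

    A symbol with alphabet index x has bits x_0..x_{k-1}. For bit b the
    pattern side stores (x_b, 1 - x_b) and the text side stores
    (1 - y_b, y_b), so their dot product is 1 exactly when the bits differ.
    A wildcard (on either side) stays an all-zero column.
    Symbols outside `alphabet` raise KeyError.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = np.zeros((2 * k, len(s)))
    for pos, ch in enumerate(s):
        if ch == WILDCARD:
            continue
        code = index[ch]
        for b in range(k):
            bit = (code >> b) & 1
            first, second = (bit, 1 - bit) if is_pattern else (1 - bit, bit)
            rows[2 * b, pos] = first
            rows[2 * b + 1, pos] = second
    return rows


def wildcard_matches(text, pattern, alphabet):
    """Return every start index i where pattern matches text[i:i + m]."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # empty patterns and over-long patterns yield no matches here
    # k = ceil(lg |alphabet|) bits per symbol, but at least 1.
    k = max(1, (len(alphabet) - 1).bit_length())
    p = bit_vectors(pattern, alphabet, k, is_pattern=True)
    t = bit_vectors(text, alphabet, k, is_pattern=False)
    size = n + m - 1  # full linear-convolution length, so no wrap-around
    # Convolving the reversed pattern with the text puts
    # sum_j p[j] * t[i + j] at index i + m - 1. Convolution is linear, so
    # the 2k spectrum products are summed before a single inverse FFT.
    spectrum = np.zeros(size // 2 + 1, dtype=complex)
    for c in range(2 * k):
        spectrum += np.fft.rfft(p[c, ::-1], size) * np.fft.rfft(t[c], size)
    conv = np.fft.irfft(spectrum, size)
    # Round away floating-point noise: each entry is a mismatched-bit count.
    mismatches = np.rint(conv[m - 1 : n]).astype(int)
    return [i for i in range(n - m + 1) if mismatches[i] == 0]
```

For example, `wildcard_matches("abracadabra", "a?r", "abcdr")` returns `[0, 7]`, and `wildcard_matches("ab?d", "cd", "abcd")` returns `[2]` because the text wildcard at index 2 matches `c`. Keeping the $2k$ components in separate convolutions keeps every value below $k \cdot m$, so plain floating-point FFTs round exactly; packing them into one convolution, as the paper does, trades this for longer code words.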