A Fast Regular Expression Set Matching Algorithm Based on Bloom Filter

Ke-fu Xu, De-yu Qi, Wei-ping Zheng, Zheng-ping Qian
DOI: https://doi.org/10.3321/j.issn:1000-565X.2009.04.007
2009-01-01
Abstract: The efficiency of regular expression searching algorithms is proportional to the shortest path length L_min from the initial state to a final state of the NFA, and inversely proportional to the size of the prefix set Pref(RE) of the language denoted by the regular expression. Because Pref(RE) is generally large, its elements are difficult to locate in the target text. This paper proposes a regular expression searching algorithm based on the Bloom filter, whose query time is independent of the number of stored strings. The proposed algorithm can quickly locate elements of Pref(RE) and search at a speed unaffected by the size of Pref(RE); in particular, when multiple parallel Bloom filters are employed, the algorithm can indirectly lengthen the shortest path. Analysis and experimental results indicate that the proposed algorithm greatly accelerates regular expression search, especially for sets of regular expressions, and that the search speed increases several-fold, and up to tens of times, when both L_min and the size of Pref(RE) are large. It is concluded that the proposed algorithm is suitable for fast, large-scale searching of multiple regular expressions.
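The core idea, filtering text windows through a Bloom filter before running full regular expression matching, can be illustrated with a small sketch. This is not the authors' implementation: it assumes the length-L_min prefixes of each expression's language have already been enumerated into a finite set, uses a single filter rather than the paper's multiple parallel filters, and derives the k bit positions by double hashing over SHA-256. The names BloomFilter and search, the filter size, and k are hypothetical choices for illustration. Each window costs k hash probes no matter how many prefixes were inserted, and Bloom false positives are discarded by verifying candidates with Python's re module.

```python
import hashlib
import re


class BloomFilter:
    """Minimal Bloom filter (hypothetical parameters: m_bits, k; SHA-256 double hashing)."""

    def __init__(self, m_bits: int = 1 << 16, k: int = 4):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def search(text: str, patterns: dict[str, set[str]], lmin: int):
    """Return (regex, start, end) for matches starting at a Bloom-filter hit.

    patterns maps each regex to its set of length-lmin prefixes (assumed precomputed).
    """
    bf = BloomFilter()
    for prefixes in patterns.values():
        for p in prefixes:
            bf.add(p)
    compiled = {rx: re.compile(rx) for rx in patterns}
    hits = []
    for i in range(len(text) - lmin + 1):
        # Constant-cost probe: k hashes regardless of how many prefixes were added.
        if text[i:i + lmin] not in bf:
            continue
        # Verify the candidate with the real regex; this rejects false positives.
        for rx, pattern in compiled.items():
            m = pattern.match(text, i)
            if m:
                hits.append((rx, i, m.end()))
    return hits


if __name__ == "__main__":
    pats = {
        "ab+c": {"abc", "abb"},                       # L_min = 3
        "x[0-9]y": {f"x{d}y" for d in range(10)},      # L_min = 3
    }
    print(search("zzabbbc x7y abc", pats, 3))
    # → [('ab+c', 2, 7), ('x[0-9]y', 8, 11), ('ab+c', 12, 15)]
```

Because every reported match is confirmed by the regex engine, false positives only cost extra verification work; completeness depends on the prefix sets covering every string of the language, which is why the sketch indexes prefixes of exactly length L_min.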