Abstract: Regular expressions (regexes) provide rich expressiveness for specifying the signatures of intrusions and are widely used in contemporary network security systems for signature-based intrusion detection. For very fast regex matching, the deterministic finite automaton (DFA) has been the first choice because its per-character matching time is constant, O(1). Unfortunately, DFA often suffers from the well-known state explosion problem and, consequently, tends to require prohibitive memory overhead in practical applications. To address the problem, a wide variety of DFA compression techniques have been proposed; however, few can keep up with ever-increasing network traffic bandwidth and regex set complexity. This paper proposes that the DFA problem is rooted in the regexes (rather than in DFA), i.e., in the semantic overlapping of regexes, and accordingly presents a complete algorithmic solution, PaCC (Partition, Compression, and Combination), that can transform a given large-scale set of complex regexes into a compact and fast matching engine using DFA as its core. PaCC fundamentally defuses state explosion for DFA by partitioning complex regexes into overlapping-free segments. By exploiting the massive repetitiveness among the resulting segments, PaCC can further deflate the corresponding DFA in terms of the number of states. Moreover, on the basis of the characteristics of these segments, PaCC takes a tailor-made compression approach and eliminates over 96% of the state transitions of the corresponding DFA. In the final matching engine, the combination of the DFA and a small relation mapping table, built from the segments and their syntagmatic relations, respectively, guarantees high performance and semantic equivalence. Experimental evaluation shows that PaCC produces succinct matching engines with memory usage proportional to the size of the real-world Snort and Bro regex sets, with speeds of up to 1.7 Gbps per core on an HP Z220 SFF workstation with a 3.40 GHz Intel Core i7-3770.
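
The state explosion problem mentioned in the abstract can be illustrated with a classic textbook case. The sketch below is not taken from the paper and is not part of PaCC. It assumes a two-symbol alphabet and the regex `(a|b)*a(a|b){n}`, and the function name `count_dfa_states` is hypothetical. It runs the standard subset construction on the small NFA for that regex and counts the reachable DFA states, which grow exponentially in `n` even though the NFA has only `n + 2` states.

```python
def count_dfa_states(n):
    """Count reachable DFA states for (a|b)*a(a|b){n} via subset construction.

    Illustrative NFA (an assumption, not from the paper):
      state 0      loops on 'a' and 'b', and also moves to state 1 on 'a'
      states 1..n  move to the next state on either symbol
      state n+1    is accepting and has no outgoing transitions
    """
    def step(states, sym):
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)              # self-loop for the leading (a|b)*
                if sym == "a":
                    nxt.add(1)          # guess that this 'a' is the marked one
            elif s <= n:
                nxt.add(s + 1)          # consume one of the n trailing symbols
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        cur = frontier.pop()
        for sym in "ab":
            nxt = step(cur, sym)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in range(8):
        print(n, count_dfa_states(n))
```

Each DFA state here has to remember which of the last `n + 1` symbols were `a`, so the count doubles with every increment of `n`. Real signature sets show the same effect when several such patterns are compiled into a single DFA.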

A Fast Regular Expression Set Matching Algorithm Based on Bloom Filter

P4rex: Accelerating Regular Expression Matching with Programmable Switches

Towards Fast Regular Expression Matching in Practice

A Fast Exact Pattern Matching Algorithm for Biological Sequences

FREME: A Pattern Partition Based Engine for Fast and Scalable Regular Expression Matching in Practice

High Speed Regular Expression Matching Engine with Fast Pre-Processing

Efficient Parallelization Of Regular Expression Matching For Deep Inspection

Extraction of Fingerprint from Regular Expression for Efficient Prefiltering

Intelligent and Efficient Grouping Algorithms for Large-Scale Regular Expressions

A Fast Improved Pattern Matching Algorithm for Biological Sequences

Practical Regular Expression Matching Free of Scalability and Performance Barriers

Pattern-Based DFA for Memory-Efficient and Scalable Multiple Regular Expression Matching

Building A Faster Boolean Matcher Using Bloom Filter

PTME: A Regular Expression Matching Engine Based on Speculation and Enumerative Computation on FPGA

An Algorithm of Mining Frequent Itemsets Based on Bloom Filter

Real-time Regular Expression Matching

Accelerating Boolean Matching Using Bloom Filter

Reducing the number of Bloom filters

Noisy Bloom Filters for Multi-Set Membership Testing

Reorganized and Compact DFA for Efficient Regular Expression Matching

Fast Packet Inspection Using State-Based Bloom Filter Engine