HARD: Bit-Split String Matching Using a Heuristic Algorithm to Reduce Memory Demand

Xun Li, Lishui Chen, Yazhe Tang
2020-01-01
Romanian Journal of Information Science and Technology
Abstract: High-speed content inspection relies on a fast multi-pattern matching algorithm to detect predefined rules. When the number of target rules becomes large, the memory requirements of the matching engine become a critical issue. An effective technique for designing high-performance matching engines is to divide the target rule set into multiple subgroups and to use a parallel matching hardware unit for each subgroup. The key to this technique is the strategy used to divide the rules into subgroups. This paper proposes an effective rule classification method, referred to as HARD, for heterogeneous bit-split string matching architectures. HARD uses the uniqueness of the target patterns to classify all target rule characters. The paper also presents a method to estimate the distance between strings in a unique-pattern category. This distance formula is then used to assign each rule to a class, and each class is processed by a finite state machine of a different size. The experimental results show that the benefit of HARD grows with the number of rules in the rule set. On popular data sets, when the number of rules is above 4000, HARD saves nearly 50% of memory consumption compared with the previous bit-split string matching methods discussed in the paper.
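The general idea of distance-based rule grouping described in the abstract can be illustrated with a minimal Python sketch. This is not the HARD algorithm: the abstract does not give the paper's distance formula or its class sizes, so the sketch substitutes a Jaccard distance over character sets as a stand-in, and the names `jaccard_distance`, `group_rules`, `num_groups`, and `capacity` are hypothetical.

```python
from typing import List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Stand-in distance (an assumption, not the paper's formula):
    1 - |a & b| / |a | b| over two character sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def group_rules(rules: List[str], num_groups: int, capacity: int) -> List[List[str]]:
    """Greedily place each rule in the closest group that still has room.

    Each group stands for one parallel matching unit; `capacity` is a
    hypothetical per-unit limit on the number of rules.
    """
    if num_groups * capacity < len(rules):
        raise ValueError("not enough total capacity for all rules")

    groups: List[List[str]] = [[] for _ in range(num_groups)]
    alphabets: List[Set[str]] = [set() for _ in range(num_groups)]

    # Place longer rules first; sorted() is stable, so ties keep input order.
    for rule in sorted(rules, key=len, reverse=True):
        chars = set(rule)
        open_groups = [g for g in range(num_groups) if len(groups[g]) < capacity]
        # Smallest distance wins; ties go to the emptier group.
        best = min(
            open_groups,
            key=lambda g: (jaccard_distance(chars, alphabets[g]), len(groups[g])),
        )
        groups[best].append(rule)
        alphabets[best] |= chars

    return groups


if __name__ == "__main__":
    demo = group_rules(["abcd", "abce", "wxyz", "wxyy"], num_groups=2, capacity=2)
    print(demo)
```

In this sketch, rules that share many characters end up in the same group, so each group's state machine would see a smaller set of distinct symbols. How HARD actually measures distance and sizes each finite state machine is defined in the paper itself.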