TFD: A Multi-Pattern Matching Algorithm for Large-Scale URL Filtering

Zhenlong Yuan, Baohua Yang, Xiaoqi Ren, Yibo Xue
DOI: https://doi.org/10.1109/iccnc.2013.6504109
2013-01-01
Abstract: During the past decade, URL filtering systems have been widely deployed to prevent people from browsing undesirable or malicious websites. However, the key methods of URL filtering, such as URL blacklist filtering, have become increasingly challenging because of the limited performance of existing multi-pattern matching algorithms. In this paper, we propose a multi-pattern matching algorithm named TFD for large-scale, high-speed URL filtering. TFD employs a Two-phase hash, a Finite state machine, and Double-array storage to eliminate the performance bottleneck of the blacklist filter. Experimental results show that TFD outperforms existing work in matching speed, preprocessing time, and memory usage. In particular, on large-scale URL pattern sets (over 10 million URLs), single-threaded TFD reaches a matching speed of over 100 Mbps on a general-purpose x86 platform.
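For readers unfamiliar with double-array storage, the sketch below shows the general idea: a trie (a simple finite state machine) is packed into two flat arrays, `base` and `check`, so that each transition is one addition and one comparison. This is a generic textbook-style illustration, not TFD's actual design. The abstract does not describe the two-phase hash or how TFD lays out its arrays, so the function names (`build_trie`, `build_double_array`, `contains`), the exact-match semantics, and the sample URLs are all assumptions made for illustration.

```python
from collections import deque

FREE = -1   # marks an unused slot in the check array
ROOT = -2   # marks slot 0, which is reserved for the root state


def build_trie(patterns):
    """Build a plain trie: children[node] maps a byte value to a child node."""
    children, terminal = [{}], [False]
    for p in patterns:
        node = 0
        for b in p.encode("utf-8"):
            if b not in children[node]:
                children.append({})
                terminal.append(False)
                children[node][b] = len(children) - 1
            node = children[node][b]
        terminal[node] = True
    return children, terminal


def build_double_array(patterns):
    """Pack the trie into base/check arrays (illustrative, unoptimized)."""
    children, terminal = build_trie(patterns)
    base, check, accept = [0], [ROOT], set()
    queue = deque([(0, 0)])  # (trie node, double-array state)
    while queue:
        node, state = queue.popleft()
        if terminal[node]:
            accept.add(state)
        codes = sorted(children[node])
        if not codes:
            continue  # leaf: base stays 0, no outgoing transitions
        # Find the smallest offset b >= 1 whose child slots are all free.
        b = 1
        while any(b + c < len(check) and check[b + c] != FREE for c in codes):
            b += 1
        need = b + codes[-1] + 1
        if need > len(check):
            base.extend([0] * (need - len(check)))
            check.extend([FREE] * (need - len(check)))
        base[state] = b
        for c in codes:
            check[b + c] = state  # slot b + c is owned by this parent state
            queue.append((children[node][c], b + c))
    return base, check, accept


def contains(da, url):
    """Exact-match lookup: one add and one compare per input byte."""
    base, check, accept = da
    state = 0
    for c in url.encode("utf-8"):
        t = base[state] + c
        if t >= len(check) or check[t] != state:
            return False
        state = t
    return state in accept


# Hypothetical blacklist entries, for illustration only.
blacklist = build_double_array(["example.com/bad", "example.com/ads", "evil.test"])
print(contains(blacklist, "example.com/bad"))
print(contains(blacklist, "example.org"))
```

The linear search for a free offset keeps the sketch short but would be too slow for a pattern set of the size the abstract reports; a practical builder would track free slots explicitly.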