An efficient parallelized L7-filter design for multicore servers

Danhua Guo,Laxmi Narayan Bhuyan,Bin Liu
DOI: https://doi.org/10.1109/TNET.2011.2177858
2012-01-01
Abstract:L7-filter is a significant deep packet inspection (DPI) extension to Netfilter in Linux's QoS framework. It classifies network traffic based on information hidden in the packet payload. Although the computationally intensive payload classification can be accelerated with multiple processors, the default OS scheduler is oblivious to both the software characteristics and the underlying multicore architecture. In this paper, we present a parallelized L7-filter algorithm and an efficient scheduler technique for multicore servers. Our multithreaded L7-filter algorithm can process the incoming packets on multiple servers boosting the throughput tremendously. Our scheduling algorithm is based on Highest Random Weight (HRW), which maintains the connection locality for the incoming traffic, but only guarantees load balance at the connection level. We present an Adapted Highest Random Weight (AHRW) algorithm that enhances HRW by applying packet-level load balancing with an additional feedback vector corresponding to the queue length at each processor. We further introduce a Hierarchical AHRW (AHRW-tree) algorithm that considers characteristics of the multicore architecture such as cache and hardware topology by developing a hash tree architecture. The algorithm reduces the scheduling overhead to O(log N) instead of O(N) and produces a better balance between locality and load balancing. Results show that the AHRW-tree scheduler can improve the L7-filter throughput by about 50% on a Sun-Niagara- 2-based server compared to a connection locality-based scheduler. Although extensively tested for L7-filter traces, our technique is applicable to many other packet processing applications, where connection locality and load balancing are important while executing on multiple processors. With these speedups and inherent software flexibility, our design and implementation provide a cost-effective alternative to the traffic monitoring and filtering ASICs.
What problem does this paper attempt to address?