Abstract: Network processors (NPs) are widely used in many types of networking equipment because of their high performance and flexibility. Most NPs use a software cache instead of a hardware cache because of chip area, cost, and power constraints. Programmers must therefore take full responsibility for software cache management, which is neither intuitive nor easy for most of them. Without effective use of the software cache, long memory access latency becomes a critical limiting factor for overall application performance. Prior techniques such as hardware multi-threading, wide-word accesses, and packet access combination for caching have been applied to help programmers overcome this bottleneck. However, most of them do not fully exploit the characteristics of packet processing applications and often perform only intraprocedural optimizations. As a result, for some applications the binary code they generate performs worse than hand-tuned assembly. In this paper, we propose an algorithm comprising two techniques, Critical Path Based Analysis (CPBA) and Global Adaptive Localization (GAL), to optimize the software cache performance of packet processing applications. Packet processing applications usually have several hot paths, and CPBA tries to insert localization instructions according to the execution frequencies of those paths. For further optimization, GAL eliminates some redundant localization instructions through interprocedural analysis and optimization. We apply our algorithm to several representative applications. Experimental results show an average speedup by a factor of 1.974.
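The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the function names, the representation of a path as a frequency plus a list of packet fields, and the fixed `capacity` of the software cache are all assumptions made for illustration. The first function greedily picks fields to localize, visiting paths from hottest to coldest; the second drops a localization in a callee when a caller earlier in the call chain has already localized the same field.

```python
from typing import Dict, List, Tuple


def choose_localizations(paths: List[Tuple[int, List[str]]], capacity: int) -> List[str]:
    """Pick fields to copy into the software cache, hottest path first.

    paths: (execution frequency, fields accessed on that path) pairs.
    capacity: hypothetical number of fields the software cache can hold.
    """
    chosen: List[str] = []
    # Visit paths in descending order of execution frequency.
    for _freq, fields in sorted(paths, key=lambda p: p[0], reverse=True):
        for field in fields:
            if field not in chosen and len(chosen) < capacity:
                chosen.append(field)
    return chosen


def drop_redundant(call_chain: List[str],
                   localized_by: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Remove localizations already performed earlier in the call chain.

    call_chain: function names ordered from caller to callee.
    localized_by: fields each function localizes before any cleanup.
    """
    seen = set()
    result: Dict[str, List[str]] = {}
    for fn in call_chain:
        kept = [f for f in localized_by.get(fn, []) if f not in seen]
        seen.update(kept)
        result[fn] = kept
    return result


if __name__ == "__main__":
    hot_paths = [(10, ["ttl", "dst"]), (90, ["dst", "src"]), (5, ["proto"])]
    print(choose_localizations(hot_paths, capacity=3))
    print(drop_redundant(["rx", "route"],
                         {"rx": ["dst", "src"], "route": ["dst", "ttl"]}))
```

A real compiler pass would work on a control-flow graph and a call graph with profiling data instead of flat lists; the sketch only shows why frequency ordering and cross-procedure deduplication reduce the number of localization instructions.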

Experience on Applying Push Model to Packet Processors in High Performance Routers

Improving the Throughput and Delay Performance of Network Processors by Applying Push Model

Towards High-Performance Flow-Level Packet Processing on Multi-Core Network Processors

Modeling and Analyzing the Performance of High-Speed Packet I/O

Adaptive Packet Classification Algorithm Based on Ixp2800 Network Processor

Towards Power Efficient High Performance Packet I/O

High Performance Packet Processing with FlexNIC

Power Efficient High Performance Packet I/O

On the Extreme Parallelism Inside Next-Generation Network Processors

An Efficient Scheduling Mechanism With Flow-Based Packet Reordering In A High-Speed Network Processor

Efficiency of Cache Mechanism for Network Processors

Cooperative Mechanism of Local Memory and Cache in Network Processors

Towards Optimized Packet Classification Algorithms for Multi-Core Network Processors

Optimizing Software Cache Performance of Packet Processing Applications

Optimal Placement of Cores, Caches and Memory Controllers in Network On-Chip

High Throughput Memory Data-Path Design For Multi-Core Architecture

High-performance Packet Classification Algorithm for Multithreaded IXP Network Processor

Rethinking Programmed I/O for Fast Devices, Cheap Cores, and Coherent Interconnects

Hybrid Cache Architecture for High Speed Packet Processing

Hardware Support for Message-Passing in Chip Multi-Processors

Accelerating Data Movement on Future Chip Multi-Processors