Abstract: A growing number of pattern matching applications employ finite automata as their basic processing model. These applications match tens to thousands of patterns against large amounts of data, which poses a great challenge to conventional processors. Hardware-based solutions have therefore emerged and achieve high-throughput automata processing. However, existing methods generally struggle to achieve both processing speed and storage efficiency. They are often too heavy to be integrated into a small chip and must rely on off-chip DRAM or other high-capacity memories even for some simple data sets, which can lead to area and power consumption issues. In this paper, we focus on building a more lightweight automata processing engine, with the goal of storing the whole automata model in on-chip memory so that it runs effectively and independently. We propose LAP, a lightweight automata processing engine. Powered by a novel automata model (A-DFA) and efficient packing algorithms, LAP achieves extremely high storage efficiency compared with a traditional DFA. Meanwhile, we identify the key parallelization factors in the A-DFA model and propose a specialized microarchitecture with novel instructions to further accelerate the state transition process. As a result, LAP obtains a more effective trade-off between processing speed and storage efficiency. Evaluation results show that LAP achieves extremely high storage efficiency on simple data sets, exceeding IBM's RegX by 8×, and improves processing speed by 1.32× to 1.91× compared with previous lightweight hardware implementations. Moreover, the LAP hardware architecture scales well: a higher-throughput acceleration system can be built simply by increasing the number of cores. 
We prototype a 16-core system on a Xilinx ZC702 FPGA and a 64-core system on a Xilinx ZCU102 FPGA. The prototype system on the ZC702 achieves 3.5 GB/s throughput on average on simple data sets, and the prototype system on the ZCU102 obtains higher throughput and compute density on some of the large data sets in ANMLZoo compared with modern in-memory NFA-based solutions.
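
To make the storage problem concrete, the following is a minimal sketch of the conventional baseline the abstract compares against: a dense, table-driven DFA for multi-string matching. It is not the A-DFA model or LAP's packing algorithms, which the abstract does not describe. The construction used here is the standard Aho-Corasick automaton expanded into a full transition table, and the names `build_dense_dfa` and `scan` are hypothetical, chosen only for illustration.

```python
from collections import deque

ALPHABET_SIZE = 256  # one table column per possible input byte


def build_dense_dfa(patterns):
    """Build an Aho-Corasick automaton and expand it into a dense DFA table.

    Returns (table, accepting): table[state][byte] is the next state and
    accepting[state] is the set of patterns recognised on entering that state.
    """
    goto = [{}]          # trie edges for each state
    accepting = [set()]  # patterns that end in each state
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                accepting.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        accepting[state].add(pattern)

    table = [[0] * ALPHABET_SIZE for _ in goto]
    fail = [0] * len(goto)
    queue = deque()
    for byte, nxt in goto[0].items():
        table[0][byte] = nxt
        queue.append(nxt)
    # Breadth-first expansion: every missing edge is filled in from the
    # failure state, so matching later needs exactly one lookup per byte.
    while queue:
        state = queue.popleft()
        for byte in range(ALPHABET_SIZE):
            if byte in goto[state]:
                nxt = goto[state][byte]
                fail[nxt] = table[fail[state]][byte]
                accepting[nxt] |= accepting[fail[nxt]]
                table[state][byte] = nxt
                queue.append(nxt)
            else:
                table[state][byte] = table[fail[state]][byte]
    return table, accepting


def scan(table, accepting, data):
    """Yield (end_offset, pattern) for every match, one table lookup per byte."""
    state = 0
    for offset, byte in enumerate(data):
        state = table[state][byte]
        for pattern in accepting[state]:
            yield offset + 1, pattern


patterns = [b"he", b"she", b"his", b"hers"]
table, accepting = build_dense_dfa(patterns)
matches = sorted(scan(table, accepting, b"ushers"))
table_entries = len(table) * ALPHABET_SIZE
```

In this sketch every state owns a full row of 256 entries, so the table grows as (number of states) × 256 even though most entries in a row simply repeat transitions inherited from a shallower state. That redundancy is the kind of overhead that compact automata models and packing schemes aim to remove so that the model fits in on-chip memory.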

Cache Automaton: Repurposing Caches for Automata Processing

CAMA: Energy and Memory Efficient Automata Processing in Content-Addressable Memories

Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine

Put an Elephant into a Fridge

DEAM: Decoupled, Expressive, Area-Efficient Metadata Cache

LAP: A Lightweight Automata Processor for Pattern Matching Tasks

Finite State Automata Design using 1T1R ReRAM Crossbar

Cache-Based Scalable Deep Packet Inspection with Predictive Automaton

SPC-FA: Synergic Parallel Compact Finite Automaton to Accelerate Multi-String Matching with Low Memory

Software-Hardware Codesign for Efficient In-Memory Regular Pattern Matching

FASTA: Revisiting Fully Associative Memories in Computer Microarchitecture

An Architecture-Level Cache Simulation Framework Supporting Advanced PMA STT-MRAM

A Scored Non-Deterministic Finite Automata Processor for Sequence Alignment

Statistical Cache Bypassing for Non-Volatile Memory

Design and Implementation of A High-Performance Microprocessor Cache Compression Algorithm

Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism

Computer Architecture with Associative Processor Replacing Last Level Cache and SIMD Accelerator

CAMAS: Static and Dynamic Hybrid Cache Management for CPU-FPGA Platforms

Hap: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA

Adaptive Placement and Migration Policy for an STT-RAM-based Hybrid Cache