A Micro-architecture that supports the Fano-Elias encoding and a hardware accelerator for approximate membership queries

Guy Even, Gabriel Marques Domingues
DOI: https://doi.org/10.1016/j.micpro.2023.104992
IF: 3.503
2024-01-04
Microprocessors and Microsystems
Abstract: We present the first hardware design that supports operations over the Fano-Elias encoding (FE-encoding). Our design is a combinational circuit (i.e., it completes in a single clock cycle) that supports insertions, deletions, and queries. FE-encoding allows one to store $f$ binary strings, each of length $l + \log m$, using a string that is $m + f + f l$ bits long (rather than $f(l + \log m)$ bits). The asymptotic gate count of the circuit is $\Theta((m+f) \cdot \lg m + f \cdot l)$. The asymptotic delay is $\Theta(\lg m + \lg f + \lg l)$. We implemented our design on an FPGA with four combinations of parameters in which the FE-encoding fits in 512 or 1024 bits. We also present the first hardware design for a dynamic filter that maintains a set subject to insertions, deletions, and approximate membership queries. The design contains four main blocks: two memory banks that store FE-encodings and two combinational circuits for FE-encoding. Additional logic deals with double buffering and forwarding. We implemented the dynamic filter on an FPGA with the following parameters: (1) Elements in the dataset are 32-bit strings. (2) The supported dataset can contain up to $n_{\max} = 45 \cdot 2^{14} = 737{,}280$ elements. (3) The latency is 2-4 clock cycles. (4) The throughput is fixed (i.e., constant and stable): a new operation can be issued every clock cycle. (5) We prove that the probability of a false-positive error is bounded by $0.385 \cdot 10^{-2}$. (6) We prove that the expected number of insertion failures is less than 1 for every 75 million insertions. Synthesis of our filter on a Xilinx Alveo U250 FPGA achieves a clock rate of 100 MHz (the critical path is due to the memory access). We measure a fixed throughput of 97.7 million operations per second (the 2.3% loss in throughput is due to instabilities in the bandwidth of the AXI4 Lite I/O channel). A unique feature of our filter implementation is that the throughput is stable and constant for all benchmarks and loads. Namely, the combination of operations does not influence the throughput, and the throughput does not depend on the number of elements in the dataset (as long as the cardinality of the dataset is bounded by $n_{\max}$). Previous dynamic filter implementations in software (on x86 processors or GPUs) do not exhibit stable and constant throughputs.
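The space bound quoted in the abstract can be illustrated with a small software sketch. It is not the paper's combinational circuit. It assumes one common FE-encoding layout: each string is split into a "quotient" (the top log m bits) and a "remainder" (the low l bits); a header of m + f bits records in unary how many strings share each quotient (a run of `1`s closed by a `0`); and a body of f·l bits lists the remainders in sorted order. The function names `fe_encode` and `fe_query`, the bit convention, and the requirement l ≥ 1 are assumptions made for illustration.

```python
def fe_encode(values, m, l):
    """Encode integers of (l + log2(m)) bits as an (m + f + f*l)-bit string."""
    mask = (1 << l) - 1
    # Split every value into (quotient, remainder) and sort by quotient.
    pairs = sorted((x >> l, x & mask) for x in values)
    counts = [0] * m
    for q, _ in pairs:
        counts[q] += 1
    # Header: for each quotient, one '1' per element, then a '0' delimiter.
    header = "".join("1" * c + "0" for c in counts)
    # Body: the l-bit remainders, in the same sorted order.
    body = "".join(format(r, "b").zfill(l) for _, r in pairs)
    return header + body


def fe_query(code, m, f, l, x):
    """Return True if x was encoded in `code` (f = number of stored strings)."""
    q, r = x >> l, x & ((1 << l) - 1)
    header, body = code[:m + f], code[m + f:]
    runs = header.split("0")                  # runs[q] is the unary count for q
    start = sum(len(run) for run in runs[:q])  # elements with a smaller quotient
    target = format(r, "b").zfill(l)
    return any(
        body[(start + i) * l:(start + i + 1) * l] == target
        for i in range(len(runs[q]))
    )


# Example with m = 4 quotients and l = 3 remainder bits (5-bit strings).
values = [0b00101, 0b00111, 0b10001]
code = fe_encode(values, m=4, l=3)
print(code, len(code))
print(fe_query(code, 4, len(values), 3, 0b00111))
print(fe_query(code, 4, len(values), 3, 0b01101))
```

With f = 3 strings the encoding occupies m + f + f·l = 4 + 3 + 9 = 16 bits. The saving over storing each string verbatim, f(l + log m) bits, only materialises when f is large relative to m, which is the regime the filter targets.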
computer science, theory & methods,engineering, electrical & electronic, hardware & architecture