F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS

Yanliang Zhou, Feng Zhang, Tuo Lin, Yuanjie Huang, Saiqin Long, Jidong Zhai, Xiaoyong Du
DOI: https://doi.org/10.1109/icde60146.2024.00287
2024-01-01
Abstract: With the development of IoT and edge computing, data analytics on the edge has become popular, and text analytics directly on compression (TADOC) has proven to be a promising technology for edge data analytics. At the same time, Field Programmable Gate Arrays (FPGAs) also have broad application prospects in data analytics systems. Unfortunately, no work to date shows how to support TADOC using FPGAs. We propose FPGA-based text analytics directly on compression with HLS, namely F-TADOC, the first framework that uses HLS to provide FPGA-based text analytics directly on compressed data. It supports efficient text analytics on FPGAs without decompressing the input data. F-TADOC addresses three major challenges. First, TADOC involves a large number of dependencies with an unbalanced workload across rules, which causes extremely low pipeline efficiency on FPGAs. To solve this, we use a layer-wise approach to traverse the DAG composed of rules and allocate different pipeline processing strategies to rules of different sizes. Second, the required data volume can exceed the on-chip memory capacity of FPGAs. We develop a memory pool supporting hash structures and on-chip caches on the FPGA to deal with this challenge. Third, traversing the DAG involves massive indirect addressing with a large number of random accesses. This leads to redundant time overhead caused by the latency of accessing the High Bandwidth Memory (HBM) during the pipeline. We optimize the F-TADOC algorithm by using dataflow to expand the nested loop, thus eliminating indirect addressing. Experiments on four widely used datasets show that F-TADOC achieves 4.63× and 1.49× speedups over TADOC and G-TADOC, respectively.
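To make the layer-wise traversal of the rule DAG more concrete, the following is a minimal software sketch of word counting on grammar-compressed text. It is not the authors' HLS implementation: the dictionary-based rule format, the function name `word_count_on_grammar`, and the sample rules are all assumptions chosen for illustration. The sketch assumes each rule is a list of symbols, where a symbol is either a word or a reference to another rule, and it processes rules one layer at a time so that a rule is visited only after all of its parents have passed down their occurrence counts.

```python
from collections import Counter, defaultdict


def word_count_on_grammar(rules, root="R0"):
    """Count words in grammar-compressed text without expanding it.

    `rules` maps a rule id to a list of symbols (hypothetical format).
    A symbol that is a key of `rules` is a sub-rule reference;
    any other symbol is treated as a plain word.
    """
    # In-degree = number of distinct parent rules that reference each rule.
    indegree = {r: 0 for r in rules}
    for body in rules.values():
        for child in {s for s in body if s in rules}:
            indegree[child] += 1

    # weight[r] = how many times rule r occurs in the fully expanded text.
    weight = defaultdict(int)
    weight[root] = 1

    counts = Counter()
    layer = [r for r, d in indegree.items() if d == 0]
    while layer:
        next_layer = []
        for r in layer:
            # Local frequency of each symbol inside this rule's body.
            for sym, n in Counter(rules[r]).items():
                if sym in rules:
                    # Pass the occurrence count down to the child rule.
                    weight[sym] += weight[r] * n
                    indegree[sym] -= 1
                    if indegree[sym] == 0:
                        # All parents are done, so the child is ready.
                        next_layer.append(sym)
                else:
                    counts[sym] += weight[r] * n
        layer = next_layer
    return counts


# Hypothetical grammar: R0 is the root, R1 and R2 are shared sub-rules.
example_rules = {
    "R0": ["R1", "R2", "R1", "edge"],
    "R1": ["R2", "data"],
    "R2": ["text", "analytics"],
}
result = word_count_on_grammar(example_rules)
```

Each rule body is scanned exactly once regardless of how often the rule appears in the expanded text, which is the property that makes analytics on the compressed form attractive. The per-layer grouping also shows where the workload imbalance mentioned in the abstract comes from: rules in the same layer can differ greatly in length.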