Small Logic-based Multipliers with Incomplete Sub-Multipliers for FPGAs

Andreas Böttcher, Martin Kumm
2024-05-03
Abstract: There is a recent trend in artificial intelligence (AI) inference towards lower precision data formats down to 8 bits and less. As multiplication is the most complex operation in typical inference tasks, there is a large demand for efficient small multipliers. The large DSP blocks have limitations implementing many small multipliers efficiently. Hence, this work proposes a solution for better logic-based multipliers that is especially beneficial for small multipliers. Our work is based on the multiplier tiling method in which a multiplier is designed out of several sub-multiplier tiles. The key observation we made is that these sub-multipliers do not necessarily have to perform a complete (rectangular) NxK multiplication and more efficient sub-multipliers are possible that are incomplete (non-rectangular). This proposal first seeks to identify efficient incomplete irregular sub-multipliers and then demonstrates improvements over state-of-the-art designs. It is shown that optimal solutions can be found using integer linear programming (ILP), which are evaluated in FPGA synthesis experiments.
Hardware Architecture
What problem does this paper attempt to address?
This paper addresses the problem of implementing efficient small multipliers on FPGAs (Field-Programmable Gate Arrays). Specifically, it focuses on designing more efficient logic-based small multipliers for the low-precision data formats (8 bits and below) used in AI inference tasks.

#### Main problems:

1. **Limitations of DSP resources**: Large DSP blocks are inefficient when implementing many small multipliers, so a better method is needed to design small multipliers suitable for FPGAs.
2. **Limitations of traditional rectangular sub-multipliers**: The traditional design method based on complete (rectangular) sub-multipliers has low resource utilization, especially for small multipliers.
3. **Optimizing resource utilization**: Introducing incomplete (non-rectangular) sub-multipliers can reduce the required LUT (Look-Up Table) resources while maintaining similar latency and throughput.

#### Solutions:

The paper builds on the "multiplier tiling" method, in which a multiplier is composed of several sub-multiplier tiles. The key innovation is that these sub-multipliers do not have to perform a complete N×K multiplication; they can instead be incomplete (non-rectangular). The method proceeds in the following steps:

1. **Identifying efficient incomplete sub-multipliers**: Identify efficient incomplete sub-multipliers and find the optimal combination of tiles through integer linear programming (ILP).
2. **Optimizing the compressor tree design**: Combine the tiling with the compressor tree design to further optimize resource utilization.
3. **Experimental verification**: Verify the effectiveness of the proposed method through FPGA synthesis experiments.
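
The tiling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 4×4 operand size, the tile shapes, and the names `tile_product`, `tiled_multiply`, and `is_exact_cover` are assumptions chosen for illustration, and LUT cost is not modeled. Each tile is simply the set of partial-product bit positions (i, j) it computes. As long as the tiles cover every position exactly once, summing the tile outputs reproduces the full product, whether or not each tile is rectangular.

```python
from itertools import product


def tile_product(x, y, cells):
    """Output of one sub-multiplier tile.

    `cells` is the set of (i, j) positions the tile covers; the tile
    contributes x_i * y_j * 2^(i + j) for each covered position.
    """
    return sum((((x >> i) & 1) & ((y >> j) & 1)) << (i + j) for i, j in cells)


def tiled_multiply(x, y, tiles):
    """Add the outputs of all tiles (in hardware, a compressor tree does this)."""
    return sum(tile_product(x, y, cells) for cells in tiles)


def is_exact_cover(tiles, n, k):
    """True if the tiles cover the n x k partial-product grid exactly once."""
    seen = [cell for cells in tiles for cell in cells]
    full_grid = set(product(range(n), range(k)))
    return len(seen) == len(set(seen)) and set(seen) == full_grid


# Hypothetical tiling of a 4x4 multiplier (shapes are illustrative only).
RECT_TILE = {(i, j) for i in range(0, 2) for j in range(4)}  # complete 2x4 tile
INCOMPLETE_TILE = {(i, j) for i in range(2, 4) for j in range(4)} - {(3, 3)}  # 2x4 minus one corner
CORNER_TILE = {(3, 3)}  # single AND gate for the remaining position
TILES = [RECT_TILE, INCOMPLETE_TILE, CORNER_TILE]

if __name__ == "__main__":
    print(is_exact_cover(TILES, 4, 4))
    print(all(tiled_multiply(x, y, TILES) == x * y
              for x in range(16) for y in range(16)))
```

Dropping the single-cell tile leaves partial product (3, 3) uncovered, so `tiled_multiply(15, 15, TILES[:2])` returns 161 instead of 225. This is why a valid tiling has to cover the whole grid, and why the choice of tile shapes is an optimization problem rather than a correctness problem.
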
#### Results:

The experimental results show that using incomplete sub-multipliers reduces the required LUT resources in most cases, by up to 17.6% and by 3.7% on average. The improvement is most pronounced for small multipliers.

### Summary:

This paper addresses the problem of implementing efficient small multipliers on FPGAs by introducing incomplete sub-multipliers, targeting the low-precision data formats used in AI inference tasks. By optimizing the multiplier tiling together with the compressor tree design, it achieves better resource utilization, which is verified in FPGA synthesis experiments.