Practical Boolean Decomposition for Delay-driven LUT Mapping

Alessandro Tempia Calvino, Alan Mishchenko, Giovanni De Micheli, Robert Brayton
2024-06-10
Abstract: Ashenhurst-Curtis decomposition (ACD) is a decomposition technique used, in particular, to map combinational logic into lookup-table (LUT) structures when synthesizing hardware designs. However, available implementations of ACD suffer from excessive complexity, search-space restrictions, and slow run time, which limit their applicability and scalability. This paper presents a novel fast and versatile technique of ACD suitable for delay optimization. We use this new formulation to compute two-level decompositions into a variable number of LUTs and enhance delay-driven LUT mapping by performing ACD on the fly. Compared to state-of-the-art technology mapping, experiments on heavily optimized benchmarks demonstrate an average delay improvement of 12.39% and an area reduction of 2.20% with affordable run time. Additionally, our method improves 4 of the best delay results in the EPFL synthesis competition without employing design-space exploration techniques.
Logic in Computer Science
What problem does this paper attempt to address?
This paper addresses the excessive complexity, restricted search space, and slow run time of existing Ashenhurst-Curtis decomposition (ACD) implementations used in hardware design synthesis. These problems limit the applicability and scalability of ACD. To overcome them, the paper proposes a new fast and versatile ACD technique suitable for delay optimization. Specifically, the main contributions of the paper include:

1. **Reformulating ACD to improve computational efficiency**:
   - A new truth-table-based algorithm is proposed. It is 2 to 80 times faster than existing methods when decomposing into two 6-LUTs, and it finds more solutions.
   - The new algorithm is flexible in the size and number of variable sets (free set FS, bound set BS, and shared set SS) and supports any number of BS functions.
2. **Using ACD for delay optimization**:
   - ACD is integrated into an existing LUT mapper for delay optimization. This is the first practical and scalable work to apply ACD to delay-driven LUT mapping.
   - The functional decomposition places timing-critical variables in the free set and the remaining variables in the bound set and shared set, which reduces the worst-case delay.
3. **Experimental evaluation and comparison**:
   - A comparison with existing decomposition methods demonstrates the advantages of the new method in quality and run time.
   - The experimental results show that mapping with ACD reduces delay by 12.39% and area by 2.20% on average.
   - The new method improves 4 of the best delay results in the EPFL synthesis competition.

### Specific technical details

#### 1. Truth-table-based ACD implementation

The paper proposes a truth-table-based ACD implementation with several improvements in computational efficiency.
The main steps include:

- **Variable partitioning**: Variables are partitioned into a free set (FS), a bound set (BS), and a shared set (SS). Different free sets are enumerated to determine the best variable partition.
- **Column multiplicity calculation**: The column multiplicity is computed for each candidate free set, and the minimum multiplicity is selected.
- **Encoding**: A minimum-cost covering problem is solved to find an encoding of the BS functions and FS functions that minimizes the size of the support set.

#### 2. ACD for delay optimization

The paper proposes an ACD-based delay optimization method. The specific steps are as follows:

- **Checking the existence of a decomposition**: Algorithm 1 (evaluate) checks whether a delay-minimizing decomposition exists. It first reorders the truth table so that the timing-critical variables become the least significant variables, and then computes the column multiplicity to determine whether the decomposition can be realized using no more than \( k - P_i \) BS functions.
- **Calculating the decomposition**: If a decomposition exists, Algorithm 2 (decompose) computes it. This algorithm finds the encoding of the BS functions and FS functions by solving the minimum-cost covering problem.

#### 3. ACD integration in LUT mapping

The paper integrates the above ACD method into the LUT mapping algorithm. The implementation includes:

- **Calculation of large cuts**: During cut enumeration, large cuts of size \( l \) with \( k < l \leq 11 \) are computed and represented as truth tables.
- **Delay-minimizing decomposition**: For cuts that are not \( k \)-feasible, Algorithm 1 checks whether a delay-minimizing decomposition exists. If it does, the delay of the cut is computed; otherwise, the cut is discarded.
- **Area calculation**: To reduce run time, the decomposition is not computed immediately; instead, the area is estimated pessimistically. The final circuit area is only slightly affected.

With these improvements, the proposed method achieves significant delay and area reductions while keeping run time affordable.
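
The column multiplicity check summarized above can be illustrated with a small Python sketch. This is not the paper's implementation. It assumes a disjoint decomposition with no shared set and stores the truth table as a Python integer. The helper names (`truth_table`, `column_multiplicity`, `bs_functions_needed`, `fits_two_level`) are hypothetical, and the feasibility condition used here (the bound set fits in one k-LUT, and the free set plus the BS functions fit in the top k-LUT) is a simplifying assumption.

```python
from itertools import product


def truth_table(func, num_vars):
    """Pack a Boolean function into an integer; bit m holds f(minterm m)."""
    tt = 0
    for m in range(1 << num_vars):
        bits = [(m >> v) & 1 for v in range(num_vars)]
        if func(*bits):
            tt |= 1 << m
    return tt


def column_multiplicity(tt, num_vars, free_set):
    """Count distinct columns of the decomposition chart.

    Each column is the cofactor of f for one bound-set assignment,
    viewed as a function of the free-set variables only.
    """
    bound_set = [v for v in range(num_vars) if v not in free_set]
    columns = set()
    for bs_bits in product((0, 1), repeat=len(bound_set)):
        base = 0
        for var, bit in zip(bound_set, bs_bits):
            base |= bit << var
        column = 0
        for row, fs_bits in enumerate(product((0, 1), repeat=len(free_set))):
            minterm = base
            for var, bit in zip(free_set, fs_bits):
                minterm |= bit << var
            column |= ((tt >> minterm) & 1) << row
        columns.add(column)
    return len(columns)


def bs_functions_needed(multiplicity):
    """ceil(log2(multiplicity)): BS functions needed to encode the columns."""
    return (multiplicity - 1).bit_length()


def fits_two_level(tt, num_vars, free_set, k):
    """Assumed feasibility test for a two-level decomposition into k-LUTs."""
    bound_size = num_vars - len(free_set)
    m = bs_functions_needed(column_multiplicity(tt, num_vars, free_set))
    return bound_size <= k and len(free_set) + m <= k


# Example: f = x4 XOR (x0 AND x1 AND x2 AND x3), with x4 assumed timing-critical.
f_tt = truth_table(lambda x0, x1, x2, x3, x4: x4 ^ (x0 & x1 & x2 & x3), 5)
print(column_multiplicity(f_tt, 5, [4]))  # 2
print(column_multiplicity(f_tt, 5, [0]))  # 4
print(fits_two_level(f_tt, 5, [4], 4))    # True
```

In this example, placing `x4` in the free set yields only two distinct columns, so a single BS function over the other four variables suffices and `x4` passes through one LUT level only. Placing `x0` in the free set instead yields four columns and requires two BS functions.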