A 1.2 mm$^2$ 416 mW 1.44 Mmat/s 64$\times$16 Matrix Preprocessing ASIC for Massive MIMO in 22FDX

Darja Nonaca, Christoph Studer
2024-10-18
Abstract: Massive multiuser (MU) multiple-input multiple-output (MIMO) enables concurrent transmission of multiple users to a multi-antenna basestation (BS). To detect the users' data using linear equalization, the BS must perform preprocessing, which requires, among other tasks, the inversion of a matrix whose dimension equals the number of user data streams. Explicit inversion of large matrices is notoriously difficult to implement due to high complexity, stringent data dependencies that lead to high latency, and high numerical precision requirements. We propose a novel preprocessing architecture based on the block-LDL matrix factorization, which improves parallelism and, hence, reduces latency. We demonstrate the effectiveness of our architecture through (i) massive MU-MIMO system simulations with mmWave channel vectors and (ii) measurements of a 22FDX ASIC, which is, to our knowledge, the first fabricated preprocessing engine for massive MU-MIMO with 64 BS antennas and 16 single-antenna users. Our ASIC reaches a clock frequency of 870 MHz while consuming 416 mW. At its peak throughput, the ASIC preprocesses 1.44 M 64$\times$16 matrices per second at a latency of only 0.7 $\mu$s.
Signal Processing, Hardware Architecture
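The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below derives cycles per matrix, latency in clock cycles, and energy per matrix. It assumes the 416 mW figure applies at peak throughput and that the quoted 0.7 µs is a rounded value; neither is stated explicitly here, and all variable names are illustrative.

```python
# Figures quoted in the title and abstract.
F_CLK_HZ = 870e6            # clock frequency
POWER_W = 416e-3            # measured power
THROUGHPUT_MAT_S = 1.44e6   # 64x16 matrices preprocessed per second
LATENCY_S = 0.7e-6          # quoted latency (assumed to be rounded)

# Clock cycles between two consecutive finished matrices.
cycles_per_matrix = F_CLK_HZ / THROUGHPUT_MAT_S

# Quoted latency expressed in clock cycles.
latency_cycles = LATENCY_S * F_CLK_HZ

# Energy per preprocessed matrix in nanojoules
# (assumption: 416 mW is drawn while running at peak throughput).
energy_per_matrix_nj = POWER_W / THROUGHPUT_MAT_S * 1e9

print(f"cycles per matrix : {cycles_per_matrix:.1f}")
print(f"latency in cycles : {latency_cycles:.1f}")
print(f"energy per matrix : {energy_per_matrix_nj:.1f} nJ")
```

Under these assumptions the interval between matrices (about 604 cycles) is close to the quoted latency (about 609 cycles), which is consistent with the engine working on roughly one matrix at a time, although the abstract does not say so directly.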
What problem does this paper attempt to address?
The paper addresses how to efficiently implement the preprocessing step of linear minimum mean-square error (LMMSE) data detection in massive multiuser (MU) multiple-input multiple-output (MIMO) systems. Specifically, it proposes a new preprocessing architecture that reduces the complexity and latency of matrix preprocessing, thereby meeting the high-throughput and low-latency requirements of modern wireless communication systems.

### Specific Background of the Problem

1. **Challenges in Large-Scale MU-MIMO Systems**:
   - Large-scale MU-MIMO systems allow multiple users to transmit data to a multi-antenna base station (BS) simultaneously.
   - To detect the users' signals, the base station must perform preprocessing, which involves inverting a matrix whose dimension equals the number of user data streams.
   - The cost of explicit matrix inversion grows rapidly with the number of users, resulting in high complexity, high latency, and strict numerical precision requirements.
2. **Limitations of Existing Methods**:
   - Existing hardware implementations (based on Cholesky, LU, LDL, or QR decomposition, for example) can perform the matrix inversion, but still suffer from excessive complexity and latency in large-scale systems.
   - Approximate methods reduce complexity, but sacrifice bit-error-rate performance and perform poorly under certain conditions (for example, when the number of base-station antennas is not much larger than the number of users).

### The Solution in the Paper

The paper proposes a preprocessing architecture based on the block-LDL (BLDL) matrix decomposition. Its main features are:

- **Improving Parallelism**: The BLDL decomposition allows multiple data items to be processed in parallel, which reduces latency.
- **Avoiding Forward Substitution**: The architecture skips the forward-substitution step, further reducing complexity.
- **Sharing Hardware Resources**: Hardware resources are shared to reduce silicon area.

### Experimental Verification

The paper verifies the effectiveness of the proposed architecture in two ways:

- **System Simulation**: Massive MU-MIMO system simulations with millimeter-wave channel vectors.
- **ASIC Measurement**: A 22FDX ASIC was designed, fabricated, and measured. The chip reaches a clock frequency of 870 MHz while consuming 416 mW, preprocesses 1.44 M 64×16 matrices per second, and has a latency of only 0.7 µs.

In conclusion, the paper targets the complexity and latency of the preprocessing step in large-scale MU-MIMO systems. Its BLDL-based architecture achieves higher throughput and lower latency, thereby meeting the requirements of modern wireless communication systems.
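To make the block-LDL idea concrete, the following is a minimal floating-point sketch and not the paper's hardware algorithm. It factors a regularized Gram matrix `G = H^H H + N0*I` for a random 64×16 channel into `L D L^H`, where `L` is unit lower block-triangular and `D` is block-diagonal. The 2×2 block size (`BLK`), the noise variance `N0`, the i.i.d. Gaussian channel, and all function names are assumptions chosen for illustration; the summary above does not state the block size, and the fixed-point arithmetic, forward-substitution avoidance, and resource sharing of the ASIC are not modelled.

```python
import random

B, U = 64, 16          # BS antennas and users, as in the paper
BLK, N0 = 2, 0.1       # block size and noise variance: illustrative assumptions

def matmul(a, b):
    """Plain product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def herm(a):
    """Conjugate (Hermitian) transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def get_block(m, r, c):
    """Copy the BLK x BLK block whose top-left corner is (r, c)."""
    return [row[c:c + BLK] for row in m[r:r + BLK]]

def set_block(m, r, c, blk):
    """Write a BLK x BLK block at top-left corner (r, c)."""
    for i in range(BLK):
        for j in range(BLK):
            m[r + i][c + j] = blk[i][j]

def inv2(m):
    """Closed-form inverse of a 2x2 block (only valid for BLK == 2)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def block_ldl(g):
    """Right-looking block-LDL of a Hermitian positive-definite matrix."""
    n = len(g)                      # must be a multiple of BLK
    s = [row[:] for row in g]       # working copy holding Schur complements
    L = [[(1 + 0j) if i == j else 0j for j in range(n)] for i in range(n)]
    D = [[0j] * n for _ in range(n)]
    for j in range(0, n, BLK):
        d = get_block(s, j, j)      # current diagonal block D_j
        set_block(D, j, j, d)
        d_inv = inv2(d)
        below = range(j + BLK, n, BLK)
        # The L blocks of one block column are mutually independent,
        # which is the source of parallelism in a block factorization.
        for i in below:
            set_block(L, i, j, matmul(get_block(s, i, j), d_inv))
        # Schur-complement update: S_ik -= L_ij D_j L_kj^H
        for i in below:
            l_d = matmul(get_block(L, i, j), d)
            for k in below:
                upd = matmul(l_d, herm(get_block(L, k, j)))
                cur = get_block(s, i, k)
                set_block(s, i, k, [[cur[p][q] - upd[p][q]
                                     for q in range(BLK)] for p in range(BLK)])
    return L, D

random.seed(0)
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(U)]
     for _ in range(B)]
G = matmul(herm(H), H)              # U x U Gram matrix
for u in range(U):
    G[u][u] += N0                   # LMMSE-style regularization

L, D = block_ldl(G)
R = matmul(matmul(L, D), herm(L))   # reconstruct L D L^H
max_err = max(abs(G[i][j] - R[i][j]) for i in range(U) for j in range(U))
print(f"max |G - L D L^H| = {max_err:.2e}")
```

With 2×2 blocks, the 16×16 factorization needs only 8 sequential block steps instead of 16 scalar steps, and each step uses a closed-form 2×2 inverse. This illustrates, under the stated assumptions, why a block formulation can shorten the dependency chain.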