An Efficient Vector Memory Unit for SIMD DSP

Haiyan Chen, Zhong Liu, Sheng Liu, Sheng Ma
DOI: https://doi.org/10.1007/978-3-662-45815-0_1
2015-01-01
Abstract: SIMD DSPs are highly efficient for embedded applications whose parallel data are aligned. However, typical embedded algorithms such as FFT and FIR involve many unaligned and irregular data accesses. On a SIMD architecture with alignment restrictions, vectorizing these algorithms requires many additional shuffle instructions, and the resulting loss in computational efficiency grows as the SIMD width increases. This paper proposes an efficient vector memory unit (VMU) with 16 memory blocks for M-DSP, a 16-way SIMD DSP. Each memory block contains four groups of multi-bank memory with low-order-bit interleaved addressing and provides double bandwidth when needed to reduce parallel vector access conflicts. A high-bandwidth data shuffle unit capable of aligning dual vector accesses is implemented in the vector access pipeline; it efficiently supports not only unaligned accesses but also the special vector access patterns of FFT. Experimental results show that the VMU provides conflict-free parallel accesses between DMA and vector load/store operations with no more than 10% area overhead, and that M-DSP achieves an ideal speedup for the FFT and FIR algorithms.
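The effect of low-order-bit interleaving on vector accesses can be illustrated with a minimal sketch. This is not the M-DSP design: it assumes a flat, word-addressed memory with one bank per SIMD lane (16 banks), and the names `bank_and_row`, `max_conflict`, and `rotation_needed` are hypothetical. It shows only why a unit-stride access stays conflict-free at any unaligned base address yet still needs a rotation (a shuffle) to line data up with lanes, and why some strides cause conflicts.

```python
from collections import Counter

NUM_BANKS = 16  # assumption: one bank per lane, not the paper's exact layout
LANES = 16      # 16-way SIMD, as in the abstract


def bank_and_row(addr, num_banks=NUM_BANKS):
    """Low-order interleaving: low address bits pick the bank, the rest pick the row."""
    return addr % num_banks, addr // num_banks


def max_conflict(base, stride=1, lanes=LANES, num_banks=NUM_BANKS):
    """Largest number of lanes that hit the same bank (1 means conflict-free)."""
    banks = [bank_and_row(base + i * stride, num_banks)[0] for i in range(lanes)]
    return max(Counter(banks).values())


def rotation_needed(base, num_banks=NUM_BANKS):
    """For a unit-stride access, lane i reads bank (base + i) % num_banks,
    so the bank outputs must be rotated by this amount to align with lanes."""
    return base % num_banks


if __name__ == "__main__":
    for base, stride in [(0, 1), (5, 1), (0, 2), (0, 16)]:
        print(base, stride, max_conflict(base, stride), rotation_needed(base))
```

Under these assumptions, an unaligned unit-stride access touches every bank exactly once, while a stride that shares a factor with the bank count sends several lanes to the same bank. Power-of-two strides of this kind are common in FFT butterflies.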