Reconfigurable Streaming Kernels for Multichannel Neurophysiological Recording Systems
Bo Yu, T. Mak, Xiangyu Li, Fei Xia, A. Yakovlev, Yihe Sun
2010-01-01
Abstract: Brain-machine interface (BMI) technology [1] offers an exciting means to study and communicate with the brain. Neural activity can be observed or recorded through a range of neurophysiological measuring techniques and apparatus, such as functional magnetic resonance imaging (fMRI), electroencephalography (EEG) and multi-electrode arrays. These measuring systems are becoming key components in many emerging neuro-prosthesis and neuro-rehabilitation applications. With the rapid advance of micro-electrode technology, the temporal and spatial resolution of electrode arrays has increased drastically [2]. This greatly enhances neural recording throughput and makes it possible to study large neural network ensembles. However, the dramatic increase in data bandwidth and data volume associated with multichannel recording requires significant computational effort. Because they involve statistical operations and iterative numerical procedures, most neural signal analysis algorithms [3] are highly computationally intensive. As a result, software-based approaches to multichannel neural signal analysis often require off-line processing. Reconfigurable systems, such as Field Programmable Gate Arrays (FPGAs), embed massively parallel computational resources and provide an effective alternative for real-time neural signal processing and data mining for multichannel neural recordings. Because of the poor signal-to-noise ratio of the recorded action potentials and the complexity of spike encodings, neural signal processing and data mining usually comprise multiple steps of spike filtering, feature extraction and statistical computation. These signal processing routines are computationally expensive. As a result, power dissipation and hardware area pose a major design challenge for reconfigurable systems.
In this poster, we present a reconfigurable kernel design methodology that exploits the self-similar nature of neural spikes and thus eliminates the need for temporal storage in signal processing. Three aspects are presented. First, we present a spike-streaming processing design principle that leads to efficient hardware implementation. This principle is exemplified by several commonly used neural signal analysis algorithms: spike feature extraction (principal component analysis (PCA)), covariance analysis (covariance matrix calculation), multichannel signal separation (independent component analysis (ICA)), and clustering (the k-means algorithm). Second, we present an FPGA-based hardware implementation methodology using the streaming-based algorithms. The design of a streaming kernel for spike feature extraction serves as an example to illustrate memory reduction in streaming architecture design. Third, we evaluate the proposed streaming method against the traditional batch processing approach on the above-mentioned neural signal analysis algorithms. Real clinical data, synthetic spike trains and synthetic spike times are used to verify the streaming method. The reductions in hardware resources and power consumption are also evaluated on Xilinx FPGA devices. The software evaluation results show that the proposed streaming method provides an approximation to the original batch processing algorithm. For spike train analysis, it achieves results similar to those of the original algorithm because of the similarity among spikes in a spike train. The accuracy of the streaming method depends on the streaming window size, that is, the number of data points used for streaming. The hardware evaluation results show that the memory and power saved by the streaming method depend on the algorithm and on how much data the batch processing method uses.
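The memory argument behind streaming covariance analysis can be illustrated with a minimal Python sketch. The abstract does not specify the authors' kernel, so this is not their design: it swaps in the standard Welford-style running update, which happens to be exact, whereas the poster's method is described as a windowed approximation. The names StreamingCovariance and batch_covariance are hypothetical. The point is only that the streaming form keeps a mean vector and one d-by-d accumulator, while the batch form must hold every sample.

```python
class StreamingCovariance:
    """Running mean and sample covariance of d-dimensional samples.

    Illustrative only: a Welford-style update, not the kernel from the poster.
    Storage is O(d^2) regardless of how many samples have been seen.
    """

    def __init__(self, dim):
        self.dim = dim
        self.n = 0
        self.mean = [0.0] * dim
        # Accumulated co-moment matrix (sum of products of deviations).
        self.comoment = [[0.0] * dim for _ in range(dim)]

    def update(self, sample):
        """Fold one sample into the running statistics; the sample is not stored."""
        self.n += 1
        delta_old = [x - m for x, m in zip(sample, self.mean)]
        self.mean = [m + d / self.n for m, d in zip(self.mean, delta_old)]
        delta_new = [x - m for x, m in zip(sample, self.mean)]
        for i in range(self.dim):
            for j in range(self.dim):
                self.comoment[i][j] += delta_old[i] * delta_new[j]

    def covariance(self):
        """Return the sample covariance matrix (divides by n - 1)."""
        if self.n < 2:
            raise ValueError("need at least two samples")
        return [[c / (self.n - 1) for c in row] for row in self.comoment]


def batch_covariance(samples):
    """Reference batch computation: requires all samples in memory."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    return [
        [
            sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


# Usage: feed samples one at a time, as they would arrive from a recording channel.
samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
stream = StreamingCovariance(dim=2)
for s in samples:
    stream.update(s)
streamed = stream.covariance()
batched = batch_covariance(samples)
```

In a hardware setting, the same idea would replace the sample buffer with a small set of accumulator registers, which is the kind of memory reduction the streaming architecture is aiming for.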
We use Xilinx System Generator as the design tool and perform power analysis with Xilinx XPower. Hardware resource utilization is reported by Xilinx ISE. The results show that the streaming method reduces power consumption by 16.6% to 54% when the algorithms are implemented on Virtex-6, and by 8.3% to 67% when they are implemented on Spartan-6. BRAM usage is also greatly reduced in all implementations by the streaming approach.