Smart Vision Chip
Liyuan Liu,Peng Feng,Xu Yang,Shuangming Yu,Runjiang Dou,Jian Liu,Nanjian Wu
DOI: https://doi.org/10.1360/tb-2023-0859
2023-01-01
Abstract: Vision chips tightly integrate image sensors, processors and data storage. Data from the image sensor can be processed in situ by the processor, which lowers latency and power consumption. Meanwhile, the sensor output can be evaluated continuously and the sensor reconfigured in real time, so that it always operates in a suitable state. Designing a vision chip is a nontrivial task that requires a good understanding of image sensors, processors, algorithms and integration technology. Vision chips can be multibit-based or spiking-based, according to the type of sensor and processor integrated. The algorithm running on the vision chip can be a classical computational algorithm with handcrafted features or a deep neural network algorithm that relies on training. The CMOS image sensor is the best candidate for building vision chips. Its basic operating principle is the photoelectric effect in silicon, and the photodiode is the fundamental sensing device. The pinned photodiode and the 4-T pixel topology are two significant inventions that greatly reduce dark current and reset noise and thereby improve image quality. Image sensor architectures can be column-parallel, chip-parallel or pixel-parallel; column-parallel architectures have become very popular because they balance complexity and readout parallelism well. Currently, image sensors are developing toward higher resolution, higher frame rate, higher dynamic range, 3-D imaging and multispectral imaging, all of which significantly increase the data volume. To reduce the data burden, spiking image sensors were developed that output spike maps instead of gray images. To process the image data in situ, two kinds of vision processors can be adopted. For gray images, a multibit-based architecture is needed.
This architecture has evolved from application-specific designs to flexible programmable designs. Currently, programmable processors are mainstream owing to their high flexibility. To process images intelligently, convolutional neural networks (CNNs) are now widely adopted. However, CNNs are both computationally intensive and storage intensive. To run CNNs on chip more efficiently, both hardware and software optimization techniques are needed. Hardware-wise, parallel computing is the basis for handling intensive computation, and quantization, sparsity and data reuse are very useful for reducing computational complexity and power consumption. For spiking images, a spiking-based processor is preferred. Some spiking processors employ a brain-like crossbar topology; although they achieve low power consumption, their hardware cost increases when dealing with complex algorithms. Others adopt time multiplexing to build a cost-effective spiking processor. Software-wise, classical computer vision (CV) algorithms use handcrafted features and perform well in specific applications, whereas CNNs are more robust and accurate. Because CNNs need more computational capacity, neural network pruning and effective quantization methods are important for reducing the computational burden. For the various spiking-based processors, a neural network that runs on them can be built using transfer learning. To connect the image sensor and the image processor, planar integration or 3-D integration can be adopted. Compared with planar integration, 3-D integration allows the sensor and the processor to each be optimized with a suitable fabrication technology, and it also allows more memory to be integrated on chip. In the future, mixed-signal processing and computing-in-memory techniques could be employed for more efficient computing, and novel 2-D materials may open new ways toward sensing-storage-computing fusion in a single device.
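
As an illustration of the pruning and quantization steps mentioned in the abstract, the following is a minimal sketch and not the authors' method. It assumes magnitude-based pruning and symmetric 8-bit quantization with a single shared scale factor; the function names, the example weights and the 50% sparsity target are all hypothetical choices made for this example.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Assumption: plain magnitude pruning; real flows usually retrain afterwards.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_symmetric(weights, bits=8):
    """Map floats to signed integers that share one scale factor.

    Assumption: symmetric range [-qmax, qmax] with no zero-point offset.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


# Hypothetical weights of one small layer.
weights = [0.02, -0.75, 0.10, 0.50, -0.01, 0.33, -0.05, 1.00]

pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_symmetric(pruned, bits=8)
restored = dequantize(q, scale)

print("pruned:  ", pruned)
print("int8:    ", q)
print("restored:", [round(r, 4) for r in restored])
```

In this sketch, zeroed weights let a processor skip the corresponding multiply-accumulate operations, and the integer codes need one byte each instead of a four-byte float. The reconstruction error of each surviving weight is bounded by half the scale factor.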