Artificial intelligence optical hardware empowers high-resolution hyperspectral video understanding at 1.2 Tb/s

Maksim Makarenko, Qizhou Wang, Arturo Burguete-Lopez, Silvio Giancola, Bernard Ghanem, Luca Passone, Andrea Fratalocchi
2023-12-17
Abstract: Foundation models, exemplified by GPT technology, are discovering new horizons in artificial intelligence by executing tasks beyond their designers' expectations. While the present generation provides fundamental advances in understanding language and images, the next frontier is video comprehension. Progress in this area must overcome the 1 Tb/s data rate demanded to grasp real-time multidimensional video information. This speed limit lies well beyond the capabilities of the existing generation of hardware, imposing a roadblock to further advances. This work introduces a hardware-accelerated integrated optoelectronic platform for multidimensional video understanding in real-time. The technology platform combines artificial intelligence hardware, processing information optically, with state-of-the-art machine vision networks, resulting in a data processing speed of 1.2 Tb/s with hundreds of frequency bands and megapixel spatial resolution at video rates. Such performance, validated in the AI tasks of video semantic segmentation and object understanding in indoor and aerial applications, surpasses the speed of the closest technologies with similar spectral resolution by three to four orders of magnitude. This platform opens up new avenues for research in real-time AI video understanding of multidimensional visual information, helping the empowerment of future human-machine interactions and cognitive processing developments.
Computer Vision and Pattern Recognition, Artificial Intelligence, Optics
What problem does this paper attempt to address?
This paper addresses a hardware bottleneck in processing real-time multidimensional video: existing data transfer speeds cannot meet the required rate of about 1 terabit per second (Tb/s). Current hardware is either too slow or too low in resolution for high-resolution hyperspectral video, which holds back further progress in AI video understanding. Specifically, the paper points out:

1. **Limitations of Existing Technologies**: The most advanced snapshot hyperspectral devices can record more than 100 frequency bands, but their data processing speed is three to four orders of magnitude below the required rate, and they cannot record at video rate. Faster hyperspectral and multispectral technologies raise the frame rate at the cost of spectral resolution, and accurate one-dimensional scanners cannot meet the spatial resolution requirements of two-dimensional image streams at video rate.
2. **Challenges of Data Transfer Speed**: Electronic data transfer speed is a key bottleneck in achieving terabit-per-second multimodal data processing. For example, the DDR5 memory bandwidth is 500 Gb/s, well short of the requirement.

To solve these problems, the paper proposes a hardware-accelerated integrated optoelectronic platform for real-time multidimensional video understanding. The platform combines AI hardware with advanced machine vision networks and can process video data with hundreds of frequency bands and megapixel spatial resolution at 1.2 Tb/s.
This performance has been validated in the AI tasks of video semantic segmentation and object understanding in indoor and aerial applications, where the platform is three to four orders of magnitude faster than the closest technologies with similar spectral resolution. With this platform, the authors aim to open up new research directions in real-time AI video understanding and to support future developments in human-machine interaction and cognitive processing.
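To get a feel for the figures quoted above, the raw data rate of an uncompressed hyperspectral video stream can be estimated as pixels × bands × bits per sample × frames per second. The sketch below is only a back-of-the-envelope illustration: the function name and the stream parameters (4000 × 3000 pixels, 200 bands, 16 bits per sample, 30 frames per second) are assumptions chosen to fit the description "hundreds of frequency bands and megapixel spatial resolution at video rates", not the configuration reported in the paper. Only the 1.2 Tb/s and 500 Gb/s figures come from the text.

```python
def raw_data_rate_bps(width, height, bands, bits_per_sample, fps):
    """Uncompressed data rate of a hyperspectral video stream, in bits per second."""
    return width * height * bands * bits_per_sample * fps


# Figures quoted in the summary above.
PLATFORM_BPS = 1.2e12  # 1.2 Tb/s processing speed claimed for the platform
DDR5_BPS = 500e9       # 500 Gb/s DDR5 memory bandwidth cited as a bottleneck

# Hypothetical stream parameters (assumptions, NOT taken from the paper).
rate = raw_data_rate_bps(width=4000, height=3000, bands=200,
                         bits_per_sample=16, fps=30)

print(f"Illustrative raw stream rate: {rate / 1e12:.3f} Tb/s")
print(f"Platform rate / DDR5 bandwidth: {PLATFORM_BPS / DDR5_BPS:.1f}x")

# "Three to four orders of magnitude slower" than 1.2 Tb/s corresponds to
# roughly 1.2 Gb/s down to 120 Mb/s for comparable existing devices.
for orders in (3, 4):
    slower = PLATFORM_BPS / 10 ** orders
    print(f"{orders} orders of magnitude below 1.2 Tb/s: {slower / 1e9:.2f} Gb/s")
```

Under these assumed parameters the raw stream lands in the terabit-per-second range, which is more than twice the DDR5 figure cited in the summary. This shows why purely electronic data transfer becomes the limiting factor.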