Abstract: Foundation models, exemplified by GPT technology, are discovering new horizons in artificial intelligence by executing tasks beyond their designers' expectations. While the present generation provides fundamental advances in understanding language and images, the next frontier is video comprehension. Progress in this area must overcome the 1 Tb/s data rate demanded to grasp real-time multidimensional video information. This speed limit lies well beyond the capabilities of the existing generation of hardware, imposing a roadblock to further advances. This work introduces a hardware-accelerated integrated optoelectronic platform for multidimensional video understanding in real-time. The technology platform combines artificial intelligence hardware, processing information optically, with state-of-the-art machine vision networks, resulting in a data processing speed of 1.2 Tb/s with hundreds of frequency bands and megapixel spatial resolution at video rates. Such performance, validated in the AI tasks of video semantic segmentation and object understanding in indoor and aerial applications, surpasses the speed of the closest technologies with similar spectral resolution by three to four orders of magnitude. This platform opens up new avenues for research in real-time AI video understanding of multidimensional visual information, helping the empowerment of future human-machine interactions and cognitive processing developments.

What problem does this paper attempt to address?

This paper addresses the bottleneck that current hardware faces when processing real-time multidimensional video: data transfer speeds fall short of the required 1 terabit per second (Tb/s). Existing hardware is too slow, and offers too little resolution, to handle high-resolution hyperspectral video, which restricts further progress of artificial intelligence (AI) in video understanding. Specifically, the paper points out:

1. **Limitations of Existing Technologies**: The most advanced snapshot hyperspectral devices can record more than 100 frequency bands, but their data processing speed is three to four orders of magnitude below what is required, and they cannot record at video rate. Faster hyperspectral and multispectral technologies raise the frame rate at the cost of spectral resolution, and accurate one-dimensional scanners cannot meet the spatial resolution requirements of two-dimensional image streams at video rate.
2. **Challenges of Data Transfer Speed**: Electronic data transfer speed is a key bottleneck in achieving terabit-per-second multimodal data processing. For example, the DDR5 memory bandwidth of 500 Gb/s falls well short of the requirement.

To solve these problems, the paper proposes a hardware-accelerated integrated optoelectronic platform for real-time multidimensional video understanding. The platform combines AI hardware with advanced machine-vision networks and processes video data with hundreds of frequency bands and megapixel spatial resolution at 1.2 Tb/s.

This performance has been validated in AI tasks such as video semantic segmentation and object understanding in indoor and aerial applications, where the platform is three to four orders of magnitude faster than existing technologies with similar spectral resolution. The authors expect the platform to open new research directions in real-time AI video understanding and to support future developments in human-machine interaction and cognitive processing.
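To give a sense of scale for the terabit-per-second figure, the raw data rate of an uncompressed hyperspectral video stream is the product of pixel count, band count, frame rate, and bit depth. The sketch below is a back-of-the-envelope calculation only: the sensor size, band count, frame rate, and bit depth are hypothetical values chosen so that the product lands near the terabit scale, and are not taken from the paper. The 500 Gb/s DDR5 figure is the one quoted in the summary above.

```python
import math


def raw_data_rate_bps(width_px, height_px, bands, fps, bits_per_sample):
    """Uncompressed data rate of a hyperspectral video stream, in bits per second."""
    return width_px * height_px * bands * fps * bits_per_sample


def orders_of_magnitude(fast_bps, slow_bps):
    """Base-10 logarithm of the ratio between two data rates."""
    return math.log10(fast_bps / slow_bps)


# Hypothetical stream parameters (assumptions for illustration, not from the paper).
WIDTH_PX = 4000        # 4000 x 3000 = 12 megapixels
HEIGHT_PX = 3000
BANDS = 204            # "hundreds of frequency bands"
FPS = 30               # a typical video rate
BITS_PER_SAMPLE = 16

# DDR5 bandwidth as quoted in the summary above.
DDR5_BPS = 500e9

rate_bps = raw_data_rate_bps(WIDTH_PX, HEIGHT_PX, BANDS, FPS, BITS_PER_SAMPLE)

print(f"raw stream rate: {rate_bps / 1e12:.3f} Tb/s")
print(f"ratio to quoted DDR5 bandwidth: {rate_bps / DDR5_BPS:.2f}x")
```

Under these assumed values the stream exceeds the quoted DDR5 bandwidth by more than a factor of two. The helper `orders_of_magnitude` can be used the same way to express a speed gap such as the "three to four orders of magnitude" claim as a logarithm of two rates.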

Artificial intelligence optical hardware empowers high-resolution hyperspectral video understanding at 1.2 Tb/s
