Abstract: Optical and hybrid convolutional neural networks (CNNs) have recently attracted increasing interest for low-latency, low-power image classification and computer vision tasks. However, implementing optical nonlinearity is challenging, and omitting the nonlinear layers of a standard CNN significantly reduces accuracy. In this work, we use knowledge distillation to compress a modified AlexNet into a single linear convolutional layer and an electronic backend (two fully connected layers). We obtain performance comparable to a purely electronic CNN with five convolutional layers and three fully connected layers. We implement the convolution optically by engineering the point spread function of an inverse-designed meta-optic. Using this hybrid approach, we estimate a reduction in multiply-accumulate operations from 17M in a conventional electronic modified AlexNet to only 86K in the hybrid compressed network enabled by the optical frontend. This constitutes a reduction of over two orders of magnitude in latency and power consumption. Furthermore, we experimentally demonstrate that the classification accuracy of the system exceeds 93% on the MNIST dataset.
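The key mechanism in the abstract is that the meta-optic's point spread function (PSF) carries out the convolution physically, so the electronics never compute it. The sketch below models that operation in software, assuming an incoherent, shift-invariant imaging model in which the sensor image is the input intensity convolved with the PSF. The function names `convolve2d_valid` and `electronic_macs`, the "valid" boundary handling and the single output channel are illustrative assumptions, not details taken from the paper. The second function counts the MACs an electronic processor would spend on the same convolution, which is the cost the optical frontend removes.

```python
# Minimal sketch of an optical convolution frontend, modeled in software.
# Assumptions (not from the paper): incoherent illumination, a shift-invariant
# intensity PSF, "valid" boundary handling, and one output channel.


def convolve2d_valid(image, psf):
    """'Valid' 2D convolution of an intensity image with a PSF (lists of lists).

    Optically, this weighted sum happens passively as light passes through the
    meta-optic; in software every output pixel costs len(psf) * len(psf[0]) MACs.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(psf), len(psf[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # True convolution flips the kernel (for a symmetric PSF
                    # this is identical to cross-correlation).
                    acc += image[r + i][c + j] * psf[kh - 1 - i][kw - 1 - j]
            row.append(acc)
        out.append(row)
    return out


def electronic_macs(ih, iw, kh, kw, channels=1):
    """MACs an electronic processor would need for the same 'valid' convolution."""
    return (ih - kh + 1) * (iw - kw + 1) * kh * kw * channels


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
psf = [[0, 1], [2, 3]]
print(convolve2d_valid(image, psf))   # [[11.0, 17.0], [29.0, 35.0]]
print(electronic_macs(28, 28, 5, 5))  # 14400 for a hypothetical 5x5 kernel on MNIST
```

The 5×5 kernel on a 28×28 MNIST image is only an illustrative size, not the paper's configuration; it shows that even one small kernel costs 14,400 MACs per output channel when computed electronically.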

What problem does this paper attempt to address?

The core problem this paper addresses is how to achieve low-latency, low-power image classification with a hybrid optical-electronic approach while maintaining high classification accuracy. Specifically, the authors focus on the fact that convolution operations in convolutional neural networks (CNNs) are computationally expensive, which leads to high latency and high power consumption. To address this, they propose a Compressed Meta-Optical Encoder, which uses knowledge distillation to compress multiple convolutional layers into a single linear convolutional layer and pairs it with an electronic backend for efficient image classification.

### Main Problems and Solutions

1. **High computational cost of convolution operations**
   - **Problem**: In conventional CNNs, convolution operations account for most of the run time and power consumption, especially when processing high-resolution images.
   - **Solution**: An inverse-designed meta-optic performs the convolution in the optical frontend. This greatly reduces the number of multiply-accumulate (MAC) operations, and with it the latency and power consumption.
2. **Difficulty of implementing nonlinear layers**
   - **Problem**: Optical systems struggle to implement nonlinear activation functions (such as ReLU), yet these nonlinear layers are crucial to CNN performance.
   - **Solution**: Knowledge distillation is used to remove the nonlinear layers, compressing a complex CNN (a modified AlexNet) into a single linear convolutional layer followed by an electronic backend of two fully connected layers. This sidesteps optical nonlinearity while largely preserving accuracy (96.2% test accuracy for the compressed electronic CNN versus 98.4% for the original model).
3. **Compactness and robustness of the system**
   - **Problem**: Traditional 4f systems are bulky and prone to alignment errors.
   - **Solution**: A single-layer meta-optic performs the convolution, which simplifies the experimental setup and makes the system more compact and robust.

### Experimental Results

Using these methods, the authors obtained the following results.

**Performance comparison**

| Model | Training accuracy | Test accuracy |
|---|---|---|
| AlexNet-Mod (original model) | 98.9% | 98.4% |
| Compressed electronic CNN, without knowledge distillation | 84.2% | 82.1% |
| Compressed electronic CNN, with knowledge distillation | 97.2% | 96.2% |
| Hybrid optical-electronic CNN, with knowledge distillation | 93.9% | 93.4% |

**Computational complexity**

- AlexNet-Mod requires 17,268,224 MAC operations.
- The compressed electronic CNN requires 228,672 MAC operations.
- The hybrid optical-electronic CNN requires only 85,824 MAC operations, about 201 times fewer than AlexNet-Mod (17,268,224 / 85,824 ≈ 201), a reduction of more than two orders of magnitude.

### Conclusion

This work presents a hybrid optical-electronic CNN architecture. By compressing the convolutional layers and moving the remaining convolution into an optical frontend, it substantially reduces computational complexity and estimated power consumption while maintaining high classification accuracy. This offers a promising route toward efficient, low-power image classification.
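The distillation step described above trains the compressed student (a single linear convolutional layer plus two fully connected layers) to mimic AlexNet-Mod, the teacher. The summary does not state which distillation loss the authors use, so the sketch below assumes the widely used temperature-scaled loss of Hinton et al. (2015): a KL-divergence term between softened teacher and student outputs, mixed with ordinary cross-entropy on the true label. The names `distillation_loss` and `softmax`, and the values `temperature=4.0` and `alpha=0.7`, are hypothetical choices for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.7):
    """Hinton-style knowledge-distillation loss for a single example.

    Assumption: the paper's exact loss is not specified; this is the common
    soft-target + hard-label formulation. temperature and alpha are hypothetical.
    """
    # Soft targets: teacher and student distributions at the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # KL(teacher || student); terms with zero teacher probability contribute 0.
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student_soft) if pt > 0)
    # Hard-label cross-entropy on the student's ordinary (T = 1) softmax.
    ce = -math.log(softmax(student_logits)[label])
    # Scaling by T**2 keeps the soft-target gradients comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Example: a 3-class student that disagrees with a confident teacher.
teacher = [4.0, 1.0, 0.5]   # stand-in for AlexNet-Mod logits (made-up numbers)
student = [1.0, 2.0, 0.5]   # stand-in for compressed-network logits (made-up numbers)
print(round(distillation_loss(student, teacher, label=0), 4))
```

In training, this per-example loss would be averaged over a batch and minimized with respect to the student's weights only, with the teacher held fixed. The soft targets carry the teacher's relative confidence across all classes, which is what lets a purely linear frontend recover much of the accuracy lost by dropping the nonlinear layers.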

Compressed Meta-Optical Encoder for Image Classification

Optical Convolution Based Computational Method for Low-Power Image Processing

Optical-electronic Hybrid Fourier Convolutional Neural Network Based on Super-Pixel Complex-Valued Modulation

An Optical Frontend for a Convolutional Neural Network

Transferable polychromatic optical encoder for neural networks

Meta-optic Accelerators for Object Classifiers

Multi-layer Optical Convolutional Neural Network with Nonlinear Activation

Resource-Saving and High-Robustness Image Sensing Based on Binary Optical Computing

11 TeraFLOPs per second photonic convolutional accelerator for deep learning optical neural networks

Low-power scalable multilayer optoelectronic neural networks enabled with incoherent light

Image sensing with multilayer nonlinear optical neural networks

Hybrid optical convolutional neural network with convolution kernels trained in the spatial domain

On-chip 4F-System Based on Concave Mirrors for Optical Neural Networks

End-to-End Optimization for a Compact Optical Neural Network Based on Nanostructured 2 x 2 Optical Processors

Multilayer Optoelectronic Hybrid Convolutional Neural Network with an Optical 4F-System Recurrent Structure

Sophisticated Deep Learning with On-Chip Optical Diffractive Tensor Processing

Single-shot optical neural network

Spatially Varying Nanophotonic Neural Networks

Multichannel meta-imagers for accelerating machine vision

An optical neural network using less than 1 photon per multiplication

Knowledge Distillation Circumvents Nonlinearity for Optical Convolutional Neural Networks