Compressed Meta-Optical Encoder for Image Classification

Anna Wirth-Singh, Jinlin Xiang, Minho Choi, Johannes E. Fröch, Luocheng Huang, Shane Colburn, Eli Shlizerman, Arka Majumdar
2024-06-14
Abstract: Optical and hybrid convolutional neural networks (CNNs) recently have become of increasing interest to achieve low-latency, low-power image classification and computer vision tasks. However, implementing optical nonlinearity is challenging, and omitting the nonlinear layers in a standard CNN comes at a significant reduction in accuracy. In this work, we use knowledge distillation to compress modified AlexNet to a single linear convolutional layer and an electronic backend (two fully connected layers). We obtain comparable performance to a purely electronic CNN with five convolutional layers and three fully connected layers. We implement the convolution optically via engineering the point spread function of an inverse-designed meta-optic. Using this hybrid approach, we estimate a reduction in multiply-accumulate operations from 17M in a conventional electronic modified AlexNet to only 86K in the hybrid compressed network enabled by the optical frontend. This constitutes over two orders of magnitude reduction in latency and power consumption. Furthermore, we experimentally demonstrate that the classification accuracy of the system exceeds 93% on the MNIST dataset.
Computer Vision and Pattern Recognition, Image and Video Processing, Optics
What problem does this paper attempt to address?
The core problem this paper addresses is how to perform image classification with low latency and low power consumption using a hybrid optical-electronic method while maintaining high classification accuracy. Specifically, the authors are concerned that convolution operations in convolutional neural networks (CNNs) are computationally expensive, resulting in high latency and high power consumption. To address this, they propose a Compressed Meta-Optical Encoder, which uses knowledge distillation to compress multiple convolutional layers into a single linear convolutional layer and combines it with an electronic backend for efficient image classification.

### Main Problems and Solutions

1. **High Computational Cost of Convolution Operations**
   - **Problem**: In traditional CNNs, convolution operations account for most of the running time and power consumption, especially when processing high-resolution images.
   - **Solution**: The convolution is implemented in the optical frontend by engineering the point spread function of an inverse-designed meta-optic. This significantly reduces the number of multiply-accumulate (MAC) operations performed electronically, and with it latency and power consumption.
2. **Difficulty of Implementing Nonlinear Layers**
   - **Problem**: Optical systems cannot easily implement nonlinear activation functions (such as ReLU), yet these nonlinear layers are crucial to CNN performance.
   - **Solution**: Knowledge distillation is used to remove the nonlinear layers and compress a complex CNN (a modified AlexNet) into a single linear convolutional layer followed by an electronic backend of two fully connected layers. This sidesteps the difficulty of implementing nonlinear layers without significantly degrading performance.
3. **Compactness and Robustness of the System**
   - **Problem**: Traditional 4f systems are bulky and prone to alignment errors.
   - **Solution**: A single-layer meta-optic performs the convolution, which simplifies the experimental setup and improves the compactness and robustness of the system.

### Experimental Results

With these methods, the authors report the following results:

- **Performance Comparison**
  - AlexNet-Mod (original model): 98.9% training accuracy, 98.4% test accuracy.
  - Compressed electronic CNN (without knowledge distillation): 84.2% training accuracy, 82.1% test accuracy.
  - Compressed electronic CNN (with knowledge distillation): 97.2% training accuracy, 96.2% test accuracy.
  - Hybrid optical-electronic CNN (with knowledge distillation): 93.9% training accuracy, 93.4% test accuracy.
- **Computational Complexity**
  - AlexNet-Mod requires 17,268,224 MAC operations.
  - The compressed electronic CNN requires 228,672 MAC operations.
  - The hybrid optical-electronic CNN requires only 85,824 MAC operations, roughly 200 times fewer than AlexNet-Mod (over two orders of magnitude).

### Conclusion

This research presents a hybrid optical-electronic CNN architecture. By compressing the convolutional layers and moving the remaining convolution to an optical frontend, it significantly reduces computational complexity and power consumption while maintaining high classification accuracy. This offers a route to efficient, low-power image classification.
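
The reported MAC counts can be checked with a few lines of arithmetic. The Python sketch below uses only the three figures quoted in this summary. The names (`reduction_factor`, `offloaded_macs`, and the constants) are illustrative and do not come from the paper. Reading the difference between the compressed electronic and hybrid counts as the convolution work moved into the optics is an inference from the architecture described, not a figure the authors state here.

```python
import math

# MAC counts as quoted in the summary.
MACS_ALEXNET_MOD = 17_268_224           # fully electronic AlexNet-Mod
MACS_COMPRESSED_ELECTRONIC = 228_672    # one linear conv layer + FC backend, all electronic
MACS_HYBRID = 85_824                    # optical conv frontend + electronic FC backend


def reduction_factor(baseline: int, reduced: int) -> float:
    """Return how many times fewer MACs `reduced` needs than `baseline`."""
    return baseline / reduced


# Reduction of each compressed variant relative to the original network.
hybrid_vs_alexnet = reduction_factor(MACS_ALEXNET_MOD, MACS_HYBRID)
compressed_vs_alexnet = reduction_factor(MACS_ALEXNET_MOD, MACS_COMPRESSED_ELECTRONIC)

# Extra gain from performing the remaining convolution optically.
hybrid_vs_compressed = reduction_factor(MACS_COMPRESSED_ELECTRONIC, MACS_HYBRID)

# Assumption: the gap between the two compressed variants is the electronic
# convolution that the meta-optic replaces.
offloaded_macs = MACS_COMPRESSED_ELECTRONIC - MACS_HYBRID

print(f"hybrid vs AlexNet-Mod:     {hybrid_vs_alexnet:.1f}x "
      f"({math.log10(hybrid_vs_alexnet):.2f} orders of magnitude)")
print(f"compressed vs AlexNet-Mod: {compressed_vs_alexnet:.1f}x")
print(f"hybrid vs compressed:      {hybrid_vs_compressed:.2f}x")
print(f"MACs removed by the optical frontend (assumed conv layer): {offloaded_macs}")
```

On these figures, compression alone gives a reduction of about 75 times. Moving the convolution into the optics adds a further factor of about 2.7, for a total of about 200 times, which is where the "over two orders of magnitude" in the abstract comes from.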