Scaling Supervised Local Learning with Augmented Auxiliary Networks

Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan
2024-02-27
Abstract: Deep neural networks are typically trained using global error signals that backpropagate (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and requires huge memory consumption. Local learning, which updates each layer independently with a gradient-isolated auxiliary network, offers a promising alternative to address the above problems. However, existing local learning methods are confronted with a large accuracy gap with the BP counterpart, particularly for large-scale networks. This is due to the weak coupling between local layers and their subsequent network layers, as there is no gradient communication across layers. To tackle this issue, we put forward an augmented local learning method, dubbed AugLocal. AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy. We also propose to linearly reduce the depth of auxiliary networks as the hidden layer goes deeper, ensuring sufficient network capacity while reducing the computational cost of auxiliary networks. Our extensive experiments on four image classification datasets (i.e., CIFAR-10, SVHN, STL-10, and ImageNet) demonstrate that AugLocal can effectively scale up to tens of local layers with a comparable accuracy to BP-trained networks while reducing GPU memory usage by around 40%. The proposed AugLocal method, therefore, opens up a myriad of opportunities for training high-performance deep neural networks on resource-constrained platforms. Code is available at
Neural and Evolutionary Computing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the accuracy shortfall of existing supervised local learning methods on large-scale networks, especially networks with a large number of independently optimized layers. Because there is no feedback interaction between hidden layers, each layer in existing local learning methods learns representations suited only to its own local objective and cannot benefit from information in subsequent layers the way backpropagation (BP) does. The result is a large accuracy gap relative to BP. In addition, local learning methods can demand substantial computing resources during training, especially in deep networks.

To address these problems, the authors propose an augmented local learning method called AugLocal. AugLocal strengthens the synergy between each local layer and its subsequent layers through the way it builds that layer's auxiliary network: the auxiliary network is constructed by uniformly selecting a small number of the layers that follow the hidden layer. AugLocal also uses a pyramidal structure that linearly reduces the depth of the auxiliary network as the hidden layer gets closer to the output layer, which lowers the computational cost. The method aims to improve the accuracy of local learning while reducing GPU memory usage, so that high-performance deep neural networks can be trained efficiently on resource-constrained platforms.
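The two construction rules described above (uniform selection of subsequent layers, and a depth that shrinks linearly toward the output) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `auxiliary_depth` and `select_auxiliary_layers`, the 1-based layer indexing, the `min_depth` parameter, the floor-based rounding, and the choice to always include the final layer are all hypothetical, and the exact formulas in the paper may differ.

```python
def auxiliary_depth(layer_idx, num_layers, max_depth, min_depth=1):
    """Depth of the auxiliary network for hidden layer `layer_idx` (1-based).

    Assumption: depth falls linearly from `max_depth` at the first layer to
    `min_depth` at the last hidden layer (index num_layers - 1), and never
    exceeds the number of layers that actually follow `layer_idx`.
    """
    remaining = num_layers - layer_idx           # layers after this one
    span = max(num_layers - 2, 1)                # avoid division by zero
    drop = ((layer_idx - 1) * (max_depth - min_depth)) // span
    depth = max_depth - drop
    return max(min_depth, min(depth, remaining))


def select_auxiliary_layers(layer_idx, num_layers, depth):
    """Pick `depth` evenly spaced layer positions from layer_idx+1..num_layers.

    Assumption: the final layer is always included so that the auxiliary
    network ends in an output stage.
    """
    remaining = num_layers - layer_idx
    depth = min(depth, remaining)
    if depth <= 1:
        return [num_layers]
    # Integer arithmetic keeps the positions distinct and strictly increasing.
    return [layer_idx + 1 + (k * (remaining - 1)) // (depth - 1)
            for k in range(depth)]


if __name__ == "__main__":
    L, d_max = 10, 5  # hypothetical network size and maximum auxiliary depth
    for l in range(1, L):
        d = auxiliary_depth(l, L, d_max)
        print(l, d, select_auxiliary_layers(l, L, d))
```

In this sketch, early hidden layers get deeper auxiliary networks that sample widely across the rest of the network, while layers near the output get shallow ones, which mirrors the pyramidal structure described in the summary.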