LaCoOT: Layer Collapse through Optimal Transport

Victor Quétu, Nour Hezbri, Enzo Tartaglione
2024-06-13
Abstract: Although deep neural networks are well-known for their remarkable performance in tackling complex tasks, their hunger for computational resources remains a significant hurdle, posing energy-consumption issues and restricting their deployment on resource-constrained devices, which stalls their widespread adoption. In this paper, we present an optimal transport method to reduce the depth of over-parametrized deep neural networks, alleviating their computational burden. More specifically, we propose a new regularization strategy based on the Max-Sliced Wasserstein distance to minimize the distance between the intermediate feature distributions in the neural network. We show that minimizing this distance enables the complete removal of intermediate layers in the network, with almost no performance loss and without requiring any finetuning. We assess the effectiveness of our method on traditional image classification setups. We commit to releasing the source code upon acceptance of the article.
Machine Learning
What problem does this paper attempt to address?
This paper proposes LaCoOT (Layer Collapse through Optimal Transport), a method that targets the high computational cost and energy consumption of deep neural networks (DNNs). Although DNNs perform well on complex tasks, their computational requirements limit their deployment on resource-constrained devices. LaCoOT uses optimal transport to reduce the depth of over-parameterized DNNs, thereby alleviating their computational burden. Specifically, the paper introduces a regularization strategy based on the Max-Sliced Wasserstein distance that minimizes the distance between intermediate feature distributions in the network. Minimizing this distance allows intermediate layers to be removed entirely, with almost no performance loss and without fine-tuning. The method is evaluated on traditional image classification setups, and the authors commit to releasing the source code once the paper is accepted.
Compared with traditional parameter pruning methods, LaCoOT focuses on reducing network depth; most existing methods are either less efficient or unable to remove redundant layers directly while maintaining performance. LaCoOT also operates within a single model rather than training multiple networks: it uses optimal transport tools to quantify and control the redundancy the network learns, which lets the network identify and remove its least contributing blocks. The paper further examines the use of the Wasserstein distance and its sliced variant in deep compression strategies, noting that they quantify differences between distributions effectively and avoid both fixing in advance the number of layers to prune and relying on ranking-based criteria. Because LaCoOT induces layer collapse during training, several layers can be collapsed at once, which improves efficiency.
The experimental section demonstrates the effectiveness of LaCoOT on various architectures and datasets, indicating that the method can significantly reduce the computational demand of models while maintaining comparable performance.
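To make the quantity behind the regularizer concrete, the following is a minimal sketch of a Max-Sliced Wasserstein distance between two batches of feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the function names (`wasserstein2_1d`, `max_sliced_w2`), the choice of the squared 2-Wasserstein cost, and the use of a fixed number of random directions are all assumptions made here. The summary does not say how the maximizing direction is found; taking the best of a few random directions only gives a lower bound on the true max-sliced distance.

```python
import math
import random


def wasserstein2_1d(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal batch sizes")
    xs, ys = sorted(xs), sorted(ys)
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project(points, direction):
    """Project each feature vector onto a single direction (one 'slice')."""
    return [sum(p * d for p, d in zip(point, direction)) for point in points]


def max_sliced_w2(feats_a, feats_b, n_directions=64, seed=0):
    """Approximate max-sliced squared W2 between two feature batches.

    Assumption: the maximum over all directions is approximated by the
    largest value over `n_directions` random directions (a lower bound).
    """
    rng = random.Random(seed)
    dim = len(feats_a[0])
    best = 0.0
    for _ in range(n_directions):
        d = random_unit_vector(dim, rng)
        best = max(best, wasserstein2_1d(project(feats_a, d), project(feats_b, d)))
    return best


# Hypothetical features entering and leaving a block (batch of 4, dimension 2).
block_input = [[0.0, 0.0], [1.0, 2.0], [2.0, -1.0], [3.0, 0.5]]
block_output = [[x + 3.0, y + 4.0] for x, y in block_input]

print(max_sliced_w2(block_input, block_input))   # identical batches
print(max_sliced_w2(block_input, block_output))  # shifted batch
```

In this sketch, a distance near zero means the block leaves the feature distribution essentially unchanged, which is the situation the regularizer described above pushes toward so that such a block can be removed. In an actual training setup the distance would need to be computed with a differentiable framework and added to the task loss with a weighting coefficient; those details are not given in the summary.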