PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context

Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Sai Qian Zhang, Yuecheng Li, Barbara De Salvo
2024-10-23
Abstract: Following their success in natural language processing (NLP), there has been a shift towards transformer models in computer vision. While transformers perform well and offer promising multi-tasking performance, due to their high compute requirements, many resource-constrained applications still rely on convolutional or hybrid models that combine the benefits of convolution and attention layers and achieve the best results in the sub 100M parameter range. Simultaneously, task adaptation techniques that allow for the use of one shared transformer backbone for multiple downstream tasks, resulting in great storage savings at negligible cost in performance, have not yet been adopted for hybrid transformers. In this work, we investigate how to achieve the best task-adaptation performance and introduce PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers. We further combine PETAH adaptation with pruning to achieve highly performant and storage friendly models for multi-tasking. In our extensive evaluation on classification and other vision tasks, we demonstrate that our PETAH-adapted hybrid models outperform established task-adaptation techniques for ViTs while requiring fewer parameters and being more efficient on mobile hardware.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to efficiently perform task adaptation on the hybrid Transformer architecture in a resource-constrained environment to achieve the optimal balance between parameter efficiency and performance. Specifically:

1. **Existing problems**:
   - Although Transformer models perform excellently in the fields of natural language processing (NLP) and computer vision, their computational requirements are high, and many resource-constrained applications still rely on convolutional neural networks or hybrid models that combine convolutional layers and attention mechanisms.
   - Current task adaptation techniques mainly focus on pure Transformer models and are not specifically optimized for the hybrid Transformer architecture.
2. **Research objectives**:
   - Explore how to achieve the best task adaptation performance in the hybrid Transformer architecture.
   - Introduce the PETAH (Parameter Efficient Task Adaptation for Hybrid Transformers) framework, which simultaneously adjusts the fully-connected layers and convolutional layers through the low-rank adaptation method, thereby achieving a better balance between parameter efficiency and performance.
   - Combine pruning techniques to further optimize the storage and computational efficiency of the model, making it more suitable for resource-constrained environments such as multi-task processing and mobile devices.
3. **Specific problems**:
   - How can the performance of the hybrid Transformer model on different downstream tasks be improved without significantly increasing the number of parameters?
   - Can the convolutional layers in the hybrid model be effectively adjusted by the low-rank adaptation method to improve the flexibility and performance of task adaptation?
   - In a resource-constrained environment, how can the efficiency and storage-friendliness of the model be ensured?

By addressing these questions, the paper aims to provide a new, efficient task-adaptation method for hybrid Transformer architectures, so that they can be applied to a range of computer vision tasks when resources are limited.
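
To make the parameter-efficiency argument concrete, the following is a minimal sketch of how the trainable-parameter count of a low-rank update compares with full fine-tuning, for a fully-connected layer and for a convolutional layer. It assumes the generic low-rank formulation (a weight update factored as B·A with rank r) and one common way of applying it to convolutions, namely flattening the kernel to a matrix of shape c_out × (c_in·k·k). This is not necessarily the exact scheme PETAH uses. All function names, layer sizes, and parameter budgets are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def linear_lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a rank-`rank` update B @ A for a d_out x d_in weight."""
    # A has shape (rank, d_in), B has shape (d_out, rank).
    return rank * (d_in + d_out)


def conv_lora_params(c_in: int, c_out: int, kernel: int, rank: int) -> int:
    """Same count for a conv kernel flattened to c_out x (c_in * kernel * kernel).

    Assumption: the kernel is treated as a matrix; other factorizations exist.
    """
    return rank * (c_in * kernel * kernel + c_out)


def multitask_storage(backbone: int, adapter: int, num_tasks: int) -> Tuple[int, int]:
    """Return (one full model copy per task, shared backbone plus per-task adapters)."""
    full_copies = backbone * num_tasks
    shared = backbone + adapter * num_tasks
    return full_copies, shared


# Hypothetical layer sizes, not taken from the paper.
linear_full = 384 * 384                                    # full fine-tuning of one linear layer
linear_lora = linear_lora_params(384, 384, rank=8)
conv_full = 128 * 128 * 3 * 3                              # full fine-tuning of one 3x3 conv
conv_lora = conv_lora_params(128, 128, kernel=3, rank=8)

# Hypothetical budget: 10M-parameter backbone, 100k adapter parameters per task, 5 tasks.
full_copies, shared = multitask_storage(10_000_000, 100_000, 5)

print(f"linear: {linear_lora} vs {linear_full} trainable parameters")
print(f"conv:   {conv_lora} vs {conv_full} trainable parameters")
print(f"storage for 5 tasks: {shared} (shared) vs {full_copies} (full copies)")
```

Under these assumed sizes, the low-rank update trains only a few percent of each layer's weights, and the storage for several tasks is dominated by the single shared backbone rather than growing linearly with the number of tasks.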