Ahmed Y. Radwan, Mohammad Shehab, Mohamed-Slim Alouini
Abstract: Natural Language Processing (NLP) operations, such as semantic sentiment analysis and text synthesis, may often impair users' privacy and demand significant on-device computational resources. Centralized learning (CL) on the edge offers an alternative energy-efficient approach, yet requires the collection of raw information, which affects the user's privacy. While federated learning (FL) preserves privacy, it requires high computational energy on board tiny user devices. We introduce split learning (SL) as an energy-efficient alternative, privacy-preserving tiny machine learning (TinyML) scheme and compare it to FL and CL in the presence of Rayleigh fading and additive noise. Our results show that SL reduces processing power and CO2 emissions while maintaining high accuracy, whereas FL offers a balanced compromise between efficiency and privacy. Hence, this study provides insights into deploying energy-efficient, privacy-preserving NLP models on edge devices.
What problem does this paper attempt to address?
This paper addresses how to maintain high accuracy while preserving privacy and reducing energy consumption when performing text sentiment classification on resource-constrained devices. Specifically, it focuses on the following aspects:
1. **Data Privacy**: Traditional centralized learning (CL) must collect users' raw data, which may violate their privacy. Federated learning (FL) protects privacy, but running computationally heavy training on small devices consumes excessive energy. The paper therefore introduces split learning (SL) to reduce the on-device computational burden while still protecting privacy.
2. **Computational Resources**: Large language models (LLMs) such as the GPT series and BERT perform excellently on natural language processing tasks but require large amounts of storage and computation, which is a challenge for resource-constrained edge devices such as IoT devices. The paper uses model compression techniques (such as quantization, pruning, and knowledge distillation) to reduce model size and computational requirements so that these models can run on such devices.
3. **Communication Efficiency**: Wireless data transmission (such as WiFi) is susceptible to noise, fading, bandwidth limitations, and unstable connections, all of which degrade communication efficiency. The paper examines how the three learning methods (CL, FL, and SL) behave under these adverse conditions and evaluates them in terms of accuracy, energy consumption, and carbon emissions.
4. **Energy Efficiency and Environmental Impact**: Beyond model accuracy, the paper considers the energy consumed and the carbon emitted during both computation and communication. Its experimental comparison shows the advantages and disadvantages of each method in this regard, in particular the advantage of SL in user-side computing energy and carbon emissions.
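The channel impairments named in item 3 (Rayleigh fading and additive noise) can be illustrated with a small simulation. The sketch below is not the paper's channel model, whose details are not given here; it assumes flat Rayleigh fading with an independent coefficient per transmitted value, additive white Gaussian noise at a chosen SNR, and a receiver that knows the channel and equalizes by division. The function name `rayleigh_awgn_channel` and its parameters are hypothetical.

```python
import numpy as np

def rayleigh_awgn_channel(x, snr_db, rng):
    """Send a real-valued vector through an assumed Rayleigh + AWGN channel.

    Assumptions (illustrative only): one independent fading coefficient per
    value, noise power set relative to the mean signal power, and perfect
    channel knowledge at the receiver (zero-forcing equalization).
    """
    x = np.asarray(x, dtype=np.float64)
    # Complex Gaussian gain with unit average power; its magnitude is Rayleigh.
    h = (rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)) / np.sqrt(2)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    # Complex additive white Gaussian noise with total power `noise_power`.
    n = np.sqrt(noise_power / 2) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    y = h * x + n
    # Equalize with the known gain and keep the real part as the estimate.
    return np.real(y / h)

# Example: the same payload received at two different SNR levels.
payload = np.linspace(-1.0, 1.0, 64)
received_low = rayleigh_awgn_channel(payload, snr_db=0, rng=np.random.default_rng(1))
received_high = rayleigh_awgn_channel(payload, snr_db=30, rng=np.random.default_rng(1))
mse_low = np.mean((received_low - payload) ** 2)
mse_high = np.mean((received_high - payload) ** 2)
```

In a comparison like the one the paper describes, the payload would be whatever each scheme transmits (raw data for CL, model updates for FL, intermediate activations for SL), so the same channel affects each scheme differently.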
In summary, the paper's main objective is to identify a text sentiment classification approach for resource-constrained devices that is accurate, energy-efficient, and privacy-preserving. By comparing CL, FL, and SL, it provides a basis for choosing the appropriate scheme in practical applications.
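To make the split learning idea concrete, the following is a minimal sketch of one common SL arrangement, not the paper's architecture: the user device holds only the first layer, the server holds the rest, and only intermediate activations and their gradients cross the link, so raw data stays on the device. The toy data, layer sizes, learning rate, and all function names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for featurized text with binary sentiment labels (hypothetical).
X = rng.standard_normal((200, 8))
y = (X @ rng.standard_normal(8) > 0).astype(np.float64).reshape(-1, 1)

W_client = 0.1 * rng.standard_normal((8, 4))  # small layer kept on the user device
W_server = 0.1 * rng.standard_normal((4, 1))  # remaining layer kept on the server
lr = 0.2

def client_forward(X, W_client):
    """Device side: compute the activations that are sent uplink."""
    return np.tanh(X @ W_client)

def server_step(A, y, W_server, lr):
    """Server side: finish the forward pass, update the server weights,
    and return the gradient with respect to the received activations."""
    p = 1.0 / (1.0 + np.exp(-(A @ W_server)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    dz = (p - y) / len(y)
    grad_A = dz @ W_server.T  # computed with the pre-update server weights
    return loss, grad_A, W_server - lr * (A.T @ dz)

def client_backward(X, A, grad_A, W_client, lr):
    """Device side: backpropagate the downlink gradient through tanh."""
    return W_client - lr * (X.T @ (grad_A * (1 - A ** 2)))

losses = []
for _ in range(300):
    A = client_forward(X, W_client)                           # uplink: activations only
    loss, grad_A, W_server = server_step(A, y, W_server, lr)  # server update
    W_client = client_backward(X, A, grad_A, W_client, lr)    # downlink: gradient
    losses.append(loss)
```

The device computes a single small matrix product per step in this sketch, which is the intuition behind SL's lower user-side computing energy; FL, by contrast, would train the whole model on the device.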