SlimLM: An Efficient Small Language Model for On-Device Document Assistance

Thang M. Pham, Phat T. Nguyen, Seunghyun Yoon, Viet Dac Lai, Franck Dernoncourt, Trung Bui
2024-11-15
Abstract: While small language models (SLMs) show promise for mobile deployment, their real-world performance and applications on smartphones remain underexplored. We present SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices. Through extensive experiments on a Samsung Galaxy S24, we identify the optimal trade-offs between model size (ranging from 125M to 7B parameters), context length, and inference time for efficient on-device processing. SlimLM is pre-trained on SlimPajama-627B and fine-tuned on DocAssist, our constructed dataset for summarization, question answering, and suggestion tasks. Our smallest model demonstrates efficient performance on the S24, while larger variants offer enhanced capabilities within mobile constraints. We evaluate SlimLM against existing SLMs, showing comparable or superior performance and offering a benchmark for future research in on-device language models. We also provide an Android application, offering practical insights into SLM deployment. Our findings illuminate the capabilities of running advanced language models on high-end smartphones, potentially reducing server costs and enhancing privacy through on-device processing.
Computation and Language
What problem does this paper attempt to address?
This paper examines how small language models (SLMs) perform when actually deployed on mobile devices, and what they can be used for there, with a focus on document assistance tasks. By developing and optimizing a series of SLMs named SlimLM, the researchers aim to find the best balance among model size, context length, and inference time, so that document processing can run efficiently and with low latency on a mobile device. The paper addresses the following key issues:

1. **Trade-off between model size and performance**: The researchers explored how models of different sizes (from 125M to 7B parameters) perform on mobile devices, especially their efficiency and accuracy when processing long-context inputs.
2. **Impact of context length**: The researchers tested how inputs of different lengths affect inference speed and memory usage, in order to determine the maximum context length that mobile devices can process efficiently.
3. **Inference time and memory limitations**: The researchers measured the inference time of different models and explored the memory limits encountered when running them on mobile devices.
4. **Performance in document assistance tasks**: The researchers constructed a dedicated dataset named DocAssist for fine-tuning the models to improve their performance in document summarization, question suggestion, and question answering.

To address these issues, the researchers proposed the SlimLM series of models. After pre-training and fine-tuning, these models can run efficiently on high-end smartphones such as the Samsung Galaxy S24.
The experimental results show that the SlimLM models perform comparably to existing small language models on standard evaluation metrics (such as BLEU, ROUGE, and semantic text similarity), and outperform them in some tasks. The researchers also developed an Android application that demonstrates SlimLM's document assistance capabilities in practical scenarios, providing a reference point for further research and real-world deployment. Overall, the paper offers practical insights into deploying efficient language models on mobile devices.
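
To make the overlap-based metrics mentioned above more concrete, here is a minimal sketch of a ROUGE-1-style F1 score in Python. It is an illustration only, not the paper's evaluation code: the function name `rouge1_f1` is hypothetical, and the lowercasing and whitespace tokenization are simplifying assumptions (standard ROUGE implementations normalize and tokenize text differently and may apply stemming).

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate text and a reference text.

    Assumption: lowercase + whitespace tokenization, chosen for brevity.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: a token is matched at most as many times
    # as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)  # share of candidate tokens matched
    recall = overlap / len(ref_tokens)      # share of reference tokens matched
    return 2 * precision * recall / (precision + recall)
```

A generated summary would be passed as `candidate` and the ground-truth summary as `reference`. A score near 1 indicates high unigram overlap; a score near 0 indicates little shared vocabulary.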