Abstract: While small language models (SLMs) show promise for mobile deployment, their real-world performance and applications on smartphones remain underexplored. We present SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices. Through extensive experiments on a Samsung Galaxy S24, we identify the optimal trade-offs between model size (ranging from 125M to 7B parameters), context length, and inference time for efficient on-device processing. SlimLM is pre-trained on SlimPajama-627B and fine-tuned on DocAssist, our constructed dataset for summarization, question answering and suggestion tasks. Our smallest model demonstrates efficient performance on the S24, while larger variants offer enhanced capabilities within mobile constraints. We evaluate SlimLM against existing SLMs, showing comparable or superior performance and offering a benchmark for future research in on-device language models. We also provide an Android application, offering practical insights into SLM deployment. Our findings provide valuable insights and illuminate the capabilities of running advanced language models on high-end smartphones, potentially reducing server costs and enhancing privacy through on-device processing.

What problem does this paper attempt to address?

This paper examines how small language models (SLMs) perform when actually deployed on mobile devices, and what they can be used for there, with a focus on document assistance tasks. By developing and optimizing a series of SLMs named SlimLM, the researchers aim to find the best balance among model size, context length, and inference time, so that document processing can run efficiently and with low latency on the device itself. The key issues mentioned in the paper are:

1. **Trade-off between model size and performance**: The researchers explored how models of different sizes (from 125M to 7B parameters) perform on mobile devices, especially their efficiency and accuracy when processing long-context inputs.
2. **Impact of context length**: The researchers tested how inputs of different lengths affect the model's inference speed and memory usage, to determine the maximum context length that mobile devices can process efficiently.
3. **Inference time and memory limitations**: The researchers measured the inference time of different models and explored the memory limits encountered when running them on mobile devices.
4. **Performance in document assistance tasks**: The researchers constructed a dedicated dataset named DocAssist for fine-tuning the models to improve their performance on document summarization, question suggestion, and question answering.

To address these issues, the researchers proposed the SlimLM series of models. After pre-training and fine-tuning, these models can run efficiently on high-end smartphones such as the Samsung Galaxy S24.

The experimental results show that the SlimLM models are comparable to existing small language models on standard evaluation metrics (such as BLEU, ROUGE, and semantic textual similarity) and outperform them on some tasks. The researchers also developed an Android application that demonstrates SlimLM's document assistance capabilities in practical scenarios and serves as a reference for further research and deployment. Overall, the paper offers insights and technical groundwork for deploying efficient language models on mobile devices.
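
The summary above lists ROUGE among the evaluation metrics. As a rough illustration of what such a score measures, the following is a minimal sketch of a ROUGE-L style F1 score based on the longest common subsequence (LCS) of tokens. It assumes lowercased whitespace tokenization and equal weighting of precision and recall; the function names `lcs_length` and `rouge_l_f1` are hypothetical, and this is not the evaluation code used in the paper, which may rely on a standard package with stemming and other normalization.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming; prev holds the previous row of the table.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L style F1 between a reference text and a candidate text.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of candidate tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "SlimLM runs document summarization on a smartphone"
    candidate = "SlimLM runs summarization on a phone"
    print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.3f}")
```

A score of 1.0 means the candidate reproduces the reference token sequence exactly, while 0.0 means the two share no tokens in order. Published ROUGE numbers depend on tokenization and normalization choices, so scores from this sketch are not directly comparable to those reported in the paper.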

SlimLM: An Efficient Small Language Model for On-Device Document Assistance

PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training

ELMS: Elasticized Large Language Models On Mobile Devices

A Survey of Small Language Models

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Efficient and Personalized Mobile Health Event Prediction via Small Language Models

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Small Language Models: Survey, Measurements, and Insights

Small Language Models for Application Interactions: A Case Study

Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation

On-Device Language Models: A Comprehensive Review

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

MedMobile: A mobile-sized language model with expert-level clinical capabilities

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Enabling On-Device LLMs Personalization with Smartphone Sensing

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices

Imp: Highly Capable Large Multimodal Models for Mobile Devices

Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs

MELTing point: Mobile Evaluation of Language Transformers

Porting Large Language Models to Mobile Devices for Question Answering