Abstract: Large language models (LLMs) have demonstrated emergent abilities in text generation, question answering, and reasoning, facilitating a wide range of tasks and domains. Despite this proficiency, LLMs such as PaLM 540B and Llama-3.1 405B are limited by their large parameter counts and computational demands. They often must be accessed through cloud APIs, which raises privacy concerns, limits real-time applications on edge devices, and increases fine-tuning costs. In addition, LLMs often underperform in specialized domains such as healthcare and law because they lack sufficient domain-specific knowledge, which necessitates specialized models. Small Language Models (SLMs) are therefore increasingly favored for their low inference latency, cost-effectiveness, efficient development, and ease of customization and adaptation. These models are particularly well suited to resource-limited environments and to domain knowledge acquisition. They address the challenges of LLMs and are a good fit for applications that require localized data handling for privacy, minimal inference latency for efficiency, and domain knowledge acquired through lightweight fine-tuning. The rising demand for SLMs has spurred extensive research and development. However, a comprehensive survey of the definition, acquisition, application, enhancement, and reliability of SLMs is still lacking, which prompts us to conduct a detailed survey of these topics. Because definitions of SLMs vary widely, we propose a standard one: SLMs are defined by their capability to perform specialized tasks and their suitability for resource-constrained settings, with boundaries set by the minimal size at which emergent abilities appear and the maximum size sustainable under resource constraints. For the other aspects, we provide a taxonomy of relevant models and methods and develop general frameworks for each category to enhance and utilize SLMs effectively.

Empowering Large Language Models to Edge Intelligence: A Survey of Edge Efficient LLMs and Techniques

A Review on Edge Large Language Models: Design, Execution, and Applications

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey

Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges

On-Device Language Models: A Comprehensive Review

Large Language Models Empowered Autonomous Edge AI for Connected Intelligence

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

An Empirical Analysis and Resource Footprint Study of Deploying Large Language Models on Edge Devices

Efficient Large Language Models: A Survey

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Pushing Large Language Models to the 6G Edge: Vision, Challenges, and Opportunities

Small Language Models: Survey, Measurements, and Insights

A Survey on Efficient Inference for Large Language Models

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models