Generalization potential of large language models

Mikhail Budnikov, Anna Bykova, Ivan P. Yamshchikov
DOI: https://doi.org/10.1007/s00521-024-10827-6
2024-12-18
Neural Computing and Applications
Abstract: The rise of deep learning techniques, and especially the advent of large language models (LLMs), has intensified discussions about the possibilities that artificial intelligence with higher generalization capability entails. The range of opinions on the capabilities of LLMs is extremely broad: from equating language models with stochastic parrots to stating that they are already conscious. This paper is an attempt to review the LLM landscape in the context of generalization capacity as an information-theoretic property of these complex systems. We discuss the suggested theoretical explanations for generalization in LLMs and highlight possible mechanisms responsible for these generalization properties. Through an examination of existing literature and theoretical frameworks, we endeavor to provide insights into the mechanisms driving the generalization capacity of LLMs, thus contributing to a deeper understanding of their capabilities and limitations in natural language processing tasks.
computer science, artificial intelligence