Investigating Symbolic Capabilities of Large Language Models

Neisarg Dave, Daniel Kifer, C. Lee Giles, Ankur Mali

2024-05-22

Abstract: Prompting techniques have significantly enhanced the capabilities of Large Language Models (LLMs) across various complex tasks, including reasoning, planning, and solving math word problems. However, most research has predominantly focused on language-based reasoning and word problems, often overlooking the potential of LLMs in handling symbol-based calculations and reasoning. This study aims to bridge this gap by rigorously evaluating LLMs on a series of symbolic tasks, such as addition, multiplication, modulus arithmetic, numerical precision, and symbolic counting. Our analysis encompasses eight LLMs, including four enterprise-grade and four open-source models, of which three have been pre-trained on mathematical tasks. The assessment framework is anchored in Chomsky's Hierarchy, providing a robust measure of the computational abilities of these models. The evaluation employs minimally explained prompts alongside the zero-shot Chain of Thoughts technique, allowing models to navigate the solution process autonomously. The findings reveal a significant decline in LLMs' performance on context-free and context-sensitive symbolic tasks as the complexity, represented by the number of symbols, increases. Notably, even the fine-tuned GPT3.5 exhibits only marginal improvements, mirroring the performance trends observed in other models. Across the board, all models demonstrated a limited generalization ability on these symbol-intensive tasks. This research underscores LLMs' challenges with increasing symbolic complexity and highlights the need for specialized training, memory and architectural adjustments to enhance their proficiency in symbol-based reasoning tasks.

Computation and Language, Machine Learning

What problem does this paper attempt to address?

This paper evaluates how well large language models (LLMs) handle symbolic tasks, specifically addition, multiplication, modulus arithmetic, numerical precision, and symbol counting. The authors note that most existing research focuses on language-based reasoning and word problems, while few studies examine LLMs on symbol-based calculation and reasoning. To fill this gap, the paper runs a series of experiments that measure how different LLMs perform on these tasks as symbolic complexity increases. The study covers eight LLMs: four enterprise-grade models and four open-source models, three of which have been pre-trained on mathematical tasks. The evaluation framework is anchored in the Chomsky hierarchy, which the authors use as a measure of the models' computational abilities. The experiments use minimally explained prompts together with zero-shot chain-of-thought prompting, so that the models navigate the solution process on their own. The results show that performance on context-free and context-sensitive symbolic tasks declines significantly as complexity, measured by the number of symbols, increases. Even the fine-tuned GPT3.5 model shows only marginal improvement and follows the same performance trend as the other models. Overall, all models generalize poorly on these symbol-intensive tasks. The paper concludes that improving proficiency in symbol-based reasoning will require specialized training as well as memory and architectural adjustments.
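The evaluation setup summarized above (symbolic tasks of growing size, zero-shot chain-of-thought prompts, accuracy tracked against the number of symbols) can be illustrated with a minimal Python sketch. This is not the authors' harness. The function names, the prompt wording, the "Let's think step by step" trigger phrase, and the `model_fn` callable are all assumptions introduced for illustration, and exact-match scoring is a simplification chosen here.

```python
import random


def make_addition_task(num_digits, rng):
    """Return (prompt, expected_answer) for adding two num_digits-digit integers."""
    low = 10 ** (num_digits - 1)
    high = 10 ** num_digits - 1
    a = rng.randint(low, high)
    b = rng.randint(low, high)
    # Assumed zero-shot chain-of-thought wording; not taken from the paper.
    prompt = f"Compute {a} + {b}. Let's think step by step."
    return prompt, str(a + b)


def make_counting_task(length, rng, alphabet="ab"):
    """Return (prompt, expected_answer) for counting one symbol in a random string."""
    symbols = [rng.choice(alphabet) for _ in range(length)]
    target = alphabet[0]
    prompt = (
        f"How many times does '{target}' appear in: {' '.join(symbols)}? "
        "Let's think step by step."
    )
    return prompt, str(symbols.count(target))


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to the reference after stripping whitespace."""
    if not references:
        return 0.0
    hits = sum(p.strip() == r for p, r in zip(predictions, references))
    return hits / len(references)


def evaluate(model_fn, task_fn, sizes, samples_per_size=20, seed=0):
    """Map each task size to exact-match accuracy.

    model_fn is a hypothetical callable: prompt string in, final answer string out.
    """
    rng = random.Random(seed)
    results = {}
    for size in sizes:
        tasks = [task_fn(size, rng) for _ in range(samples_per_size)]
        predictions = [model_fn(prompt) for prompt, _ in tasks]
        references = [answer for _, answer in tasks]
        results[size] = exact_match_accuracy(predictions, references)
    return results
```

Calling `evaluate(model_fn, make_addition_task, sizes=[2, 5, 10, 20])` with a wrapper around a real model would return one accuracy value per operand length, which is the kind of curve the summary describes as declining when the number of symbols grows. A real wrapper would also need to extract the final answer from the model's chain-of-thought output before scoring.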

Can Large Language Models Understand Symbolic Graphics Programs?

Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models

Reasoning in Large Language Models Through Symbolic Math Word Problems

Large Language Models Are Neurosymbolic Reasoners

Can Large Language Models Act as Symbolic Reasoners?

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language Conversion for Language Models

Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From A Psychological Perspective

MathPrompter: Mathematical Reasoning using Large Language Models

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology

Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models

Large Language Models are Interpretable Learners

A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters

Large Language Models for Mathematical Reasoning: Progresses and Challenges

Large Language Models Are In-Context Semantic Reasoners Rather Than Symbolic Reasoners

LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs

Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems

Interpreting and Improving Large Language Models in Arithmetic Calculation