Learning Efficient Recursive Numeral Systems via Reinforcement Learning

Jonathan D. Thomas, Andrea Silvi, Devdatt Dubhashi, Emil Carlsson, Moa Johansson
2024-09-11
Abstract: The emergence of mathematical concepts, such as number systems, is an understudied area in AI for mathematics and reasoning. It has previously been shown (Carlsson et al., 2021) that, by using reinforcement learning (RL), agents can derive simple approximate and exact-restricted numeral systems. However, it is a major challenge to show how more complex recursive numeral systems, similar to the one utilised in English, could arise via a simple learning mechanism such as RL. Here, we introduce an approach towards deriving a mechanistic explanation of the emergence of recursive numeral systems, in which we consider an RL agent that directly optimizes a lexicon under a given meta-grammar. Utilising a slightly modified version of the seminal meta-grammar of Hurford (1975), we demonstrate that our RL agent can effectively modify the lexicon towards Pareto-optimal configurations that are comparable to those observed within human numeral systems.
Computation and Language
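The abstract's central idea is that a lexicon of number words, combined through a fixed meta-grammar, determines which numbers can be expressed and how. As a minimal illustration only, the Python sketch below assumes a heavily simplified Hurford-style rule, NUMBER -> DIGIT | (NUMBER * M) [+ NUMBER], together with a hypothetical English-like lexicon (`DIGITS` for the words one to nine, `MULTIPLIERS` for ten and hundred). It is not the paper's modified meta-grammar; the function name `express` and the greedy preference for the largest multiplier are assumptions made for this example.

```python
from typing import Iterable, Optional, Set

# Hypothetical English-like lexicon (an assumption, not the paper's lexicon):
# single-word "digits" and multiplicative bases.
DIGITS: Set[int] = {1, 2, 3, 4, 5, 6, 7, 8, 9}
MULTIPLIERS = [10, 100]


def express(n: int,
            digits: Set[int] = DIGITS,
            multipliers: Iterable[int] = MULTIPLIERS) -> Optional[str]:
    """Build a bracketed expression for n under a simplified Hurford-style rule:
    NUMBER -> DIGIT | (NUMBER * M) [+ NUMBER].
    Returns None if n cannot be expressed with the given lexicon."""
    multipliers = list(multipliers)
    if n in digits:
        return str(n)
    # Greedy assumption: try the largest multiplier not exceeding n first.
    for m in sorted(multipliers, reverse=True):
        if m > n:
            continue
        x, y = divmod(n, m)  # n = x * m + y with 0 <= y < m
        x_expr = express(x, digits, multipliers)
        if x_expr is None:
            continue
        phrase = f"({x_expr}*{m})"
        if y == 0:
            return phrase
        y_expr = express(y, digits, multipliers)
        if y_expr is not None:
            return f"{phrase}+{y_expr}"
    return None
```

For instance, `express(42)` yields `(4*10)+2` and `express(345)` yields `(3*100)+(4*10)+5`. Changing the lexicon changes which numbers are expressible and how long their expressions are; with `multipliers=[]`, for example, no number above nine can be expressed.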
What problem does this paper attempt to address?
The paper addresses how recursive numeral systems could form and develop through a reinforcement learning mechanism. Specifically, the authors use reinforcement learning to optimize a lexicon so that the resulting numeral system achieves efficient communication in the sense of information theory. Earlier work (Carlsson et al., 2021) showed that reinforcement learning agents can derive simple approximate and exact-restricted numeral systems, but it remained unclear whether such a simple mechanism could also account for more complex recursive numeral systems like the one used in English. The paper therefore considers a reinforcement learning agent that directly optimizes the lexicon under a given meta-grammar. Using a slightly modified version of Hurford's (1975) meta-grammar, it shows that the agent can move the lexicon towards Pareto-optimal configurations comparable to those observed in human numeral systems. The paper also compares its meta-grammar with existing approaches and reports that it better captures the structure of numeral systems found in natural languages.
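Both the abstract and the summary describe the learned lexicons in terms of Pareto optimality. As a minimal sketch, assume each candidate lexicon is scored by two costs that should both be minimised (for example, a complexity measure such as lexicon size and a communicative cost). The specific measures used in the paper are not stated in this section, so the function name `pareto_front`, the lexicon labels, and the scores below are hypothetical. A lexicon is Pareto-optimal when no other candidate is at least as good on both costs and strictly better on at least one.

```python
from typing import Dict, List, Tuple


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of non-dominated points, where both coordinates
    are costs to be minimised (hypothetical objectives)."""
    front = []
    for name, (a1, b1) in points.items():
        # A point is dominated if another point is no worse on both costs
        # and strictly better on at least one of them.
        dominated = any(
            a2 <= a1 and b2 <= b1 and (a2 < a1 or b2 < b1)
            for other, (a2, b2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical lexicons scored as (complexity, communicative cost).
candidates = {
    "A": (10, 2.0),
    "B": (12, 1.5),
    "C": (12, 2.5),
    "D": (20, 1.0),
}
print(pareto_front(candidates))  # ['A', 'B', 'D']
```

Here `C` is excluded because `A` has both a lower complexity and a lower cost; the remaining three lexicons each trade one cost against the other, so none of them dominates another.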