Do LLMs Dream of Ontologies?

Marco Bombieri, Paolo Fiorini, Simone Paolo Ponzetto, Marco Rospocher
2024-01-26
Abstract: Large language models (LLMs) have recently revolutionized automated text understanding and generation. The performance of these models relies on the high number of parameters of the underlying neural architectures, which allows LLMs to memorize part of the vast quantity of data seen during the training. This paper investigates whether and to what extent general-purpose pre-trained LLMs have memorized information from known ontologies. Our results show that LLMs partially know ontologies: they can, and do indeed, memorize concepts from ontologies mentioned in the text, but the level of memorization of their concepts seems to vary proportionally to their popularity on the Web, the primary source of their training material. We additionally propose new metrics to estimate the degree of memorization of ontological information in LLMs by measuring the consistency of the output produced across different prompt repetitions, query languages, and degrees of determinism.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper investigates whether, and to what extent, general-purpose pre-trained large language models (LLMs) have memorized information from known ontologies. The authors find that LLMs do partially memorize concepts from ontologies mentioned in text, but the degree of memorization appears to vary with how popular those concepts are on the Web, the primary source of LLM training material. The paper also proposes new metrics that estimate how much ontological information an LLM has memorized by measuring the consistency of its output across prompt repetitions, query languages, and degrees of determinism. The findings are loosely analogous to human memory: how well an LLM recalls an ontology's concepts seems to depend on how often that information appeared in its training material.
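
As an illustration of what "consistency of the output across prompt repetitions" could mean in practice, here is a minimal Python sketch. It is not the paper's metric, whose exact definition is not given in the text above. The function names `pairwise_agreement` and `majority_share`, the normalization step (trimming whitespace and lowercasing), and the sample identifiers are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def _normalize(outputs):
    # Assumption: answers that differ only in case or surrounding
    # whitespace are treated as the same answer.
    return [o.strip().lower() for o in outputs]


def pairwise_agreement(outputs):
    """Fraction of output pairs that are identical after normalization."""
    pairs = list(combinations(_normalize(outputs), 2))
    if not pairs:
        # Zero or one output: there is nothing to disagree with.
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


def majority_share(outputs):
    """Share of outputs that equal the most frequent answer."""
    normalized = _normalize(outputs)
    if not normalized:
        raise ValueError("need at least one output")
    return Counter(normalized).most_common(1)[0][1] / len(normalized)


if __name__ == "__main__":
    # Hypothetical answers from asking a model for the same concept
    # identifier three times; the identifiers are made up.
    answers = ["ID:0001", "id:0001 ", "ID:0002"]
    print(pairwise_agreement(answers))
    print(majority_share(answers))
```

The same two functions could be applied to answers collected across query languages or sampling temperatures instead of repetitions. A score near 1 would suggest a stable, possibly memorized answer, while a low score would suggest the model is guessing.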