Using General Large Language Models to Classify Mathematical Documents

Patrick D.F. Ion, Stephen M. Watt
2024-06-12
Abstract: In this article we report on an initial exploration to assess the viability of using the general large language models (LLMs), recently made public, to classify mathematical documents. Automated classification would be useful from the applied perspective of improving the navigation of the literature and for the more open-ended goal of identifying relations among mathematical results. The Mathematics Subject Classification MSC 2020, from MathSciNet and zbMATH, is widely used, and there is a significant corpus of ground truth material in the open literature. We have evaluated the classification of preprint articles from arXiv.org according to MSC 2020. The experiment used only the title and abstract, not the entire paper. Since this was early in the use of chatbots and the development of their APIs, we report here on what was carried out by hand. Of course, the automation of the process will have to follow if it is to be generally useful. We found that in about 60% of our sample the LLM produced a primary classification matching that already reported on arXiv. In about half of those instances, there were additional primary classifications that were not detected. In about 40% of our sample, the LLM suggested a different classification from the one provided. A detailed examination of these cases, however, showed that the LLM-suggested classifications were in most cases better than those provided.
Information Retrieval, Computation and Language, Digital Libraries
What problem does this paper attempt to address?
This paper discusses the feasibility of using Large Language Models (LLMs) to classify mathematical documents. The researchers used tools like ChatGPT to generate classification suggestions from paper titles and abstracts, and compared them with the Mathematics Subject Classification (MSC 2020) codes reported on arXiv. In this preliminary experiment, the primary classification provided by the LLM matched the one on arXiv in approximately 60% of the sample, and in about half of those matches arXiv listed additional primary classifications that the LLM did not detect. In the remaining approximately 40% of the sample, the LLM proposed a different classification. A detailed examination of these mismatches showed that the LLM-suggested classifications were in most cases better than the ones provided on arXiv. The paper highlights the potential of using LLMs for automated classification, especially for improving literature navigation and identifying relationships between mathematical results. Because the experiment was carried out by hand, automating the process is a necessary next step before the approach can be generally useful.
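The comparison described above, which the authors carried out by hand, could be sketched in code roughly as follows. This is a minimal illustration and not the authors' procedure: the function names `top_level` and `compare`, the three outcome labels, and the choice to compare codes at the two-digit top-level MSC area are all assumptions made for this example, since the source does not state the granularity at which a match was counted.

```python
import re

# Assumed shape of an MSC 2020 code: two digits, optionally followed by a
# letter or hyphen and then two digits or "xx" (e.g. "68T50", "11-XX", "68").
MSC_CODE = re.compile(r"^(\d{2})(?:[A-Z-](?:\d{2}|[xX]{2}))?$")


def top_level(code: str) -> str:
    """Return the two-digit top-level area of an MSC code such as '68T50'."""
    match = MSC_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not a recognisable MSC code: {code!r}")
    return match.group(1)


def compare(llm_primary: str, arxiv_primaries: list[str]) -> str:
    """Label one paper by how the LLM's primary code relates to arXiv's.

    Hypothetical labels (not from the paper):
      'match'              LLM area equals the only area listed on arXiv
      'match_with_missed'  LLM area is listed, but arXiv has further areas
      'different'          LLM area is not among those listed on arXiv
    """
    llm_area = top_level(llm_primary)
    arxiv_areas = {top_level(code) for code in arxiv_primaries}
    if llm_area not in arxiv_areas:
        return "different"
    if arxiv_areas == {llm_area}:
        return "match"
    return "match_with_missed"


if __name__ == "__main__":
    # Made-up inputs, purely to show the call shape; not data from the paper.
    print(compare("68T50", ["68T05"]))
    print(compare("68T50", ["68T05", "03B35"]))
    print(compare("11A41", ["68T05"]))
```

Counting these labels over a sample would give proportions analogous to those reported in the abstract. Whether a suggested classification that differs is actually better still requires the kind of detailed manual examination the authors describe.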