Performance tests of LLMs in the context of answers on Industry 4.0

J. Luz Marina Santos, William Mauricio Rojas, D. Pedro José González, Ailín Orjuela Duarte
DOI: https://doi.org/10.1109/ColCACI63187.2024.10666552
2024-07-17
Abstract: Large Language Models (LLMs) are a type of artificial intelligence capable of processing and generating natural language. These models are trained on vast amounts of data, such as text and code, which enables them to perform tasks such as text generation, language translation, question answering, and text summarization. The purpose of this research was to find an LLM that meets two requirements: 1) it is easy to implement and understands the Spanish language, and 2) it accurately answers questions about diagnostics and action plans related to Industry 4.0 (I4.0). Three open-source LLMs were selected for a first set of tests: Llama2, Mistral, and Gemma, each with 7B parameters and quantization levels from Q2 to Q4. The results showed that Mistral performed best in Spanish, achieving an accuracy rate of 82.3% compared to Gemma’s 51.7%. A second set of tests was then conducted with and without Retrieval-Augmented Generation (RAG), using three documents related to I4.0. The results demonstrated that Mistral with RAG can answer questions on the studied context with 95% accuracy.
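The RAG comparison and the accuracy rates reported in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say which retriever, prompt template, or grading procedure the authors used, so the helper names (`retrieve`, `build_prompt`, `accuracy`), the word-overlap retriever, and the sample passages are hypothetical. The examples are in English for readability, whereas the study evaluated Spanish.

```python
def tokenize(text):
    """Lowercase a text and strip surrounding punctuation from each word."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]


def retrieve(question, passages, k=1):
    """Toy retriever (assumption): rank passages by words shared with the question."""
    q_words = set(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: len(q_words & set(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_passages):
    """Hypothetical prompt template: retrieved context first, then the question."""
    context = "\n".join(context_passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def accuracy(graded):
    """Percentage of answers graded as correct (graded is a list of booleans)."""
    return 100.0 * sum(graded) / len(graded)


# Illustrative passages, not taken from the paper's three I4.0 documents.
passages = [
    "Industry 4.0 integrates cyber-physical systems and IoT in manufacturing.",
    "Predictive maintenance uses sensor data to anticipate failures.",
]
question = "What does predictive maintenance use?"
top = retrieve(question, passages, k=1)
prompt = build_prompt(question, top)

# The prompt would be sent to the model; answers are then graded by a reviewer.
graded_answers = [True, True, False, True]  # made-up grades for illustration
score = accuracy(graded_answers)
```

In a real setup the word-overlap retriever would normally be replaced by an embedding-based search over the source documents, and the same question set would be run once with and once without the retrieved context so the two accuracy rates can be compared.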
Computer Science