Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis

Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi, Davide Taibi
2024-07-18
Abstract: Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and to information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time- and effort-intensive. A large language model (LLM) can summarize information in less time than a human and can be fine-tuned to specific tasks. Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation (RAG) and fine-tuned LLMs in risk analysis. To our knowledge, no prior study has explored their capabilities in risk analysis. Method. We manually curated 193 unique scenarios leading to 1,283 representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. We compared the base GPT-3.5 and GPT-4 models versus their Retrieval-Augmented Generation and fine-tuned counterparts. We employed two human experts as competitors of the models and three other human experts to review the analyses of the models and of the former human experts. The reviewers analyzed 5,000 scenario analyses. Results and Conclusions. Human experts demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with fine-tuned models (FTMs) for accuracy, RAG for hidden risk discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementary companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.
Computation and Language, Artificial Intelligence, Cryptography and Security, Human-Computer Interaction
What problem does this paper attempt to address?
This paper investigates how effective large language models (LLMs) are in mission-critical risk analysis (MCC-RA). Specifically, the research explores the following points:

1. **Improving efficiency and accuracy**: Evaluate whether Retrieval-Augmented Generation (RAG) and fine-tuning improve the accuracy and actionability of LLMs in risk analysis, relative to the base models and to human experts, while still processing large amounts of information quickly.
2. **Discovering hidden risks**: Examine whether LLMs can identify hidden risks that human experts may overlook, and so provide more comprehensive support for risk management.
3. **Supplementing human expertise**: Explore how LLMs can serve as an auxiliary tool that helps human experts complete risk analysis more quickly while reducing the cost of implementing unnecessary countermeasures.

### Research background and significance

Risk analysis assesses potential risks in specific scenarios, and its principles are independent of context: the same methodology applies to a risk in health and to one in information technology security. Traditional risk analysis is time- and effort-intensive and requires extensive knowledge of national and international regulations and standards. LLMs can summarize information quickly and can be fine-tuned to specific tasks, which may improve the efficiency and quality of risk analysis.

### Research methods

To investigate these questions, the authors carried out the following work:

- **Data collection**: Manually curated 193 unique scenarios, yielding 1,283 representative samples, from more than 50 mission-critical analyses archived over the last five years.
- **Model comparison**: Compared the base GPT-3.5 and GPT-4 models with their RAG-assisted and fine-tuned counterparts.
- **Human expert participation**: Two human experts performed risk analyses as competitors to the models, and three other experts reviewed the outputs of both the models and the two human analysts. In total, the reviewers analyzed 5,000 scenario analyses.

### Main conclusions

- **Accuracy**: Human experts were more accurate, but LLMs were quicker and more actionable.
- **Hidden risk discovery**: RAG-assisted LLMs had the lowest hallucination rates, and they uncovered hidden risks and complemented human expertise.
- **Model choice**: The appropriate model depends on the need: fine-tuned models (FTMs) for accuracy, RAG for hidden risk discovery, and base models for comprehensiveness and actionability. Used as a companion tool, LLMs help experts complete risk analysis in a shorter time and avoid the cost of unwarranted countermeasures.

In conclusion, this study shows the potential of LLMs in mission-critical risk analysis, especially for improving efficiency, discovering hidden risks, and supplementing human expertise.
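The review step summarized above, in which experts judge each analysis produced by a model variant or a human analyst, can be pictured as a simple tally of reviewer verdicts per analyst type. The following Python sketch is illustrative only and is not the authors' evaluation pipeline: the `Review` record, its field names, the analyst labels, and the toy verdicts are all assumptions introduced here, and the numbers are not the study's data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One reviewer verdict on one scenario analysis (hypothetical schema)."""
    analyst: str        # assumed labels, e.g. "base", "rag", "fine_tuned", "human"
    correct: bool       # reviewer judged the analysis accurate
    hallucinated: bool  # reviewer flagged unsupported content
    actionable: bool    # reviewer judged the recommendations actionable


def summarize(reviews):
    """Return per-analyst rates for accuracy, hallucination, and actionability."""
    counts = defaultdict(
        lambda: {"n": 0, "correct": 0, "hallucinated": 0, "actionable": 0}
    )
    for r in reviews:
        c = counts[r.analyst]
        c["n"] += 1
        # Booleans add as 0 or 1, so these are simple tallies.
        c["correct"] += r.correct
        c["hallucinated"] += r.hallucinated
        c["actionable"] += r.actionable
    return {
        analyst: {
            "accuracy": c["correct"] / c["n"],
            "hallucination_rate": c["hallucinated"] / c["n"],
            "actionability": c["actionable"] / c["n"],
        }
        for analyst, c in counts.items()
    }


# Toy verdicts for illustration only; these are NOT results from the paper.
toy_reviews = [
    Review("rag", correct=True, hallucinated=False, actionable=True),
    Review("rag", correct=False, hallucinated=False, actionable=True),
    Review("human", correct=True, hallucinated=False, actionable=False),
    Review("human", correct=True, hallucinated=False, actionable=True),
]
summary = summarize(toy_reviews)
```

Keeping one record per reviewer verdict, rather than pre-aggregated scores, makes it straightforward to add further criteria or to compute agreement between reviewers later.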