Abstract: Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time and effort-intensive. A large language model (LLM) can summarize information in less time than a human and can be fine-tuned to specific tasks. Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation (RAG) and fine-tuned LLMs in risk analysis. To our knowledge, no prior study has explored their capabilities in risk analysis. Method. We manually curated 193 unique scenarios leading to 1,283 representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. We compared the base GPT-3.5 and GPT-4 models versus their Retrieval-Augmented Generation and fine-tuned counterparts. We employ two human experts as competitors of the models and three other human experts to review the models and the former human experts' analysis. The reviewers analyzed 5,000 scenario analyses. Results and Conclusions. Human experts demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with fine-tuned models (FTMs) for accuracy, RAG for hidden-risk discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementary companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.

What problem does this paper attempt to address?

This paper examines how effective large language models (LLMs) are in Mission-Critical Risk Analysis (MCC-RA). Specifically, the research explores the following points:

1. **Improving efficiency and accuracy**: Evaluate whether Retrieval-Augmented Generation (RAG) and fine-tuning make LLMs more accurate and actionable in risk analysis than traditional, human-led methods, while processing large amounts of information quickly.
2. **Discovering hidden risks**: Investigate whether LLMs can identify hidden risks that human experts may overlook, thereby providing more comprehensive support for risk management.
3. **Supplementing human expertise**: Explore how LLMs can serve as an auxiliary tool that helps human experts complete risk analysis more quickly while reducing the cost of implementing unnecessary countermeasures.

### Research background and significance

Risk analysis principles do not depend on context: the same methodology applies to health risks and to information technology security risks. Traditional risk analysis takes considerable time and expertise and relies on a deep understanding of national and international regulations and standards. LLMs can summarize information quickly and can be fine-tuned to specific tasks, which may improve the efficiency and quality of risk analysis.

### Research methods

To investigate these points, the authors carried out the following work:

- **Data collection**: Manually curated 193 unique scenarios, yielding 1,283 representative samples, from more than 50 mission-critical analyses archived over the past five years.
- **Model comparison**: Compared the base GPT-3.5 and GPT-4 models with their RAG-enhanced and fine-tuned versions.
- **Human expert participation**: Two human experts performed risk analysis as competitors of the models, and three other experts reviewed the outputs of both the models and the two human experts, 5,000 scenario analyses in total.

### Main conclusions

- **Accuracy**: Human experts showed higher accuracy, but LLMs were quicker and more actionable.
- **Hidden risk discovery**: RAG-assisted LLMs have the lowest hallucination rate and can effectively uncover hidden risks, complementing the expertise of human experts.
- **Application scenarios**: The choice of model depends on the need: fine-tuned models (FTMs) for accuracy, RAG for discovering hidden risks, and base models for comprehensiveness and actionability. Used as an auxiliary tool, LLMs help experts complete risk analysis in a shorter time and avoid the cost of unwarranted countermeasures.

In conclusion, this study shows the potential of LLMs in mission-critical risk analysis, especially in improving efficiency, discovering hidden risks, and supplementing human expertise.
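To make the RAG setup compared in the study more concrete, the following is a minimal sketch of the general idea: retrieve the most relevant reference snippets for a scenario and prepend them to the prompt sent to a model. It is not the authors' pipeline. The knowledge-base snippets, the function names (`retrieve`, `build_prompt`), and the bag-of-words cosine similarity are all assumptions chosen for illustration; a real system would more likely use embedding-based retrieval over actual regulations and standards.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: placeholder snippets, NOT real regulation text.
KNOWLEDGE_BASE = [
    "Access control: privileged accounts must use multi-factor authentication.",
    "Backup policy: critical data must be backed up daily and restore tests run quarterly.",
    "Physical security: server rooms must be locked and visitor access logged.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(scenario, documents, k=2):
    """Return up to k documents most similar to the scenario (score > 0 only)."""
    query = Counter(tokenize(scenario))
    scored = [(cosine(query, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(scenario, documents, k=2):
    """Assemble a prompt that places retrieved context before the scenario."""
    context = retrieve(scenario, documents, k)
    context_block = "\n".join(f"- {c}" for c in context) or "- (no relevant context found)"
    return (
        "Context from reference documents:\n"
        f"{context_block}\n\n"
        f"Scenario: {scenario}\n"
        "Task: identify the risks in this scenario and suggest countermeasures."
    )


if __name__ == "__main__":
    example = "Administrators log in to privileged accounts without multi-factor authentication."
    print(build_prompt(example, KNOWLEDGE_BASE))
```

The resulting prompt string would then be passed to whichever model is under evaluation. Grounding the model in retrieved text rather than relying only on what it memorized is the usual motivation for RAG, which fits the low hallucination rate the study reports for RAG-assisted models.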

Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis
