Uncertainty Quantification in Large Language Models Through Convex Hull Analysis

Ferhat Ozgur Catak, Murat Kuzlu
2024-06-28
Abstract: Uncertainty quantification has become increasingly critical for large language models (LLMs), particularly in high-risk applications that require reliable outputs. However, traditional methods for uncertainty quantification, such as probabilistic models and ensemble techniques, face challenges when applied to the complex, high-dimensional outputs that LLMs generate. This study proposes a novel geometric approach to uncertainty quantification using convex hull analysis. The proposed method leverages the spatial properties of response embeddings to measure the dispersion and variability of model outputs. Prompts are categorized into three types, "easy", "moderate", and "confusing", and multiple responses are generated for each using different LLMs at varying temperature settings. The responses are transformed into high-dimensional embeddings via a BERT model and subsequently projected into a two-dimensional space using Principal Component Analysis (PCA). The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is then used to cluster the projected embeddings, and a convex hull is computed for each selected cluster. The experimental results indicate that LLM uncertainty depends on the prompt complexity, the model, and the temperature setting.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper addresses the problem of quantifying uncertainty in large language models (LLMs). Traditional uncertainty quantification methods, such as probabilistic models and ensemble techniques, face challenges when applied to the complex, high-dimensional outputs that LLMs generate. The paper proposes a geometric method based on convex hull analysis, which uses the spatial characteristics of response embeddings to measure the dispersion and variability of model outputs. Prompts are divided into three categories, "easy", "moderate", and "confusing", and different LLMs generate multiple responses to each prompt at different temperature settings. The responses are converted into high-dimensional embeddings, projected onto a two-dimensional space via Principal Component Analysis (PCA), and clustered with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. A convex hull is then computed for each selected cluster to evaluate the degree of uncertainty. Experimental results show that the uncertainty of LLMs depends on the complexity of the prompt, the model, and the temperature setting.
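
The projection, clustering, and hull steps described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation. It assumes the response embeddings have already been computed (the BERT step is omitted). The function name hull_area_uncertainty, the synthetic ring_embeddings helper, the eps and min_samples values, and the choice to sum hull areas over all non-noise clusters are illustrative assumptions not taken from the paper.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


def hull_area_uncertainty(embeddings, eps=0.5, min_samples=3):
    """Total convex hull area of DBSCAN clusters in a 2-D PCA projection.

    Assumption: summing the areas over clusters is one plausible way to
    turn the hulls into a single dispersion score; the paper may aggregate
    them differently.
    """
    data = np.asarray(embeddings, dtype=float)
    # Project the high-dimensional embeddings onto two principal components.
    points = PCA(n_components=2).fit_transform(data)
    # Cluster the projected points; DBSCAN labels noise points as -1.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)

    total_area = 0.0
    for label in set(labels) - {-1}:
        cluster = points[labels == label]
        centered = cluster - cluster.mean(axis=0)
        # A hull with nonzero area needs at least 3 non-collinear points.
        if len(cluster) < 3 or np.linalg.matrix_rank(centered) < 2:
            continue
        # For 2-D input, ConvexHull.volume is the enclosed area
        # (ConvexHull.area would be the perimeter).
        total_area += ConvexHull(cluster).volume
    return total_area


def ring_embeddings(radius, n_points=12, dim=8):
    """Hypothetical stand-in for response embeddings: points on a circle
    placed in the first two coordinates of a `dim`-dimensional space."""
    angles = 2.0 * np.pi * np.arange(n_points) / n_points
    data = np.zeros((n_points, dim))
    data[:, 0] = radius * np.cos(angles)
    data[:, 1] = radius * np.sin(angles)
    return data


if __name__ == "__main__":
    # Tightly grouped responses should yield a smaller hull area than
    # widely dispersed ones.
    print("tight: ", hull_area_uncertainty(ring_embeddings(0.1)))
    print("spread:", hull_area_uncertainty(ring_embeddings(0.8)))
```

Under these assumptions, a larger total hull area means the responses to a prompt are more widely dispersed in the projected embedding space, which is the sense in which the summary treats it as higher uncertainty. The result is sensitive to eps and min_samples, so these would need tuning for real embeddings.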