Abstract: The recent popularity of large language models (LLMs) has had a significant impact on a wide range of fields, particularly through their open-ended ecosystem of APIs, open-source models, and plugins. However, despite their widespread deployment, little research thoroughly discusses and analyzes the risks they may conceal. We therefore conduct a preliminary but pioneering study of the robustness, consistency, and credibility of LLM systems. Since much of the related literature in the LLM era remains uncharted, we propose an automated workflow that copes with a large number of queries and responses. Overall, we issue over a million queries to mainstream LLMs, including ChatGPT, LLaMA, and OPT. The core of our workflow consists of a data primitive, followed by an automated interpreter that evaluates these LLMs under different adversarial metrical systems. As a result, we draw several, perhaps unfortunate, conclusions that are quite uncommon in this trendy community. Briefly, they are: (i) minor but inevitable errors in user-generated query input may, by chance, cause the LLM to respond unexpectedly; (ii) LLMs show poor consistency when processing semantically similar query inputs. In addition, as a side finding, we find that ChatGPT is still capable of yielding the correct answer even when the input is polluted at an extreme level. While this phenomenon demonstrates the powerful memorization of LLMs, it raises serious concerns about using such data for LLM-involved evaluation in academic development. To address this, we propose a novel index associated with a dataset that roughly decides the feasibility of using such data for LLM-involved evaluation. Extensive empirical studies are provided to support the aforementioned claims.

Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method

Do Large Language Models Know What They Don't Know?

Are LLMs Really Not Knowledgable? Mining the Submerged Knowledge in LLMs' Memory

Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations

Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection

Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge

LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked

Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models

Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration

See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Understanding the Dark Side of LLMs' Intrinsic Self-Correction

Are You Human? An Adversarial Benchmark to Expose LLMs

Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection

Retrieving Supporting Evidence for LLMs Generated Answers

The Internal State of an LLM Knows When It's Lying

Assessing the Reliability of Large Language Model Knowledge

Investigating Answerability of LLMs for Long-Form Question Answering

Knowing When to Ask -- Bridging Large Language Models and Data