ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models

Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang
2024-06-17
Abstract: Personal values are a crucial factor behind human decision-making. Considering that Large Language Models (LLMs) have been shown to impact human decisions significantly, it is essential to make sure they accurately understand human values to ensure their safety. However, evaluating their grasp of these values is complex due to the intricate and adaptable nature of values. We argue that truly understanding values in LLMs requires considering both "know what" and "know why". To this end, we present a comprehensive evaluation metric, ValueDCG (Value Discriminator-Critique Gap), to quantitatively assess the two aspects with an engineering implementation. We assess four representative LLMs and provide compelling evidence that the growth rates of LLMs' "know what" and "know why" capabilities do not align with increases in parameter numbers, resulting in a decline in the models' capacity to understand human values as parameter counts grow. This may further suggest that LLMs might craft plausible explanations based on the provided context without truly understanding their inherent value, indicating potential risks.
Computation and Language, Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
### The problems the paper attempts to solve

This paper addresses the problem of evaluating how well large language models (LLMs) understand human values. The authors argue that, as LLMs increasingly influence human decision-making, ensuring that they accurately understand human values is crucial for their safety. Evaluating that understanding is difficult, however, because values themselves are complex and adaptable.

To evaluate LLMs' understanding of human values comprehensively, the authors distinguish two key aspects:

1. **"Know what"**: whether an LLM can identify the human values contained in a text.
2. **"Know why"**: whether an LLM can explain why the text contains those specific values.

They propose a new evaluation metric, ValueDCG (Value Discriminator-Critique Gap), to quantify the difference between these two aspects. Using this metric, the authors evaluate four representative LLMs and report the following findings:

- **Parameter count and understanding ability**: As the number of parameters increases, LLMs' "know what" and "know why" abilities grow at different rates, and the widening gap amounts to a decline in their ability to understand human values.
- **Influence of the training dataset**: Improving the training dataset can significantly enhance LLMs' "know what" ability, but yields no clear improvement in their "know why" ability.
- **Understanding of potentially harmful values**: LLMs show insufficient understanding of some potentially harmful values (such as "self-direction" and "power"). Safety algorithms can make their behavior more benign, but this may reduce their understanding of these values and their ability to generalize about them, which introduces potential risks.
In summary, this paper proposes the ValueDCG metric to evaluate LLMs' ability to understand human values comprehensively, and it reveals the difficulties current LLMs have in understanding and explaining those values.
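To make the idea of a discriminator-critique gap concrete, here is a minimal sketch of one way such a gap could be computed. This summary does not give the paper's exact formula, so everything below is an assumption made for illustration: the function name `discriminator_critique_gap` is hypothetical, each example is assumed to carry a "know what" score and a "know why" score already normalized to the range [0, 1], and the gap is taken as the mean absolute difference between the two.

```python
from statistics import mean


def discriminator_critique_gap(know_what, know_why):
    """Mean absolute difference between per-example scores.

    Illustrative assumption, not the paper's definition:
    know_what[i] is how well the model identified the value in example i,
    know_why[i] is how well it explained that value, both in [0, 1].
    """
    if len(know_what) != len(know_why):
        raise ValueError("score lists must have the same length")
    if not know_what:
        raise ValueError("need at least one example")
    for score in (*know_what, *know_why):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    # A larger result means identification and explanation diverge more.
    return mean(abs(d - c) for d, c in zip(know_what, know_why))


# Hypothetical scores for four examples.
know_what_scores = [1.0, 1.0, 0.0, 1.0]
know_why_scores = [0.5, 1.0, 0.0, 0.25]
print(discriminator_critique_gap(know_what_scores, know_why_scores))  # 0.3125
```

Under this reading, a model that identifies values correctly but explains them poorly gets a large gap, which matches the summary's point that "know what" and "know why" can grow at different rates.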