Value Alignment from Unstructured Text

Inkit Padhi, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Manish Nagireddy, Pierre Dognin, Kush R. Varshney
2024-08-20
Abstract: Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to the implicit and explicit values represented in unstructured text data. Our proposed approach leverages the use of scalable synthetic data generation techniques to effectively align the model to the values present in the unstructured data. Through two distinct use-cases, we demonstrate the efficiency of our methodology on the Mistral-7B-Instruct model. Our approach credibly aligns LLMs to the values embedded within documents, and shows improved performance against other approaches, as quantified through the use of automatic metrics and win rates.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of aligning large language models (LLMs) with values that are implicitly or explicitly expressed in unstructured text. Current alignment pipelines depend on high-quality supervised and preference data, which are time-consuming and expensive to curate or annotate. The paper proposes a systematic end-to-end method that generates synthetic data at scale from the documents themselves, which removes the need for manual curation and human feedback. The method has two main steps: first, synthetic data generation techniques extract the implicit and explicit values from the documents; second, the resulting synthetic data is used for supervised fine-tuning and preference optimization to embed those values into the LLM. The authors evaluate the method on two use cases with the Mistral-7B-Instruct model and report that it outperforms other approaches on automatic metrics and win rates. The stated goal is models that can be adapted quickly to any given value system rather than relying solely on a single "universal" one.
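To make the two-step structure concrete, here is a minimal Python sketch of how a document could be turned into the two kinds of training data the summary mentions: prompt/response records for supervised fine-tuning and chosen/rejected pairs for preference optimization. It is an illustration under stated assumptions, not the authors' implementation. The names `chunk_document`, `build_datasets`, and `toy_generate` are hypothetical, the prompt wording is invented, and treating an answer written without the passage as the "rejected" response is an assumption; the summary does not say how the paper builds its preference pairs. `toy_generate` is a stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SFTExample:
    """One supervised fine-tuning record."""
    prompt: str
    response: str


@dataclass
class PreferencePair:
    """One preference-optimization record."""
    prompt: str
    chosen: str
    rejected: str


def chunk_document(text: str, max_words: int = 50) -> List[str]:
    """Split unstructured text into fixed-size word windows (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_datasets(
    document: str,
    generate: Callable[[str], str],
    max_words: int = 50,
) -> Tuple[List[SFTExample], List[PreferencePair]]:
    """Step 1 of the summary: derive synthetic training data from a document.

    `generate` is any text-in, text-out function (an LLM call in practice).
    """
    sft: List[SFTExample] = []
    prefs: List[PreferencePair] = []
    for chunk in chunk_document(document, max_words):
        # Assumed prompt: ask for a question the passage can answer.
        question = generate(f"Write a question this passage answers:\n{chunk}")
        # Answer written with the passage in view, so it can reflect its values.
        grounded = generate(
            f"Using the values in this passage, answer the question.\n"
            f"Passage: {chunk}\nQuestion: {question}"
        )
        # Assumption: an answer written without the passage serves as "rejected".
        ungrounded = generate(f"Answer the question.\nQuestion: {question}")
        sft.append(SFTExample(prompt=question, response=grounded))
        prefs.append(PreferencePair(prompt=question, chosen=grounded, rejected=ungrounded))
    return sft, prefs


def toy_generate(prompt: str) -> str:
    """Placeholder generator so the sketch runs without a model."""
    return f"[generated for: {prompt[:30]}]"


if __name__ == "__main__":
    policy_text = " ".join(f"word{i}" for i in range(120))
    sft_data, pref_data = build_datasets(policy_text, toy_generate)
    print(len(sft_data), len(pref_data))
```

Step 2 would pass the first list to a supervised fine-tuning routine and the second to a preference-optimization routine; both are left out here because the summary does not specify which training code or hyperparameters were used.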