Collective Constitutional AI: Aligning a Language Model with Public Input

Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
DOI: https://doi.org/10.1145/3630106.3658979
2024-06-12
Abstract: There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs, from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from an LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons of the models suggest that the models differ on the basis of their respective constitutions, e.g., when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively instead of a refusal. These results demonstrate a promising, tractable pathway toward publicly informed development of language models.
Artificial Intelligence, Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the fact that the behavior of language models (LMs) is currently determined solely by their developers. As LMs are deployed more widely and in a growing range of settings, leaving these decisions to a single party may lead to various risks and harms. A method is therefore needed that lets the broader public collectively shape LM behavior, so that it reflects the public's values and preferences.

Specifically, the paper proposes **Collective Constitutional AI (CCAI)**, a multi-stage process that runs from identifying a target population, to sourcing principles, to training and evaluating a model. Using this method, the authors created what is, to their knowledge, the first language model fine-tuned with collectively sourced public input, and compared it with a baseline model trained on established principles from an LM developer.

### Main contributions

1. **Framework development**: Proposed and developed a framework for fine-tuning language models on public input so that their behavior conforms to public preferences.
2. **First practice**: Trained what is, to the authors' knowledge, the first language model fine-tuned with collectively sourced public input.
3. **Qualitative analysis**: Conducted a qualitative analysis of the differences between the standard constitution and the public constitution, and between the outputs of the models trained on each.
4. **Quantitative analysis**: Conducted a quantitative analysis of the two models, comparing their performance on multiple benchmarks.

### Method overview

1. **Participant selection**: Selected 1,002 participants forming a sample of the U.S. adult population that is representative across age, gender, income, and geography.
2. **Input collection**: Collected participants' input, both votes on statements and newly submitted statements, through a web application built around a modified Polis platform.
3. **Input processing**: Screened, deduplicated, and aggregated the collected statements to form the public constitution.
4. **Model training**: Used the Constitutional AI method to train two models, one on the public constitution and one on the standard constitution.

### Results

- **Quantitative analysis**: The public constitution model shows lower bias across nine social dimensions while maintaining performance equivalent to the baseline model on language, math, and helpful-harmless evaluations.
- **Qualitative analysis**: When prompted with contentious topics, the public constitution model tends to generate responses that reframe the matter positively rather than refusing to answer.

### Significance

This research demonstrates that adjusting the behavior of language models through collective input is feasible and tractable. In the authors' evaluations, the resulting model showed lower bias without loss of performance on the other benchmarks, and its behavior reflected the publicly sourced constitution. The work offers a concrete reference point for future research on, and practice of, publicly informed LM development.
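
The input-processing step in the method overview (screening, deduplicating, and aggregating statements) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Statement` record, the `min_agreement` and `min_votes` thresholds, and the text-normalization rule for spotting duplicates are all assumptions made for illustration, since this summary does not state the paper's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Hypothetical record of one participant-submitted statement and its votes."""
    text: str
    agree: int
    disagree: int
    passed: int = 0  # "pass / unsure" votes


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace to detect duplicates."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def agreement_rate(s: Statement) -> float:
    """Fraction of all votes on a statement that were 'agree'."""
    total = s.agree + s.disagree + s.passed
    return s.agree / total if total else 0.0


def build_constitution(statements, min_agreement=0.7, min_votes=10):
    """Screen, deduplicate, and rank statements (thresholds are assumed values)."""
    best = {}
    for s in statements:
        total = s.agree + s.disagree + s.passed
        # Screening: drop statements with too few votes or too little agreement.
        if total < min_votes or agreement_rate(s) < min_agreement:
            continue
        # Deduplication: keep the most-agreed wording of each normalized text.
        key = normalize(s.text)
        if key not in best or agreement_rate(s) > agreement_rate(best[key]):
            best[key] = s
    # Aggregation: order the surviving principles by agreement rate.
    ranked = sorted(best.values(), key=agreement_rate, reverse=True)
    return [s.text for s in ranked]


if __name__ == "__main__":
    sample = [
        Statement("The AI should be honest.", agree=90, disagree=5, passed=5),
        Statement("the AI should be honest", agree=80, disagree=10, passed=10),
        Statement("The AI should flatter users.", agree=20, disagree=70, passed=10),
        Statement("The AI should respect privacy.", agree=3, disagree=0),
    ]
    print(build_constitution(sample))  # ['The AI should be honest.']
```

In this sketch, the duplicate wording, the low-agreement statement, and the statement with too few votes are all dropped, leaving one principle. A real pipeline would likely need human review and a more robust notion of similarity than exact matching on normalized text.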