Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwan, Yoshua Bengio, Danqi Chen, Philip H.S. Torr, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
2024-09-06
Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.
Machine Learning, Artificial Intelligence, Computation and Language, Computers and Society