A Safe Harbor for AI Evaluation and Red Teaming
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
2024-03-08
Abstract: Independent evaluation and red teaming are critical for identifying the risks
posed by generative AI systems. However, the terms of service and enforcement
strategies used by prominent AI companies to deter model misuse create
disincentives for good-faith safety evaluations. This causes some researchers to
fear that conducting such research or releasing their findings will result in
account suspensions or legal reprisal. Although some companies offer researcher
access programs, they are an inadequate substitute for independent research
access, as they have limited community representation, receive inadequate
funding, and lack independence from corporate incentives. We propose that major
AI developers commit to providing a legal and technical safe harbor,
indemnifying public interest safety research and protecting it from the threat
of account suspensions or legal reprisal. These proposals emerged from our
collective experience conducting safety, privacy, and trustworthiness research
on generative AI systems, where norms and incentives could be better aligned
with public interests, without exacerbating model misuse. We believe these
commitments are a necessary step towards more inclusive and unimpeded community
efforts to tackle the risks of generative AI.
Artificial Intelligence