Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo
2024-05-14
Abstract: With the rise of text-to-image (T2I) generative AI models reaching wide audiences, it is critical to evaluate model robustness against non-obvious attacks to mitigate the generation of offensive images. By focusing on "implicitly adversarial" prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a set of difficult safety issues that human creativity is well-suited to uncover. To this end, we built the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing a diverse set of implicitly adversarial prompts. We have assembled a suite of state-of-the-art T2I models, employed a simple user interface to identify and annotate harms, and engaged diverse populations to capture long-tail safety issues that may be overlooked in standard testing. The challenge is run in consecutive rounds to enable sustained discovery and analysis of safety pitfalls in T2I models.
Computers and Society, Artificial Intelligence, Cryptography and Security, Computer Vision and Pattern Recognition, Machine Learning