SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks

Tianhao Li, Jingyu Lu, Chuangxin Chu, Tianyu Zeng, Yujia Zheng, Mei Li, Haotian Huang, Bin Wu, Zuoxian Liu, Kai Ma, Xuejing Yuan, Xingkai Wang, Keyan Ding, Huajun Chen, Qiang Zhang
2024-10-03
Abstract: Large language models (LLMs) have had a transformative impact on a variety of scientific tasks across disciplines such as biology, chemistry, medicine, and physics. However, ensuring the safety alignment of these models in scientific research remains an underexplored area. Existing benchmarks primarily focus on textual content and overlook key scientific representations such as molecular, protein, and genomic languages. Moreover, the safety mechanisms of LLMs in scientific tasks are insufficiently studied. To address these limitations, we introduce SciSafeEval, a comprehensive benchmark designed to evaluate the safety alignment of LLMs across a range of scientific tasks. SciSafeEval spans multiple scientific languages (textual, molecular, protein, and genomic) and covers a wide range of scientific domains. We evaluate LLMs in zero-shot, few-shot, and chain-of-thought settings, and introduce a "jailbreak" enhancement feature that challenges LLMs equipped with safety guardrails, rigorously testing their defenses against malicious intent. Our benchmark surpasses existing safety datasets in both scale and scope, providing a robust platform for assessing the safety and performance of LLMs in scientific contexts. This work aims to facilitate the responsible development and deployment of LLMs, promoting alignment with safety and ethical standards in scientific research.
Computation and Language, Artificial Intelligence, Cryptography and Security
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the safety alignment of large language models (LLMs) in scientific research tasks. Specifically, it focuses on the following aspects:

1. **Limitations of Existing Benchmarks**:
   - Existing benchmarks focus mainly on textual content, neglecting key scientific representations such as molecules, proteins, and genomes.
   - The safety mechanisms of LLMs, especially in scientific tasks, are insufficiently studied.
   - Benchmarks cover a narrow range of scientific fields and lack evaluations in the medical and physical sciences.
   - Datasets are too small to comprehensively assess the safety and performance of models.
2. **Potential Risks in Scientific Tasks**:
   - Malicious actors may use LLMs to design harmful gene sequences that enhance the infectivity or treatment resistance of pathogens.
   - LLMs may provide information on synthesizing controlled substances, lowering the technical barriers to illegal drug production.
   - LLMs may generate chemical representations of toxic compounds (e.g., SMILES or SELFIES strings), increasing the risk of misuse.
   - LLMs may predict more infectious variants of SARS-CoV-2, which could be used to design highly transmissible or vaccine-resistant pathogens.
3. **Solutions**:
   - Introducing **SciSafeEval**, a comprehensive benchmark covering multiple scientific languages (text, molecules, proteins, genomes) and a wide range of scientific fields (chemistry, biology, medicine, physics).
   - Evaluating LLMs in zero-shot, few-shot, and chain-of-thought settings, with a "jailbreak" enhancement that challenges LLMs equipped with safety guardrails and tests their ability to resist malicious intent.
   - Providing a large-scale, high-quality dataset of 31,840 samples that surpasses existing benchmarks in both scale and scope.
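The three evaluation settings differ mainly in how the prompt given to the model is assembled. What follows is a minimal, hypothetical Python sketch of that difference, not SciSafeEval's actual templates. The `Question:`/`Answer:` format, the `Example` type, the `format_turn` and `build_prompt` functions, and the "Let's think step by step." trigger are all assumptions, and the jailbreak enhancement is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical chain-of-thought trigger phrase (an assumption, not taken from the paper).
COT_TRIGGER = "Let's think step by step."


@dataclass
class Example:
    """A hypothetical demonstration pair used in the few-shot setting."""
    question: str
    answer: str


def format_turn(question: str, answer: str = "") -> str:
    # Render one question/answer turn; an empty answer leaves the slot open for the model.
    return f"Question: {question}\nAnswer:" + (f" {answer}" if answer else "")


def build_prompt(query: str, setting: str, demos: Optional[List[Example]] = None) -> str:
    """Assemble a prompt for one of three evaluation settings (hypothetical template)."""
    if setting == "zero-shot":
        # The query alone, with no demonstrations or reasoning cue.
        return format_turn(query)
    if setting == "few-shot":
        # Prepend worked demonstrations before the query.
        if not demos:
            raise ValueError("few-shot setting requires at least one demonstration")
        shots = [format_turn(d.question, d.answer) for d in demos]
        return "\n\n".join(shots + [format_turn(query)])
    if setting == "chain-of-thought":
        # Seed the answer slot with a cue that elicits step-by-step reasoning.
        return format_turn(query, COT_TRIGGER)
    raise ValueError(f"unknown setting: {setting!r}")
```

For example, `build_prompt("What is the SMILES string for ethanol?", "few-shot", [Example("What is the molecular formula of water?", "H2O")])` places the water demonstration before the query. Passing the setting as an explicit argument makes it easy to run the same query set under all three settings and compare the model's responses.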
### Summary

By introducing the **SciSafeEval** benchmark, the paper aims to comprehensively assess the safety alignment of large language models in scientific tasks. In doing so, it seeks to promote the responsible development and deployment of these models and to help ensure that scientific research adheres to safety and ethical standards.