Abstract: As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety.

What problem does this paper attempt to address?

The core problem this paper addresses is **how neuroscience can be used to make AI systems safer**. As AI systems grow more powerful, it becomes crucial to ensure that they operate safely across a wide range of situations. The human brain, the only known system with general intelligence, is robust, explores safely, understands complex context, and cooperates to achieve goals. Neuroscience may therefore offer important inspiration and solutions for technical AI safety.

### Specific Problems and Solutions

The paper divides AI safety problems into two categories:

1. **Immediate safety problems of current AI systems**: Although these systems have limited capabilities, they are already widely deployed (for example, large language models and image generators). The main risks include algorithmic bias, the amplification of existing social problems, interference in political processes, and impacts related to climate change.
2. **Long-term safety problems of future agentic AI systems**: These systems have broader autonomy and capabilities, such as robots, self-driving cars, and virtual assistants. Although they may bring great value to society, they also carry risks of malicious use, military applications, organizational risks, and loss of control.

### Contributions of Neuroscience to AI Safety

The paper proposes several key paths for making AI systems safer by drawing on the knowledge and techniques of neuroscience:

1. **Robustness**: Study how the brain processes unexpected inputs so that AI systems can operate reliably when faced with adversarial or out-of-distribution inputs. For example, reverse-engineering the sensory system and learning its representations could make AI systems more robust.
2. **Specification**: Clarify the intended behavior of AI systems so that they "do what we want them to do" rather than follow an overly literal reading of their instructions. This includes correctly interpreting natural-language instructions, preventing shortcut learning, and avoiding reward hacking and related problems.
3. **Assurance**: Verify that AI systems work as intended, and ensure that they are transparent and interpretable. For example, interpretability methods from neuroscience research can be used to open the black box of AI systems and to detect and correct deviations from expected behavior.

### Examples

- **Reverse-engineering the sensory system**: Build digital twins of the sensory system (sensory digital twins) that predict neural responses, then extract robust representations from them and apply those representations to AI systems.
- **Building embodied digital twins**: Train autoregressive models to simulate the behavior of the brain and body, and place them in a virtual environment to better understand how perception and motor systems operate.
- **Inferring the brain's loss function**: Combine task-driven neural networks, inverse reinforcement learning, and other techniques to infer the brain's loss and reward functions, which could help design AI systems that are better aligned with human cognition.

In short, this paper explores how neuroscience can provide new ideas and methods for AI safety, especially by drawing on the brain's architecture and learning algorithms to make future AI systems safer and more reliable.
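As a toy illustration of the "sensory digital twin" idea listed in the examples above, the sketch below fits a model that predicts a neuron's response from a stimulus and then scores it on held-out data. Everything here is an assumption made for illustration: the neuron is simulated, the model is a one-parameter linear encoder, and the names `simulate_neuron`, `fit_linear_encoder`, and `pearson` are hypothetical. The paper itself discusses far richer models trained on real recordings; this only shows the fit-then-predict loop in its simplest form.

```python
import random


def fit_linear_encoder(stimuli, responses):
    """Least-squares fit of response ~ gain * stimulus + baseline."""
    n = len(stimuli)
    mean_s = sum(stimuli) / n
    mean_r = sum(responses) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(stimuli, responses))
    var = sum((s - mean_s) ** 2 for s in stimuli)
    gain = cov / var
    baseline = mean_r - gain * mean_s
    return gain, baseline


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Simulated "neuron" (an assumption): linear tuning plus Gaussian noise.
rng = random.Random(0)
TRUE_GAIN, TRUE_BASELINE, NOISE_SD = 2.0, 5.0, 0.2


def simulate_neuron(stimulus):
    return TRUE_GAIN * stimulus + TRUE_BASELINE + rng.gauss(0.0, NOISE_SD)


# Separate training and held-out stimuli, each with a recorded response.
train_stimuli = [rng.uniform(0.0, 1.0) for _ in range(200)]
train_responses = [simulate_neuron(s) for s in train_stimuli]
test_stimuli = [rng.uniform(0.0, 1.0) for _ in range(100)]
test_responses = [simulate_neuron(s) for s in test_stimuli]

# Fit the "twin" on training data, then predict held-out responses.
gain, baseline = fit_linear_encoder(train_stimuli, train_responses)
predicted = [gain * s + baseline for s in test_stimuli]
score = pearson(predicted, test_responses)

print(f"fitted gain={gain:.2f}, baseline={baseline:.2f}, held-out r={score:.2f}")
```

Scoring on held-out stimuli rather than on the training set is the key design choice: a twin is only useful if it predicts responses to inputs it was not fitted on.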

NeuroAI for AI Safety

Catalyzing next-generation Artificial Intelligence through NeuroAI

Integrative Biological Simulation, Neuropsychology, and AI Safety

Neuroscience-Inspired Artificial Intelligence

Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety

The new NeuroAI

Neuroscience and Artificial Intelligence

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Biological Blueprints for Next Generation AI Systems

Future views on neuroscience and AI

Holistic Safety and Responsibility Evaluations of Advanced AI Models

Artificial Intelligence and Neuroscience: An Update on Fascinating Relationships

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

Artificial intelligence in neurology: opportunities, challenges, and policy implications

Safeguarding AI Agents: Developing and Analyzing Safety Architectures

Strategies to architect AI Safety: Defense to guard AI from Adversaries

Trustworthy, Responsible, and Safe AI: A Comprehensive Architectural Framework for AI Safety with Challenges and Mitigations

Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI

Neurosymbolic AI -- Why, What, and How

Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis

Artificial Intelligence in Clinical Neuroscience: Methodological and Ethical Challenges