Abstract: As AI systems become more advanced, concerns about large-scale risks from misuse or accidents have grown. This report analyzes the technical research into safe AI development being conducted by three leading AI companies: Anthropic, Google DeepMind, and OpenAI. We define safe AI development as developing AI systems that are unlikely to pose large-scale misuse or accident risks. This encompasses a range of technical approaches aimed at ensuring that AI systems behave as intended and do not cause unintended harm, even as they become more capable and autonomous. We analyzed all papers published by the three companies from January 2022 to July 2024 that were relevant to safe AI development, and categorized the 80 included papers into nine safety approaches. Additionally, we noted two categories representing nascent approaches that are explored by academia and civil society but are not currently represented in any research papers by these leading AI companies. Our analysis reveals where corporate attention is concentrated and where potential gaps lie. Some AI research may stay unpublished for good reasons, for example to avoid informing adversaries about the details of security techniques they would need to overcome to misuse AI systems. Therefore, we also considered the incentives that AI companies have to research each approach, regardless of how much work they have published on the topic. We identified three categories where there are currently few or no papers and where we do not expect AI companies to become much more incentivized to pursue this research in the future. These are model organisms of misalignment, multi-agent safety, and safety by design. Our findings indicate that these approaches may be slow to progress without funding or efforts from government, civil society, philanthropists, or academia.

Key Concepts in AI Safety: An Overview

A Trilogy of AI Safety Frameworks: Paths from Facts and Knowledge Gaps to Reliable Predictions and New Knowledge

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Introduction to AI Safety, Ethics, and Society

Concrete Problems in AI Safety, Revisited

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

A Case for AI Safety via Law

Understanding and Avoiding AI Failures: A Practical Guide

An Overview of Catastrophic AI Risks

Safety Cases: How to Justify the Safety of Advanced AI Systems

Trustworthy, Responsible, and Safe AI: A Comprehensive Architectural Framework for AI Safety with Challenges and Mitigations

AI Safety: Necessary, but insufficient and possibly problematic

System Safety and Artificial Intelligence

Safeguarding AI Agents: Developing and Analyzing Safety Architectures

AI Safety Subproblems for Software Engineering Researchers

Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis

On Safety Assessment of Artificial Intelligence

Evolutionary Computation and AI Safety: Research Problems Impeding Routine and Safe Real-world Application of Evolution

Safety, Trust, and Ethics Considerations for Human-AI Teaming in Aerospace Control

Introduction to Artificial Intelligence (AI) and AI-Related Concepts

AI Failures: A Review of Underlying Issues