AI Safety: A Climb To Armageddon?

Herman Cappelen, Josh Dever, John Hawthorne
2024-06-03
Abstract: This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions - the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing - safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic features of the AI safety landscape that we term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation. The surprising robustness of the argument forces a re-examination of core assumptions around AI safety and points to several avenues for further research.
Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper explores a counterintuitive argument: under certain assumptions, measures aimed at ensuring the safety of artificial intelligence (AI) may not only be ineffective but may also increase risk. Specifically, the paper discusses the following issues:

1. **Potential threats of superintelligent AI**:
   - Superintelligent AI may possess capabilities that surpass those of humans and could be used to pursue any goal.
   - Some goals, whether misaligned or misused, may lead to the destruction of humanity, intentionally or unintentionally.
   - Superintelligent AI may therefore lead to Armageddon.
2. **Efforts in AI safety**:
   - To address these concerns, researchers are developing AI safety measures and attempting to align AI values with human values in order to protect and promote human well-being.
3. **Core assumptions of the anti-safety argument**:
   - The paper assumes that AI does pose an existential threat to humanity.
   - Under certain specific assumptions, safety measures are not only unhelpful but also harmful.
   - Specifically, safety measures may cause an AI system to experience its first failure in a more powerful state, resulting in more severe consequences.
4. **Structure of the main argument**:
   - The paper illustrates its point through an analogy, "The Doomed Rock Climber". In this example, providing safety measures (such as chalk) enables the climber to climb higher but ultimately to fall from a greater height, with more severe consequences.
   - The same logic is applied to AI: safety measures may allow an AI system to become more powerful, so that it causes greater damage when it first fails.
5. **Non-deterministic version of the argument**:
   - Even under uncertainty about future failure, about whether harm increases with power, and about how effective safety measures are, safety measures may still carry high expected harm.
   - Once probabilities and expected values are introduced, the expected utility of providing safety measures may still be lower than that of not providing them.
6. **Response strategies**:
   - The paper explores three response strategies: Optimism, Mitigation, and Holism.
   - Each strategy faces challenges: Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation.

### Conclusion

If the paper's argument holds, current efforts to mitigate AI risk may actually increase it. This conclusion forces a re-examination of the basic assumptions behind AI safety and a consideration of other possible response strategies.
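
The non-deterministic version of the argument can be made concrete with a toy expected-value comparison. The sketch below is illustrative only: the function names, the probabilities, and the harm magnitudes are all assumptions chosen for the example and do not come from the paper, and harm is treated simply as negative utility.

```python
"""Toy expected-harm comparison; all numbers are hypothetical, not from the paper."""


def expected_harm(p_fail: float, harm_if_fail: float) -> float:
    """Expected harm = probability of failure * harm incurred at failure."""
    return p_fail * harm_if_fail


def break_even_p_fail(p_without: float, harm_without: float, harm_with: float) -> float:
    """Failure probability below which safety measures reduce expected harm.

    Safety helps only if p_with * harm_with < p_without * harm_without.
    """
    return p_without * harm_without / harm_with


# Assumed scenario: without safety measures the system fails early, at low power.
harm_without_safety = expected_harm(p_fail=1.0, harm_if_fail=10.0)

# Assumed scenario: safety measures lower the failure probability slightly, but
# the system is far more powerful if and when it does fail.
harm_with_safety = expected_harm(p_fail=0.9, harm_if_fail=100.0)

# True when providing safety measures raises expected harm.
safety_is_net_harmful = harm_with_safety > harm_without_safety

# Failure probability the safety measures would need to get below to be worthwhile.
threshold = break_even_p_fail(p_without=1.0, harm_without=10.0, harm_with=100.0)
```

Under these assumed numbers, expected harm is higher with safety measures than without, and the measures would only pay off if they pushed the failure probability below the break-even threshold. The comparison is sensitive to the inputs, which is why the summarized response strategies target exactly these quantities.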