Abstract: Multimodal foundation models (MFMs) represent a significant advancement in artificial intelligence, combining diverse data modalities to enhance learning and understanding across a wide range of applications. However, this integration also brings unique safety and security challenges. In this paper, we conceptualize cybersafety and cybersecurity in the context of multimodal learning and present a comprehensive Systematization of Knowledge (SoK) to unify these concepts in MFMs, identifying key threats to these models. We propose a taxonomy framework grounded in information theory, evaluating and categorizing threats through the concepts of channel capacity, signal, noise, and bandwidth. This approach provides a novel framework that unifies model safety and system security in MFMs, offering a more comprehensive and actionable understanding of the risks involved. We used this framework to explore existing defense mechanisms and identified gaps in current research, particularly a lack of protection for alignment between modalities and a need for more systematic defense methods. Our work contributes to a deeper understanding of the security and safety landscape in MFMs, providing researchers and practitioners with valuable insights for improving the robustness and reliability of these models.


### What problems does this paper attempt to solve?

This paper addresses the safety and security challenges specific to Multimodal Foundation Models (MFMs). In particular, it attempts to unify cybersafety and cybersecurity through an information-theoretic approach and provides a comprehensive Systematization of Knowledge (SoK) for these models. Its main focuses are:

1. **Identifying key threats**:
   - The paper conceptualizes safety and security in multimodal learning and identifies the key threats faced by MFMs.
   - At the model level, these threats include misdirection, mislearning, and inference attacks. At the system level, they include agent behavior, agent interaction, and system memory attacks.
2. **Establishing a classification framework**:
   - Grounded in information theory, the paper's taxonomy evaluates and classifies threats using the concepts of channel capacity, signal, noise, and bandwidth.
   - For example, the channel capacity \( C \) can be expressed as:
     \[ C = B\log_2\left(1+\frac{S}{N}\right) \]
     where:
     - \( C \) is the channel capacity (amount of information transmitted)
     - \( B \) is the bandwidth (transmission availability)
     - \( S \) is the signal power (meaningful information)
     - \( N \) is the noise power (interfering signal)
3. **Evaluating existing defense mechanisms**:
   - The paper evaluates existing defense mechanisms and identifies gaps in current research. The gaps are largest in protecting cross-modal alignment and in system-level defense methods.
   - It argues that model-level protection alone is insufficient and that more systematic defense methods are required.
4. **Future research directions**:
   - The paper proposes several directions for future research. It emphasizes the importance of understanding system-level interactions and proposes new ways to study cross-modal vulnerabilities in MFMs.

In summary, the paper uses an information-theoretic approach to build a unified framework for safety and security analysis. This framework helps researchers and practitioners better understand and address the risks in MFMs, and thereby improve the robustness and reliability of these models.
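The channel-capacity expression in item 2 can be made concrete with a short numeric sketch. This is not the paper's own tooling. The function name `channel_capacity` and every numeric value below are hypothetical. Treating a larger \( N \) as added interference and a smaller \( B \) as reduced transmission availability follows the term definitions listed for the formula, but no results are taken from the paper.

```python
import math


def channel_capacity(bandwidth_hz: float, signal_power: float, noise_power: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second.

    Hypothetical helper for illustration: S and N must be in the same power units.
    """
    if bandwidth_hz < 0 or signal_power < 0:
        raise ValueError("bandwidth and signal power must be non-negative")
    if noise_power <= 0:
        raise ValueError("noise power must be positive")
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


# Hypothetical scenarios: a clean channel, the same channel with more
# interfering noise, and the same channel with throttled bandwidth.
baseline = channel_capacity(1000.0, 15.0, 1.0)  # 1000 * log2(16) = 4000 bit/s
noisier = channel_capacity(1000.0, 15.0, 5.0)   # 1000 * log2(4)  = 2000 bit/s
narrower = channel_capacity(250.0, 15.0, 1.0)   # 250 * log2(16)  = 1000 bit/s

for label, value in [("baseline", baseline), ("noisier", noisier), ("narrower", narrower)]:
    print(f"{label}: {value:.0f} bit/s")
```

In this toy setting, raising \( N \) from 1 to 5 halves the capacity, while cutting \( B \) to a quarter cuts the capacity to a quarter. The difference arises because \( C \) is linear in \( B \) but only logarithmic in the signal-to-noise ratio.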

SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach
