Abstract: Understanding the inner mechanisms of black-box foundation models (FMs) is essential yet challenging in artificial intelligence and its applications. Over the last decade, the long-running focus has been on their explainability, leading to the development of post-hoc explainable methods to rationalize the specific decisions already made by black-box FMs. However, these explainable methods have certain limitations in terms of faithfulness and resource requirement. Consequently, a new class of interpretable methods should be considered to unveil the underlying mechanisms of FMs in an accurate, comprehensive, heuristic, and resource-light way. This survey aims to review those interpretable methods that comply with the aforementioned principles and have been successfully applied to FMs. These methods are deeply rooted in machine learning theory, covering the analysis of generalization performance, expressive capability, and dynamic behavior. They provide a thorough interpretation of the entire workflow of FMs, ranging from the inference capability and training dynamics to their ethical implications. Ultimately, drawing upon these interpretations, this review identifies the next frontier research directions for FMs.

What problem does this paper attempt to address?

This paper addresses the problem of understanding and revealing the internal mechanisms of foundation models (FMs), given the limitations of existing explanation methods in faithfulness and resource requirements. It points out that although existing post-hoc explanation methods perform well in explaining some black-box models, they are often not reliable enough for complex foundation models and may even be misleading. The survey therefore reviews a class of interpretable methods that aim to uncover the underlying mechanisms of foundation models in a more accurate, comprehensive, heuristic, and resource-light way. Specifically, the paper focuses on the following aspects:

1. **Limited Fidelity**: Existing post-hoc explanation methods struggle to remain fully faithful to the model's actual decision-making process, which limits their effectiveness.
2. **High Resource Requirements**: As model complexity increases, traditional post-hoc explanation methods (such as SHAP and LIME) become increasingly computationally intensive, which limits their application to large-scale datasets.
3. **New Interpretable Methods**: The paper reviews interpretable methods grounded in machine learning theory, covering the analysis of generalization performance, expressive capability, and dynamic behavior. These methods explain the behavior of foundation models more systematically, from training through inference, covering aspects such as in-context learning, chain-of-thought reasoning, and ethical implications.

Through these interpretable methods, the paper aims to provide a deeper understanding of foundation models, thereby guiding future research directions and improving the transparency and reliability of the models.

A Theoretical Survey on Foundation Models

Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning

Learn From Model Beyond Fine-Tuning: A Survey

A Comprehensive Survey of Foundation Models in Medicine

Understanding Foundation Models: Are We Back in 1924?

Foundation Models Meet Visualizations: Challenges and Opportunities

Progress and opportunities of foundation models in bioinformatics

Training and Serving System of Foundation Models: A Comprehensive Survey

Semantic Communications using Foundation Models: Design Approaches and Open Issues

Robot Learning in the Era of Foundation Models: A Survey

Resource-efficient Algorithms and Systems of Foundation Models: A Survey

Foundation models in brief: A historical, socio-technical focus

A Survey of Methods for Explaining Black Box Models

Advances and Open Challenges in Federated Foundation Models

Towards Graph Foundation Models: A Survey and Beyond

Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

A Survey of Reasoning with Foundation Models

Data-Centric Foundation Models in Computational Healthcare: A Survey