Abstract: Machine learning (ML) models are powerful tools for detecting complex patterns within data, yet their "black box" nature limits their interpretability, hindering their use in critical domains like healthcare and finance. To address this challenge, interpretable ML methods have been developed to explain how features influence model predictions. However, these methods often focus on univariate feature importance, overlooking the complex interactions between features that ML models are capable of capturing. Recognizing this limitation, recent efforts have aimed to extend these methods to discover feature interactions, but existing approaches struggle with robustness and error control, especially under data perturbations. In this study, we introduce Diamond, a novel method for trustworthy feature interaction discovery. Diamond uniquely integrates the model-X knockoffs framework to control the false discovery rate (FDR), ensuring that the proportion of falsely discovered interactions remains low. We further address the challenges of using off-the-shelf interaction importance measures by proposing a calibration procedure that refines these measures to maintain the desired FDR. Diamond's applicability spans a wide range of ML models, including deep neural networks, tree-based models, and factorization-based models. Our empirical evaluations on both simulated and real datasets across various biomedical studies demonstrate Diamond's utility in enabling more reliable data-driven scientific discoveries. This method represents a significant step forward in the deployment of ML models for scientific innovation and hypothesis generation.
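To make the FDR-control step concrete, the sketch below shows the generic knockoff+ selection rule from the knockoff filter literature, which picks a data-dependent threshold on signed importance statistics. It is an illustration under stated assumptions, not Diamond's implementation: the statistics `w` (one per candidate interaction, large and positive when the original interaction outscores its knockoff counterpart) are assumed to be given, the function names `knockoff_plus_threshold` and `select_discoveries` are hypothetical, and the calibration procedure described in the abstract is not reproduced here.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for target FDR level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j : w_j <= -t}) / max(1, #{j : w_j >= t}) <= q.
    Returns infinity when no candidate threshold satisfies the bound.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, in ascending order.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Negative statistics serve as a conservative estimate of false positives.
        estimated_fdp = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if estimated_fdp <= q:
            return float(t)
    return float("inf")


def select_discoveries(w, q):
    """Return the indices whose statistic meets or exceeds the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)


# Hypothetical statistics for ten candidate interactions (assumed, not from the paper).
w_example = [5.0, 4.0, 3.0, 2.5, -1.0, 2.0, 0.5, -0.3, 1.5, 6.0]
threshold = knockoff_plus_threshold(w_example, q=0.2)
selected = select_discoveries(w_example, q=0.2)
```

In an interaction-discovery setting, each entry of `w` would correspond to a feature pair rather than a single feature, and the selected indices would map back to those pairs. The `+1` in the numerator is what distinguishes knockoff+ from the plain knockoff rule and makes the selection more conservative.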

Feature Interaction Interpretability and Beyond

Interpretable Artificial Intelligence through the Lens of Feature Interaction

Asymmetric feature interaction for interpreting model predictions

Detecting Beneficial Feature Interactions for Recommender Systems

Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond

Infinite-Dimensional Feature Interaction

REPID: Regional Effect Plots with implicit Interaction Detection

Higher-order Neural Additive Models: An Interpretable Machine Learning Model with Feature Interactions

A Survey of the Interpretability Aspect of Deep Learning Models

Towards Explanation of DNN-based Prediction with Guided Feature Inversion

Perturbation on Feature Coalition: Towards Interpretable Deep Neural Networks

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability

xDeepInt: a hybrid architecture for modeling the vector-wise and bit-wise feature interactions

From Neurons to Neutrons: A Case Study in Interpretability

Towards Interaction Detection Using Topological Analysis on Neural Networks

Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions

Improving Interpretability of Deep Neural Networks with Semantic Information

Interaction Pursuit with Feature Screening and Selection

NeuralSI: Neural Design of Semantic Interaction for Interactive Deep Learning

Error-controlled non-additive interaction discovery in machine learning models