Abstract: Pairwise interactions between perturbations to a system can provide evidence for the causal dependencies of the system's underlying mechanisms. When observations are low-dimensional, hand-crafted measurements, detecting interactions amounts to simple statistical tests, but it is not obvious how to detect interactions between perturbations affecting latent variables. We derive two interaction tests that are based on pairwise interventions, and show how these tests can be integrated into an active learning pipeline to efficiently discover pairwise interactions between perturbations. We illustrate the value of these tests in the context of biology, where pairwise perturbation experiments are frequently used to reveal interactions that are not observable from any single perturbation. Our tests can be run on unstructured data, such as the pixels in an image, which enables a more general notion of interaction than typical cell viability experiments, and can be run on cheaper experimental assays. We validate on several synthetic and real biological experiments that our tests are able to identify interacting pairs effectively. We evaluate our approach on a real biological experiment where we knocked out 50 pairs of genes and measured the effect with microscopy images. We show that we are able to recover significantly more known biological interactions than random search and standard active learning baselines.

What problem does this paper attempt to address?

The paper addresses the problem of automatically discovering pairwise interactions between perturbations in unstructured data. Detecting interactions is relatively straightforward when the observations are low-dimensional, hand-crafted measurements, but it is less obvious how to detect interactions between perturbations that affect latent variables. The paper proposes two interaction tests based on pairwise interventions and demonstrates how to integrate these tests into an active learning pipeline to efficiently discover pairwise interactions between perturbations.

### Background and Problem Description

1. **Measurement**: Scientists need to decide which specific attributes of the system to measure in order to reveal interactions. For example, in biology, measuring cell viability can reveal synthetic lethality, whereas measuring cell color cannot.
2. **Hypothesis Testing**: Interactions are essentially deviations from the expected effects under the assumption of independence. Scientists need to specify the expected results under the assumption of independence and compare them with the actual results.
3. **Selection**: There are often many variables that can be perturbed (e.g., there are about 20,000 genes in the human genome), but only a small subset will exhibit significant interactions. Scientists need to select interacting pairs from all possible perturbation pairs.

### Main Contributions of the Paper

1. **Separability Test**: The paper proposes a test to determine whether two perturbations act on different sets of latent variables. If a double perturbation experiment does not provide information beyond that of the single perturbation experiments, the two perturbations are separable.
2. **Disjointness Test**: The paper also proposes a test to determine whether two perturbations act on disjoint subsets of outcomes. For example, if two perturbations affect different organelles in a cell, and each organelle affects different pixels in an image, the two perturbations are disjoint.
3. **Efficient Experimental Design**: The paper uses an active matrix completion method to efficiently select perturbation pairs that are likely to have high test statistics, thereby revealing pairwise interactions.

### Experimental Validation

The paper validates the effectiveness of its methods on synthetic data and real biological experiments. In a benchmark experiment involving 50 gene knockout pairs, the Information Directed Sampling (IDS) method was able to discover known biological interactions more quickly and found more new interactions. Additionally, the detected interactions were complementary to those found by existing methods based on cosine similarity, and the two could be combined to obtain more detailed estimates of gene relationships.

### Summary

The paper develops a system to automatically discover pairwise interactions from unstructured data through the following contributions:

- Proposes separability and disjointness tests.
- Develops an efficient experimental design method that uses active matrix completion to select perturbation pairs likely to have high test statistics.
- Validates the effectiveness of the methods on synthetic data and real biological experiments.
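To make the hypothesis-testing idea concrete (an interaction is a deviation from what independence predicts), here is a minimal sketch for the simple low-dimensional case, where the measurement is a single hand-crafted scalar such as cell viability. It is not the paper's separability or disjointness test, which target latent variables and unstructured data. The additive null model, the function name `interaction_score`, and the simulated viability values are all assumptions made for illustration.

```python
import numpy as np


def interaction_score(ctrl, a, b, ab):
    """Return (delta, z) for a scalar readout under an assumed additive null.

    Assumed null: the effect of the double perturbation equals the sum of
    the two single-perturbation effects, i.e.
        E[ab] - E[a] - E[b] + E[ctrl] = 0.
    delta is the estimated deviation from that null and z is delta divided
    by its standard error (samples assumed independent across conditions).
    """
    groups = [np.asarray(x, dtype=float) for x in (ctrl, a, b, ab)]
    ctrl, a, b, ab = groups
    delta = ab.mean() - a.mean() - b.mean() + ctrl.mean()
    # Variance of a sum/difference of independent sample means.
    se = np.sqrt(sum(g.var(ddof=1) / g.size for g in groups))
    return delta, delta / se


# Simulated (hypothetical) viability measurements, 200 wells per condition.
rng = np.random.default_rng(0)
n = 200
ctrl = rng.normal(1.0, 0.1, n)         # unperturbed cells
a = rng.normal(0.8, 0.1, n)            # knockout A alone: effect -0.2
b = rng.normal(0.7, 0.1, n)            # knockout B alone: effect -0.3
ab_additive = rng.normal(0.5, 0.1, n)  # double knockout matching the null
ab_lethal = rng.normal(0.1, 0.1, n)    # double knockout far below the null

for name, ab in [("additive", ab_additive), ("lethal", ab_lethal)]:
    delta, z = interaction_score(ctrl, a, b, ab)
    print(f"{name}: delta={delta:+.3f}, z={z:+.1f}")
```

A large negative `z` for the second pair is the scalar analogue of synthetic lethality: the double knockout reduces viability far more than the two single knockouts predict. Whether independence should be modelled on an additive or a multiplicative scale is itself a modelling choice, and this kind of contrast is not available when the readout is an image, which is the gap the paper's tests are meant to fill.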

Automated Discovery of Pairwise Interactions from Unstructured Data
