Automated Discovery of Pairwise Interactions from Unstructured Data

Zuheng, Moksh Jain, Ali Denton, Shawn Whitfield, Aniket Didolkar, Berton Earnshaw, Jason Hartford
2024-09-12
Abstract: Pairwise interactions between perturbations to a system can provide evidence for the causal dependencies of the system's underlying mechanisms. When observations are low-dimensional, hand-crafted measurements, detecting interactions amounts to simple statistical tests, but it is not obvious how to detect interactions between perturbations affecting latent variables. We derive two interaction tests that are based on pairwise interventions, and show how these tests can be integrated into an active learning pipeline to efficiently discover pairwise interactions between perturbations. We illustrate the value of these tests in the context of biology, where pairwise perturbation experiments are frequently used to reveal interactions that are not observable from any single perturbation. Our tests can be run on unstructured data, such as the pixels in an image, which enables a more general notion of interaction than typical cell viability experiments, and can be run on cheaper experimental assays. We validate on several synthetic and real biological experiments that our tests are able to identify interacting pairs effectively. We evaluate our approach on a real biological experiment where we knocked out 50 pairs of genes and measured the effect with microscopy images. We show that we are able to recover significantly more known biological interactions than random search and standard active learning baselines.
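To make the "simple statistical tests" case concrete, here is a minimal sketch for a single scalar readout (for example, a viability score). It assumes an additive null model, under which the double perturbation shifts the readout by the sum of the two single-perturbation shifts. The function name `additive_interaction` and the replicate values are hypothetical, and this is not the separability or disjointness test derived in the paper, which targets unstructured observations.

```python
from math import sqrt
from statistics import mean, variance


def additive_interaction(ctrl, a, b, ab):
    """Return (interaction, z) for a scalar readout under an additive null.

    Under the null, the contrast mean(ab) - mean(a) - mean(b) + mean(ctrl)
    should be close to zero. Each argument is a list of replicate readouts.
    """
    groups = (ctrl, a, b, ab)
    interaction = mean(ab) - mean(a) - mean(b) + mean(ctrl)
    # Standard error of the contrast, assuming independent replicates.
    se = sqrt(sum(variance(g) / len(g) for g in groups))
    return interaction, interaction / se


# Hypothetical replicate readouts (illustrative values, not real data).
ctrl = [1.00, 1.02, 0.98]
single_a = [0.80, 0.82, 0.78]
single_b = [0.70, 0.72, 0.68]
double_additive = [0.50, 0.52, 0.48]     # matches the sum of single effects
double_interacting = [0.10, 0.12, 0.08]  # far below the additive expectation

no_effect, z_none = additive_interaction(ctrl, single_a, single_b, double_additive)
effect, z_effect = additive_interaction(ctrl, single_a, single_b, double_interacting)
```

A large absolute `z` suggests that the pair deviates from the additive expectation. The choice of null model (additive here) is itself an assumption that depends on the measurement.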
Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of automatically discovering pairwise interactions between perturbations from unstructured data. Detecting interactions is relatively straightforward when observations are low-dimensional, hand-crafted measurements, but it is less obvious how to detect interactions between perturbations that affect latent variables. The paper proposes two interaction tests based on pairwise interventions and shows how to integrate them into an active learning pipeline that efficiently discovers pairwise interactions between perturbations.

### Background and Problem Description

1. **Measurement**: Scientists need to decide which attributes of the system to measure in order to reveal interactions. For example, in biology, measuring cell viability can reveal synthetic lethality, whereas measuring cell color cannot.
2. **Hypothesis Testing**: Interactions are essentially deviations from the effects expected under the assumption of independence. Scientists need to specify the results expected under independence and compare them with the actual results.
3. **Selection**: There are often many variables that can be perturbed (e.g., there are about 20,000 genes in the human genome), but only a small subset of pairs will exhibit significant interactions. Scientists need to select which pairs to test from all possible perturbation pairs.

### Main Contributions of the Paper

1. **Separability Test**: The paper proposes a test to determine whether two perturbations act on different sets of latent variables. If a double perturbation experiment does not provide information beyond that of the single perturbation experiments, the two perturbations are separable.
2. **Disjointness Test**: The paper also proposes a test to determine whether two perturbations act on disjoint subsets of outcomes. For example, if two perturbations affect different organelles in a cell, and each organelle affects different pixels in an image, the two perturbations are disjoint.
3. **Efficient Experimental Design**: The paper uses an active matrix completion method to efficiently select perturbation pairs that are likely to have high test statistics, thereby revealing pairwise interactions.

### Experimental Validation

The paper validates the effectiveness of its methods on synthetic data and real biological experiments. In a benchmark experiment involving 50 gene knockout pairs, the Information Directed Sampling (IDS) method was able to discover known biological interactions more quickly and found more new interactions. Additionally, the detected interactions were complementary to those found by existing methods based on cosine similarity, and the two could be combined to obtain more detailed estimates of gene relationships.

### Summary

The paper develops a system to automatically discover pairwise interactions from unstructured data through the following contributions:

- Proposes separability and disjointness tests.
- Develops an efficient experimental design method that uses active matrix completion to select perturbation pairs likely to have high test statistics.
- Validates the effectiveness of the methods on synthetic data and real biological experiments.
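The experimental design idea described above (predict the test statistics of untested pairs from the tested ones, then choose what to test next) can be sketched as follows. This is a simplified illustration under loudly stated assumptions: a rank-1 symmetric model of the score matrix, fitted by coordinate descent, and a greedy "highest predicted score" rule standing in for the acquisition function. It is not the Information Directed Sampling procedure evaluated in the paper, and every name here (`fit_rank1`, `select_next`, `active_search`, `run_experiment`) is hypothetical.

```python
from itertools import combinations


def fit_rank1(observed, n, iters=50, ridge=1e-3):
    """Fit score(i, j) ~ u[i] * u[j] to the observed pairs by coordinate descent."""
    u = [1.0] * n
    for _ in range(iters):
        for i in range(n):
            num, den = 0.0, ridge
            for (p, q), s in observed.items():
                if i in (p, q):
                    j = q if p == i else p
                    num += s * u[j]
                    den += u[j] ** 2
            if den > ridge:  # leave u[i] at its initial value if nothing informs it
                u[i] = num / den
    return u


def select_next(observed, n):
    """Greedy rule: pick the untested pair with the highest predicted score."""
    u = fit_rank1(observed, n)
    candidates = [pair for pair in combinations(range(n), 2) if pair not in observed]
    return max(candidates, key=lambda pair: u[pair[0]] * u[pair[1]])


def active_search(run_experiment, observed, n, budget):
    """Alternate between fitting the model, selecting a pair, and testing it."""
    observed = dict(observed)  # copy so the caller's seed data is not modified
    picks = []
    for _ in range(budget):
        pair = select_next(observed, n)
        observed[pair] = run_experiment(pair)
        picks.append(pair)
    return picks


# Toy ground truth (hypothetical): score(i, j) = w[i] * w[j].
w = [3.0, 2.0, 1.0, 0.5, 0.2]


def run_experiment(pair):
    """Stand-in for a real double-perturbation experiment plus test statistic."""
    return w[pair[0]] * w[pair[1]]


# Seed observations: every other perturbation paired with perturbation 2.
seed = {(0, 2): 3.0, (1, 2): 2.0, (2, 3): 0.5, (2, 4): 0.2}
picks = active_search(run_experiment, seed, n=5, budget=3)
```

In this toy setting the first selected pair is `(0, 1)`, the pair with the largest true score, because the seed observations already show that perturbations 0 and 1 have the strongest individual factors. A purely greedy rule does not explore, which is one reason to prefer acquisition functions that trade off information gain against expected reward.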