Data-driven learning of structure augments quantitative prediction of biological responses

Yuanchi Ha,Helena R. Ma,Feilun Wu,Andrea Weiss,Katherine Duncker,Helen Z. Xu,Jia Lu,Max Golovsky,Daniel Reker,Lingchong You
DOI: https://doi.org/10.1371/journal.pcbi.1012185
2024-06-04
PLoS Computational Biology
Abstract:Multi-factor screenings are commonly used in diverse applications in medicine and bioengineering, including optimizing combination drug treatments and microbiome engineering. Despite the advances in high-throughput technologies, large-scale experiments typically remain prohibitively expensive. Here we introduce a machine learning platform, structure-augmented regression (SAR), that exploits the intrinsic structure of each biological system to learn a high-accuracy model with minimal data requirement. Under different environmental perturbations, each biological system exhibits a unique, structured phenotypic response. This structure can be learned based on limited data and once learned, can constrain subsequent quantitative predictions. We demonstrate that SAR requires significantly fewer data comparing to other existing machine-learning methods to achieve a high prediction accuracy, first on simulated data, then on experimental data of various systems and input dimensions. We then show how a learned structure can guide effective design of new experiments. Our approach has implications for predictive control of biological systems and an integration of machine learning prediction and experimental design. Using limited data to predict how biological systems respond to combinations of perturbations is important for better understanding of biological systems and for practical applications. Here we present a new method, structure-augmented regression, that efficiently learns biological response landscapes from sparse measurements. Our method exploits the property that many biological response landscapes have distinct lower-dimensional structures, which can be learned first to assist the subsequent quantitative predictions. We demonstrate that our algorithm outperforms existing algorithms on simulated and experimental data of various biological systems and dimensions. We further exploit the learned structure by suggesting new experiments that refines the learned structure to further improve the prediction accuracy. By integrating machine learning with experimental design, our approach has implications for the predictive control of biological systems.
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?