Data integration strategies for whole-cell modeling

Katja Tummler, Edda Klipp
DOI: https://doi.org/10.1093/femsyr/foae011
2024-03-27
FEMS Yeast Research
Abstract: Data makes the world go round, and high-quality data is a prerequisite for precise models, especially for whole-cell models (WCM). Data for WCM must be reusable, contain information about the exact experimental background, and should, in its entirety, cover all relevant processes in the cell. Here, we review basic requirements for WCM data and strategies for combining them. As a species-specific resource, we introduce the Yeast Cell Model Data Base (YCMDB) to illustrate requirements and solutions. We discuss recent standards for data as well as for computational models, including the modeling process as data to be reported. We outline strategies for constructing WCM despite their inherent complexity.
microbiology, mycology, biotechnology & applied microbiology
What problem does this paper attempt to address?
The paper examines the data integration strategies required for whole-cell modeling (WCM) and introduces the Yeast Cell Model Database (YCMDB), a platform for systematic data collection and integration, using Saccharomyces cerevisiae as the example organism. It addresses the following key issues:

1. **The need for high-quality data**: High-quality, reusable experimental data are prerequisites for building accurate whole-cell models. These data need to cover all relevant cellular processes and include detailed information about the experimental background.
2. **Data integration standards**: For data from different sources to be integrated into a single model, standardized methods for data generation, storage, and sharing are required. The paper discusses the importance of the FAIR (Findable, Accessible, Interoperable, Reusable) principles and mentions several standard formats used for biological models, such as SBML, BioPAX, and CellML.
3. **Construction and application of YCMDB**: The paper introduces YCMDB as a database for systematically collecting and integrating the data needed for yeast cell models. The database includes quantitative data, metadata, and data generated during the modeling process. YCMDB aims to help researchers quickly find data suitable for a specific model and evaluate how applicable and comparable these data are for the simulated scenario.
4. **Strategies for constructing whole-cell models**: Given the complexity of whole-cell models, the paper describes two strategies: using a single mathematical formalism (such as ordinary differential equations) to describe all processes, or choosing the most appropriate computational formalism for each process. Each approach has its advantages and disadvantages.
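The two construction strategies can be contrasted in a toy sketch. This is only an illustration under stated assumptions: the two processes (nutrient uptake and protein synthesis), the rate laws, the parameter values, and all function names are hypothetical and do not come from the paper or from YCMDB. The first function integrates one ODE system for everything; the second lets each submodel use its own formalism and couples them through a shared state.

```python
import math

# Toy sketch only: both processes, their rate laws, and all parameter values
# are hypothetical and are not taken from the paper or from YCMDB.

def simulate_unified(nutrient, protein, dt, steps, k_uptake=1.0, k_synth=0.5):
    """Strategy 1: one ODE system for every process (explicit Euler)."""
    for _ in range(steps):
        d_nutrient = k_uptake - k_synth * nutrient  # uptake minus consumption
        d_protein = k_synth * nutrient              # synthesis from nutrient
        nutrient += dt * d_nutrient
        protein += dt * d_protein
    return {"nutrient": nutrient, "protein": protein}


def uptake_module(snapshot, dt, k_uptake=1.0):
    """Strategy 2, submodel A: continuous uptake at a constant rate."""
    return {"nutrient": k_uptake * dt}


def synthesis_module(snapshot, dt, k_synth=0.5):
    """Strategy 2, submodel B: rule-based synthesis of whole protein units."""
    units = math.floor(k_synth * snapshot["nutrient"] * dt)
    return {"nutrient": -units, "protein": units}


def simulate_hybrid(nutrient, protein, dt, steps, modules):
    """Strategy 2: each submodel keeps its own formalism; a coordinator
    hands every module the same snapshot and sums the returned changes."""
    state = {"nutrient": nutrient, "protein": protein}
    for _ in range(steps):
        snapshot = dict(state)  # all modules read the same start-of-step state
        for module in modules:
            for key, delta in module(snapshot, dt).items():
                state[key] += delta
    return state


unified = simulate_unified(10.0, 0.0, dt=1.0, steps=5)
hybrid = simulate_hybrid(10.0, 0.0, dt=1.0, steps=5,
                         modules=[uptake_module, synthesis_module])
```

Both variants conserve total mass (initial amount plus uptake), but their trajectories differ because the rule-based synthesis module only converts whole units. The unified form is easier to analyze as a single system, while the hybrid form lets a discrete process stay discrete at the cost of an explicit coupling step.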
In summary, the paper addresses how to integrate high-quality data from different sources for whole-cell modeling, focusing on the data integration challenges specific to Saccharomyces cerevisiae, and proposes YCMDB as a practical data integration solution.
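The role of metadata in judging whether measurements from different sources are comparable can be sketched as follows. The record layout, field names, and helper functions are assumptions made for illustration and do not reflect the actual YCMDB schema; the numeric values and source labels are made-up placeholders, not real measurements.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record layout; NOT the actual YCMDB schema.
@dataclass(frozen=True)
class Measurement:
    quantity: str  # what was measured
    value: float   # quantitative data
    unit: str
    strain: str    # metadata: experimental background
    medium: str    # metadata: growth condition
    source: str    # metadata: where the value came from


def comparable(records, strain, medium):
    """Keep only records whose experimental background matches the scenario."""
    return [r for r in records if r.strain == strain and r.medium == medium]


def combine(records):
    """Average values per (quantity, unit) so differing units are never mixed."""
    groups = {}
    for r in records:
        groups.setdefault((r.quantity, r.unit), []).append(r.value)
    return {key: mean(values) for key, values in groups.items()}


# Placeholder values for illustration only.
records = [
    Measurement("protein_X_abundance", 100.0, "molecules/cell", "BY4741", "YPD", "placeholder-A"),
    Measurement("protein_X_abundance", 120.0, "molecules/cell", "BY4741", "YPD", "placeholder-B"),
    Measurement("protein_X_abundance", 300.0, "molecules/cell", "W303", "SD", "placeholder-C"),
]

selected = comparable(records, strain="BY4741", medium="YPD")
combined = combine(selected)
```

Filtering on metadata before combining values keeps the third record, measured in a different strain and medium, out of the average. Without the experimental background attached to each value, that check would not be possible.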