A new approach to data assimilation initialization problems with sparse data using multiple cost functions

David J. Abers, George Hripcsak, Lena Mamykina, Melike Sirlanci, Esteban G. Tabak
2024-11-04
Abstract: This article develops a novel data assimilation methodology, addressing challenges that are common in real-world settings, such as severe sparsity of observations, lack of reliable models, and non-stationarity of the system dynamics. These challenges often cause identifiability issues and can confound model parameter initialization, both of which can lead to estimated models with unrealistic qualitative dynamics and induce deeper parameter estimation errors. The proposed methodology's objective function is constructed as a sum of components, each serving a different purpose: enforcing point-wise and distribution-wise agreement between data and model output, enforcing agreement of variables and parameters with a model provided, and penalizing unrealistic rapid parameter changes, unless they are due to external drivers or interventions. This methodology was motivated by, developed and evaluated in the context of estimating blood glucose levels in different medical settings. Both simulated and real data are used to evaluate the methodology from different perspectives, such as its ability to estimate unmeasured variables, its ability to reproduce the correct qualitative blood glucose dynamics, how it manages known non-stationarity, and how it performs when given a range of dense and severely sparse data. The results show that a multicomponent cost function can balance the minimization of point-wise errors with global properties, robustly preserving correct qualitative dynamics and managing data sparsity.
Optimization and Control, Dynamical Systems
What problem does this paper attempt to address?
This paper addresses challenges that arise in the data assimilation initialization problem, especially when the data are sparse. Specifically, it targets the following:

1. **Sparse observational data**: The system may be observed at only a few time points, too few to resolve the time scales of its dynamics.
2. **Lack of reliable models**: Many biomedical systems are not fully understood and are difficult to describe with accurate mathematical models. Even where models exist, they contain a large number of undetermined parameters.
3. **Latent and emergent large-scale variables**: The phase space of the system is high-dimensional, but only a small number of variables are systematically observed; the remaining latent variables can only be inferred indirectly.
4. **Existence of prior knowledge**: Even without a detailed model, there is usually some understanding of the system's dynamic behavior, such as the expected oscillation frequency and amplitude.
5. **Non-stationarity**: The underlying dynamics of the system may evolve over time, causing the model parameters to change slowly.

These problems are particularly prominent in biomedicine and computational physiology, especially when estimating endocrine function related to blood glucose regulation. The authors address them with a multi-component objective function that balances the minimization of point-wise errors against global properties, thereby robustly preserving the correct qualitative dynamics and managing data sparsity.

### Specific solutions

To address the above challenges, the paper proposes a new data assimilation method whose core ideas include:

- **Multi-component objective function**: The objective function is a sum of components, each with a different role:
  - **L1**: Quantifies the point-wise agreement between the model output {x_j} and the observations {y_j}.
  - **L2**: Quantifies the distribution-wise agreement between {x_j} and {y_j}, so that the statistical characteristics of the model output match those of the observational data.
  - **L3** and **L4**: Quantify the agreement of the variables and parameters with the provided model, so that the estimates remain consistent with that model's dynamics.
  - **L4+l**: Penalizes changes in the parameter α_l over time, unless those changes are caused by external drivers or interventions.
- **Three-stage method**: First, initialize the model parameters α and the observable variables {x_j}; then optimize over the latent variables {z_j} only; finally, perform a joint optimization over all variables and parameters.
- **Handling interventions not included in the model**: The model state and parameters are allowed to change discontinuously at intervention times, which accommodates interventions that the model does not explicitly represent.

### Application examples

The paper evaluates the method on both simulated and real data from several perspectives: estimating unmeasured variables, reproducing the correct qualitative blood glucose dynamics, managing known non-stationarity, and performing across a range of data densities, from dense to severely sparse. The results show that the multi-component objective function can balance the minimization of point-wise errors with global properties, capturing the qualitative dynamics of the system more robustly. The method thus offers a new approach to the data assimilation initialization problem in complex systems, particularly in biomedical settings where observations are sparse and the system dynamics are complex.
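
The idea of summing cost components with different roles can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy model dx/dt = -α·x, the forward-Euler residual, the use of mean and standard deviation as the distributional summary, the equal weights, and all function names. Latent variables and the three-stage optimization are left out to keep the example short.

```python
import numpy as np

def simulate(alpha, x0, dt):
    """Forward-Euler trajectory of the hypothetical toy model dx/dt = -alpha * x."""
    x = np.empty(len(alpha))
    x[0] = x0
    for j in range(len(alpha) - 1):
        x[j + 1] = x[j] - dt * alpha[j] * x[j]
    return x

def pointwise_term(x_obs, y):
    """L1-like term: mean squared mismatch at the observation times."""
    return float(np.mean((x_obs - y) ** 2))

def distribution_term(x, y):
    """L2-like term: mismatch of two summary statistics (mean and std).
    Assumption: a stand-in for whatever distributional measure is actually used."""
    return float((x.mean() - y.mean()) ** 2 + (x.std() - y.std()) ** 2)

def model_term(x, alpha, dt):
    """L3/L4-like term: mean squared residual of the toy model equations."""
    resid = (x[1:] - x[:-1]) / dt + alpha[:-1] * x[:-1]
    return float(np.mean(resid ** 2))

def smoothness_term(alpha, free_jump):
    """L4+l-like term: penalize step-to-step parameter changes,
    except at steps flagged in free_jump (e.g. known interventions)."""
    d = np.diff(alpha)
    return float(np.sum(d[~free_jump] ** 2))

def total_cost(x, alpha, y, obs_idx, dt, free_jump, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four components; also returns the individual terms."""
    terms = np.array([
        pointwise_term(x[obs_idx], y),
        distribution_term(x, y),
        model_term(x, alpha, dt),
        smoothness_term(alpha, free_jump),
    ])
    return float(np.dot(weights, terms)), terms

# Usage: a 50-step trajectory observed at only four time points.
n, dt = 50, 0.1
alpha = np.full(n, 0.5)
x = simulate(alpha, x0=1.0, dt=dt)
obs_idx = np.array([0, 10, 25, 49])
y = x[obs_idx]
free_jump = np.zeros(n - 1, dtype=bool)  # no interventions flagged
total, terms = total_cost(x, alpha, y, obs_idx, dt, free_jump)
print("components:", terms, "total:", total)
```

In this sketch, setting an entry of `free_jump` to `True` exempts the corresponding step from the smoothness penalty, which mirrors the idea of allowing discontinuous parameter changes at intervention times. The weights would need tuning in any realistic setting.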