Multi-LSTM: A Multiscale Long Short-Term Memory-Based Framework for Predicting Time-Series Transcriptomic Gene Expression

Ying Zhou, Erteng Jia, Huajuan Shi, Zhiyu Liu, Yuqi Sheng, Min Pan, Jing Tu, Qinyu Ge, Zuhong Lu
DOI: https://doi.org/10.21203/rs.3.rs-1586379/v1
2022-01-01
Abstract: RNA degradation can significantly affect the results of gene expression profiling, so that subsequent analysis fails to faithfully represent the initial gene expression level. An artificial intelligence approach is urgently needed to make better use of limited data and obtain meaningful, reliable analysis results when data at the target time points are missing. In this study, we propose Multi-LSTM, a method based on signal decomposition and deep learning. It consists of two main modules. The first decomposes the collected gene expression sequences with the empirical mode decomposition (EMD) algorithm into a series of subsequences with different frequencies, which improves data stability and reduces modeling complexity. The second uses long short-term memory (LSTM) as the core predictor to capture the temporal nonlinear relationships embedded in the subsequences. Finally, the predictions for the subsequences are recombined to obtain the final prediction of time-series transcriptomic gene expression. The results show that EMD can efficiently reduce the nonlinearity of the original sequences, which provides theoretical support for reducing the complexity and improving the robustness of LSTM models. Overall, the decomposition-combination prediction framework can effectively predict gene expression levels at unknown time points.
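The decomposition-combination idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a repeated moving-average split stands in for EMD, and a least-squares line stands in for the LSTM predictor. The function names (`decompose`, `predict_next`, `forecast`) and the window sizes are hypothetical and chosen only for illustration. What the sketch does preserve is the structure of the framework: split the series into additive components, predict each component separately, and sum the predictions.

```python
# Sketch of a decomposition-combination forecaster.
# Stand-ins (assumptions, not the paper's method): a moving-average split
# replaces EMD, and a least-squares line replaces the LSTM predictor.

def moving_average(series, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def decompose(series, windows=(3, 7)):
    """Split a series into additive components, fastest first.

    Each pass subtracts a smoothed version of the current residue, so the
    components always sum back to the original series.
    """
    components = []
    residue = list(series)
    for w in windows:
        smooth = moving_average(residue, w)
        components.append([r - s for r, s in zip(residue, smooth)])
        residue = smooth
    components.append(residue)  # slowest-varying trend
    return components

def predict_next(component):
    """Extrapolate one step ahead with a least-squares line (LSTM stand-in)."""
    n = len(component)
    mean_t = (n - 1) / 2
    mean_y = sum(component) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(component))
    slope = cov / var_t
    return mean_y + slope * (n - mean_t)

def forecast(series):
    """Predict each component separately, then sum the predictions."""
    return sum(predict_next(c) for c in decompose(series))

if __name__ == "__main__":
    # Made-up expression values at ten time points, for illustration only.
    expression = [5.0, 5.6, 4.9, 6.2, 6.8, 6.1, 7.4, 8.0, 7.3, 8.6]
    print(forecast(expression))
```

Because the stand-in predictor is linear, the summed forecast here equals a direct line fit to the undecomposed series. The decomposition is only expected to help when the per-component predictor is nonlinear, as with the LSTM described in the abstract.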