TraM: Enhancing User Sleep Prediction with Transformer-based Multivariate Time Series Modeling and Machine Learning Ensembles

Jinjae Kim, Minjeong Ma, Eunjee Choi, Keunhee Cho, Chanwoo Lee
2024-10-15
Abstract: This paper presents a novel approach that leverages a Transformer-based multivariate time series model and Machine Learning Ensembles to predict human sleep quality, emotional states, and stress levels. A formula for calculating the labels was developed, and the various models were applied to user data. The Time Series Transformer was used for labels where time-series characteristics are crucial, while Machine Learning Ensembles were employed for labels requiring comprehensive daily activity statistics. The Time Series Transformer excels at capturing time-series characteristics through pre-training, while the Machine Learning Ensembles select machine learning models that meet our categorization criteria. The proposed model, TraM, scored 6.10 out of 10 in experiments, outperforming the other methodologies. The code and configuration for the TraM framework are available at: https://github.com/jin-jae/ETRI-Paper-Contest.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to predict users' sleep quality, emotional state, and stress level by combining a Transformer-based multivariate time-series model (Time Series Transformer, TST) with machine learning ensembles (Machine Learning Ensembles). Specifically, its goals are:

1. **Improve prediction accuracy**: By integrating TST with machine learning ensembles, the paper aims to provide a robust framework for predicting sleep quality and related quality-of-life labels more accurately, which can inform health management and personalized intervention strategies.
2. **Handle multivariate time-series data**: The paper proposes methods for handling missing values, standardizing the data, and aligning the variables in time, improving the robustness and predictive performance of the model.
3. **Combine the strengths of different models**: TST is effective at capturing time-series features, while machine learning ensembles work better for labels that depend on aggregate daily activity statistics. Combining the two lets the framework predict sleep quality and the related indicators together.

### Specific problem description
- **Sleep quality prediction**: Predict users' total sleep time, sleep efficiency, sleep onset time, and wake-up time from sensor data (e.g., accelerometer, heart rate, GPS) and user survey data.
- **Emotional state and stress level prediction**: Assess users' emotional state and stress level using user survey data, which reflects the users' subjective feelings.

### Method overview
- **Time Series Transformer (TST)**:
  - **Pre-processing**: Resample the time-series data, fill in missing values, and extract statistical features to ensure data consistency and interpretability.
  - **Model structure**: A Transformer encoder with learnable positional encodings and multiple input encoding blocks processes the time-series data; its self-attention mechanism captures long-range dependencies in the series.
  - **Pre-training and fine-tuning**: The model is first pre-trained on an autoregressive denoising task and then fine-tuned for each specific prediction task.
- **Machine Learning Ensembles**:
  - **Pre-processing**: Compute statistical features of the sensor data, such as the mean and variance, and apply feature engineering to select features suitable for both the training and validation sets.
  - **Multi-output classifier**: scikit-learn's MultiOutputClassifier is used for multi-label prediction, combining several machine learning models (e.g., random forest, gradient boosting, logistic regression, support vector machine, decision tree, and k-nearest neighbors); soft voting over the individual models' predictions improves accuracy and robustness.

### Experimental results
- **Performance evaluation**: On the public test set, TraM achieved a macro-averaged F1-Score-based score of 6.10 out of 10, outperforming the other methods and supporting the effectiveness of combining TST with machine learning ensembles.

Overall, the paper shows how multivariate time-series data and machine learning techniques can be combined to predict users' sleep quality, emotional state, and stress level.
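The pre-processing steps in the method overview (time alignment, missing-value filling, standardization, and per-day statistics) can be sketched with pandas. This is a minimal illustration under stated assumptions, not the TraM code: the 1-minute grid, forward/back-filling, per-channel z-scoring, and the channel names `heart_rate` and `acc_mag` are hypothetical choices, and the paper's actual resampling interval and fill strategy may differ.

```python
import numpy as np
import pandas as pd


def preprocess(raw: pd.DataFrame, freq: str = "1min") -> pd.DataFrame:
    """Align channels on a regular grid, fill gaps, and z-score each channel."""
    # Time alignment: average all readings that fall into the same freq-wide bin.
    grid = raw.sort_index().resample(freq).mean()
    # Missing values: carry the last reading forward, then back-fill a leading gap.
    grid = grid.ffill().bfill()
    # Standardization: per-channel z-score; a constant channel (std 0) is only centred.
    std = grid.std(ddof=0).replace(0.0, 1.0)
    return (grid - grid.mean()) / std


def daily_stats(raw: pd.DataFrame) -> pd.DataFrame:
    """Per-day mean and variance of every channel, flattened to one row per day."""
    stats = raw.sort_index().resample("1D").agg(["mean", "var"])
    stats.columns = [f"{col}_{stat}" for col, stat in stats.columns]
    return stats


# Toy input: irregular timestamps with gaps (hypothetical channel names).
idx = pd.to_datetime([
    "2024-01-01 00:00:10", "2024-01-01 00:00:40",
    "2024-01-01 00:02:05", "2024-01-01 00:03:30",
])
raw = pd.DataFrame(
    {"heart_rate": [60.0, 64.0, np.nan, 70.0], "acc_mag": [0.1, 0.3, 0.5, np.nan]},
    index=idx,
)
clean = preprocess(raw)   # 4 aligned, gap-free, standardized rows
stats = daily_stats(raw)  # 1 row of per-day features
```

In this sketch, `preprocess` produces the kind of regular, standardized sequence a Transformer branch would consume, while `daily_stats` produces the per-day summary features an ensemble branch would use.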
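The denoising pre-training step can be illustrated by its training objective: hide some input values, feed the corrupted series to the encoder, and score the reconstruction only at the hidden positions. The sketch below assumes a simple masked-value scheme (independent Bernoulli masking with a hypothetical ratio of 0.15, masked values set to 0); the summary does not specify TraM's exact "autoregressive denoising" setup, and the Transformer encoder itself is replaced by a trivial placeholder that echoes its input.

```python
import numpy as np


def make_mask(shape, mask_ratio=0.15, rng=None):
    """Boolean mask that is True where an input value is hidden from the model."""
    rng = np.random.default_rng(0) if rng is None else rng
    return rng.random(shape) < mask_ratio


def corrupt(x, mask):
    """Replace masked values with 0 (the mean after z-scoring) to form the noisy input."""
    return np.where(mask, 0.0, x)


def masked_mse(pred, target, mask):
    """Mean squared error over masked positions only; visible values are not scored."""
    n = int(mask.sum())
    if n == 0:
        return 0.0
    return float(((pred - target) ** 2)[mask].sum() / n)


rng = np.random.default_rng(42)
x = rng.normal(size=(8, 60, 4))       # (batch, time steps, channels), already standardized
mask = make_mask(x.shape, rng=rng)
x_noisy = corrupt(x, mask)            # what the encoder would receive during pre-training

# Placeholder "model" that echoes its input: it cannot recover masked values, so loss > 0.
baseline_loss = masked_mse(x_noisy, x, mask)
perfect_loss = masked_mse(x, x, mask)  # a perfect reconstruction scores 0
```

Scoring only the masked positions forces the model to infer hidden values from their temporal and cross-channel context, which is what makes the pre-trained encoder useful for later fine-tuning.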
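The multi-output ensemble can be sketched with scikit-learn by wrapping a soft-voting `VotingClassifier` over the six listed model families in a `MultiOutputClassifier`, so that each label gets its own cloned ensemble. This is a hedged sketch, not the authors' configuration: the hyperparameters, the scaling pipelines, the helper name `build_ensemble`, and the synthetic data from `make_multilabel_classification` (standing in for the daily features and the real labels) are all assumptions.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_ensemble(seed=0):
    """Hypothetical helper: one soft-voting ensemble per output label."""
    voter = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
            ("gb", GradientBoostingClassifier(random_state=seed)),
            ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
            # probability=True is required so SVC can take part in soft voting.
            ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed))),
            ("dt", DecisionTreeClassifier(max_depth=5, random_state=seed)),
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ],
        voting="soft",  # average predicted class probabilities across the models
    )
    return MultiOutputClassifier(voter)


# Synthetic stand-in for per-day features and three binary labels.
X, Y = make_multilabel_classification(n_samples=300, n_features=12, n_classes=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

model = build_ensemble().fit(X_tr, Y_tr)
Y_pred = model.predict(X_te)  # shape (n_test, 3)
macro_f1 = [f1_score(Y_te[:, j], Y_pred[:, j], average="macro") for j in range(Y.shape[1])]
```

Soft voting averages class probabilities rather than counting hard votes, so a confident model can outweigh several uncertain ones; this is one common way to realize the accuracy and robustness gain described for the ensemble branch.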