Task-oriented Time Series Imputation Evaluation via Generalized Representers

Zhixian Wang, Linxiao Yang, Liang Sun, Qingsong Wen, Yi Wang
2024-10-10
Abstract: Time series analysis is widely used in many fields such as power energy, economics, and transportation, including different tasks such as forecasting, anomaly detection, classification, etc. Missing values are widely observed in these tasks, often leading to unpredictable negative effects on existing methods and hindering their further application. In response to this situation, existing time series imputation methods mainly focus on restoring sequences based on their data characteristics, while ignoring the performance of the restored sequences in downstream tasks. Considering different requirements of downstream tasks (e.g., forecasting), this paper proposes an efficient downstream task-oriented time series imputation evaluation approach. By combining time series imputation with neural network models used for downstream tasks, the gain of different imputation strategies on downstream tasks is estimated without retraining, and the most favorable imputation value for downstream tasks is given by combining different imputation strategies according to the estimated gain.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper attempts to address the issue of evaluating the impact of different imputation methods on downstream tasks (such as prediction) in time series imputation. Specifically:

1. **Existing Problems and Challenges**: Existing time series imputation methods mainly focus on recovering missing values based on data characteristics but overlook the performance of these recovered sequences in downstream tasks. Particularly in prediction tasks, since time series data serve as both input and labels, the quality of the labels significantly affects model performance.
2. **Proposed Method**: The authors propose a new evaluation strategy that estimates the gain of different imputation strategies on downstream tasks by combining time series imputation with the neural network models used for those tasks. Because the gain is estimated rather than measured, the most favorable imputed values can be found without retraining the model, which significantly reduces time and computational costs.
3. **Summary of Contributions**:
   - Propose a strategy to evaluate the impact of missing (imputed) labels at each time step on downstream tasks without repeated retraining.
   - Introduce a simple and effective similarity calculation method based on long time series characteristics to quickly estimate the impact of imputed values, balancing performance and computational cost.
   - Develop a time series imputation framework guided by maximizing downstream task benefits, achieving better imputation results by combining the advantages of different imputation strategies, thereby improving the performance of downstream prediction tasks.

In summary, this paper aims to evaluate and optimize time series imputation strategies through an efficient method, making them better serve subsequent data analysis tasks, especially in prediction scenarios.
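
The idea of scoring imputed labels without retraining can be illustrated with a much simpler model than the one the paper targets. The sketch below is an assumption-laden toy, not the authors' generalized-representer method: it swaps the neural network for ridge regression, where the fitted weights are a linear function of the training labels, so the gradient of the validation loss with respect to each training label has a closed form. The function names (`ridge_label_map`, `estimated_gain`, `combine_imputations`) and the regularization strength `lam` are hypothetical and chosen only for this example.

```python
import numpy as np


def ridge_label_map(X_train, lam):
    """Return A such that the ridge weights are w = A @ y_train.

    Assumption: ridge regression stands in for the downstream model.
    """
    d = X_train.shape[1]
    return np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T)


def estimated_gain(X_train, y_base, y_alt, X_val, y_val, lam=1.0):
    """First-order estimate of the validation-MSE reduction obtained by
    replacing each label y_base[i] with y_alt[i], one label at a time,
    without refitting the model for each candidate.
    """
    A = ridge_label_map(X_train, lam)      # shape (d, n)
    H = X_val @ A                          # d(pred_val) / d(y_train), shape (m, n)
    resid = H @ y_base - y_val             # validation residuals under y_base
    grad = 2.0 * H.T @ resid / len(y_val)  # d(MSE_val) / d(y_train[i])
    return -grad * (y_alt - y_base)        # positive means loss is expected to drop


def combine_imputations(y_base, y_alt, gain):
    """Per time step, keep whichever candidate has the better estimated gain."""
    return np.where(gain > 0, y_alt, y_base)
```

In this toy setting, `y_base` and `y_alt` would hold the labels produced by two imputation strategies (identical wherever the label was actually observed), and `combine_imputations` picks between them step by step. The estimate is first-order and scores each label independently, so interactions between simultaneous swaps are ignored; for a nonlinear model the label-to-prediction map `H` would not be available in closed form, and approximating that map is the problem the paper addresses.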