Review of Data-centric Time Series Analysis from Sample, Feature, and Period

Chenxi Sun, Hongyan Li, Yaliang Li, Shenda Hong
2024-04-24
Abstract: Data is essential to performing time series analysis utilizing machine learning approaches, whether for classic models or today's large language models. A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality. Even though time-series data processing methods frequently come up in a wide range of research fields, it hasn't been well investigated as a specific topic. To fill the gap, in this paper, we systematically review different data-centric methods in time series analysis, covering a wide range of research topics. Based on the time-series data characteristics at sample, feature, and period, we propose a taxonomy for the reviewed data selection methods. In addition to discussing and summarizing their characteristics, benefits, and drawbacks targeting time-series data, we also introduce the challenges and opportunities by proposing recommendations, open problems, and possible research topics.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the lack of dedicated research on data processing methods in time-series analysis, in particular data selection methods along three dimensions: sample, feature, and period. It systematically reviews data-centric time-series analysis methods to fill this gap. The paper's objectives are as follows:

1. **Improve the quality of time-series data**:
   - The paper emphasizes the importance of high-quality data for time-series analysis, especially when machine-learning models are used. Prior research has concentrated on model design and optimization and paid less attention to how data quality affects model performance. The paper therefore argues for more attention to data quality and data processing methods.
2. **Systematically review data selection methods**:
   - Based on the characteristics of time-series data, the paper proposes a taxonomy of data selection methods covering three aspects: sample, feature, and period. The taxonomy helps researchers understand and apply these methods.
3. **Discuss the advantages and disadvantages of existing methods**:
   - The paper summarizes existing data selection methods and discusses the characteristics, benefits, and drawbacks of each for time-series data, which helps researchers choose among them in practice.
4. **Propose future research directions**:
   - The paper identifies the challenges and opportunities in current research and offers recommendations, open problems, and possible research topics to guide follow-up work.

### Specific problems and solutions

1. **Sample selection**:
   - **Data filtering**: How to extract high-quality data from a large amount of noisy data while avoiding over-filtering or under-filtering.
   - **Data augmentation**: How to reduce bias or distortion by increasing the number of samples, especially in small-sample learning and imbalanced classification tasks.
   - **Learning order arrangement**: How to improve the model's convergence and generalization through an ordered learning process (such as curriculum learning).
2. **Feature selection**:
   - **Feature augmentation**: How to enrich the information in the original data by adding static and dynamic features, and thereby improve model performance.
   - **Representation learning**: How to use deep-learning methods to learn feature representations and extract useful information in a high-dimensional space.
3. **Period selection**:
   - **Window size setting**: How to choose an appropriate window size to reduce model complexity and learn multi-span features.
   - **Sub-sequence extraction**: How to simplify and interpret complex time-series data by extracting meaningful sub-sequences.

### Summary

The main objective of the paper is to improve data quality by systematically reviewing and analyzing time-series data processing methods, thereby improving the accuracy, robustness, and convergence of models. The paper also points out deficiencies in current research and outlines future research directions, providing a reference for further work in time-series analysis.
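
As an illustration of the window size setting and sub-sequence extraction questions listed earlier, the following is a minimal sketch, not a method taken from the paper. The helper name `sliding_windows`, its `window` and `stride` parameters, and the toy series are all assumptions introduced here for illustration.

```python
from typing import List, Sequence


def sliding_windows(
    series: Sequence[float], window: int, stride: int = 1
) -> List[List[float]]:
    """Split a series into fixed-length sub-sequences (hypothetical helper).

    `window` is the number of time steps per sub-sequence and `stride` is the
    step between the starts of consecutive sub-sequences.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # The last valid start index is len(series) - window; a series shorter
    # than the window yields an empty list.
    last_start = len(series) - window
    return [list(series[i:i + window]) for i in range(0, last_start + 1, stride)]


# Toy series, made up for this example.
series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]

# A short window yields many small sub-sequences (fine-grained view).
short = sliding_windows(series, window=2)

# A longer window with a larger stride yields fewer, coarser sub-sequences.
long = sliding_windows(series, window=4, stride=2)
```

Comparing `short` and `long` shows the trade-off behind window size selection: smaller windows produce more training samples with less context each, while larger windows capture longer spans at the cost of fewer samples and larger model inputs.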