Abstract: Data is essential to performing time series analysis utilizing machine learning approaches, whether for classic models or today's large language models. A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality. Even though time-series data processing methods frequently come up in a wide range of research fields, they have not been well investigated as a specific topic. To fill the gap, in this paper, we systematically review different data-centric methods in time series analysis, covering a wide range of research topics. Based on the time-series data characteristics at the sample, feature, and period levels, we propose a taxonomy for the reviewed data selection methods. In addition to discussing and summarizing their characteristics, benefits, and drawbacks targeting time-series data, we also introduce the challenges and opportunities by proposing recommendations, open problems, and possible research topics.

What problem does this paper attempt to address?

The problem this paper addresses is the lack of dedicated research on data processing methods in time-series analysis, especially data selection methods along three dimensions: sample, feature, and period. Specifically, the paper systematically reviews data-centric time-series analysis methods to fill this gap. Its objectives are as follows:

1. **Improve the quality of time-series data**:
   - The paper emphasizes the importance of high-quality data for time-series analysis, especially when using machine-learning models. Traditional research focuses on the design and optimization of models and tends to overlook the impact of data quality on model performance. The paper therefore argues that data quality and data processing methods deserve more attention.
2. **Systematically review data selection methods**:
   - Based on the characteristics of time-series data, the paper proposes a taxonomy that covers data selection methods at three levels: sample, feature, and period. This taxonomy helps researchers understand and apply these methods.
3. **Discuss the advantages and disadvantages of existing methods**:
   - The paper summarizes the existing data selection methods and discusses the benefits and drawbacks of each, which helps researchers choose appropriately in practical applications.
4. **Propose future research directions**:
   - The paper identifies challenges and opportunities in current research and offers recommendations, open problems, and possible research topics to guide follow-up work.

### Specific problems and solutions

1. **Sample selection**:
   - **Data filtering**: How to extract high-quality data from a large amount of noisy data while avoiding over-filtering or under-filtering.
   - **Data augmentation**: How to reduce bias or distortion by increasing the number of samples, especially in small-sample learning and imbalanced classification tasks.
   - **Learning order arrangement**: How to improve the convergence and generalization ability of the model through an ordered learning process (such as curriculum learning).
2. **Feature selection**:
   - **Feature augmentation**: How to enrich the information in the original data by adding static and dynamic features, and thereby improve model performance.
   - **Representation learning**: How to use deep-learning methods to learn feature representations and extract useful information in high-dimensional space.
3. **Period selection**:
   - **Window size setting**: How to choose an appropriate window size to reduce model complexity and learn multi-span features.
   - **Sub-sequence extraction**: How to simplify and interpret complex time-series data by extracting meaningful sub-sequences.

### Summary

The main objective of the paper is to improve data quality by systematically reviewing and analyzing time-series data processing methods, thereby improving the accuracy, robustness, and convergence of the model. The paper also points out deficiencies in current research and outlines future research directions, providing a reference for further development of the time-series analysis field.
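To make two of the categories above concrete, the following is a minimal sketch of window-based sub-sequence extraction (period level) followed by jitter augmentation (sample level). It is not taken from the paper: the function names `sliding_windows` and `jitter`, the synthetic sine series, and the values chosen for `window`, `stride`, and `sigma` are all illustrative assumptions, and Gaussian jittering is only one of many augmentation techniques.

```python
import numpy as np


def sliding_windows(series, window, stride=1):
    """Split a 1-D series into sub-sequences of length `window`.

    Hypothetical helper for illustration; `window` and `stride` are the
    knobs that a window-size-setting method would tune.
    """
    series = np.asarray(series, dtype=float)
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if window > series.size:
        raise ValueError("window is longer than the series")
    # sliding_window_view yields every length-`window` slice;
    # taking every `stride`-th row thins them out.
    return np.lib.stride_tricks.sliding_window_view(series, window)[::stride]


def jitter(windows, sigma=0.05, seed=0):
    """Return noisy copies of the windows (Gaussian jitter augmentation).

    `sigma` is an assumed noise level, not a recommended value.
    """
    rng = np.random.default_rng(seed)
    return windows + rng.normal(0.0, sigma, size=windows.shape)


# Synthetic example series (assumption: a clean sine wave of 100 points).
series = np.sin(np.linspace(0, 4 * np.pi, 100))

# Period level: extract sub-sequences of 20 points, one every 10 steps.
windows = sliding_windows(series, window=20, stride=10)

# Sample level: double the number of samples with jittered copies.
augmented = np.concatenate([windows, jitter(windows)])

print(windows.shape, augmented.shape)
```

A larger `window` gives each sample more context at the cost of fewer samples and a larger model input, which is the trade-off the window size setting problem refers to.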

Review of Data-centric Time Series Analysis from Sample, Feature, and Period
