A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective

Jeng-Lin Li, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen
2024-02-20
Abstract: Recent artificial intelligence (AI) technologies show remarkable evolution in various academic fields and industries. However, in the real world, dynamic data lead to principal challenges for deploying AI models. An unexpected data change brings about severe performance degradation in AI models. We identify two major related research fields, domain shift and concept drift according to the setting of the data change. Although these two popular research fields aim to solve distribution shift and non-stationary data stream problems, the underlying properties remain similar which also encourages similar technical approaches. In this review, we regroup domain shift and concept drift into a single research problem, namely the data change problem, with a systematic overview of state-of-the-art methods in the two research fields. We propose a three-phase problem categorization scheme to link the key ideas in the two technical fields. We thus provide a novel scope for researchers to explore contemporary technical strategies, learn industrial applications, and identify future directions for addressing data change challenges.
Machine Learning, Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The core problem that this paper attempts to solve is **how to deal with the impact of data changes on the performance of artificial intelligence models**. Specifically, the paper focuses on two major types of data change problems: **domain shift and concept drift**.

### Domain Shift

Domain shift refers to the change in data distribution caused by a change in data sources. For example, in an image recognition task, the training data may come from one environment (such as a sunny day), while the test data may come from a different environment (such as a rainy day). This change leads to a decline in the model's predictive ability in the new environment.

### Concept Drift

Concept drift means that, as time passes, the data distribution gradually changes, making the existing model obsolete. For example, in stock market prediction, dynamic changes in the market may cause models that were effective in the past to no longer be applicable.

### Unified Perspective

The paper proposes to unify these two types of problems into "data change problems" and presents a three-stage problem classification scheme:

1. **Problem Detection**: Identify the occurrence of data changes.
2. **Problem Handling**: Adopt appropriate technical means to deal with data changes.
3. **Extended Factors**: Consider factors such as label quality, adaptation speed and time dynamics.

In this way, the paper aims to provide researchers with a new perspective to better understand and solve the challenges brought by data changes, and to promote the integration and innovation of technologies in different fields.

### Research Contributions

- **Unified Perspective**: The paper bridges the research gap between domain shift and concept drift and provides a unified perspective to deal with data change problems in modern deep learning.
- **Cutting-edge Review**: Reviews the state-of-the-art methods and shows the characteristics of the two fields, enabling researchers to be at the forefront.
- **Three-stage Scheme**: Introduces a three-stage scheme including problem identification, problem handling and other related factors, classifies research topics, and reveals key developments and emerging topics.
- **Future Directions**: Proposes future research directions regarding the challenges of model deployment in industrial applications.

In summary, by systematically summarizing and analyzing the research progress of domain shift and concept drift, this paper aims to provide comprehensive guidance for solving complex data change problems and to point out the direction for future research and development.
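
To make the problem-detection phase described above more concrete, the following is a minimal sketch of one way a data change could be flagged: comparing a reference window of a feature against a current window with a two-sample Kolmogorov-Smirnov statistic. This technique is chosen here for illustration only and is not taken from the paper. The function names `ks_statistic` and `change_detected`, the window contents, and the significance level `alpha` are all assumptions.

```python
import bisect
import math


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The largest gap is always attained at one of the observed points.
    for x in ref + cur:
        # Empirical CDF at x = fraction of samples less than or equal to x.
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def change_detected(reference, current, alpha=0.05):
    """Flag a change when the statistic exceeds the large-sample
    critical value c(alpha) * sqrt((n + m) / (n * m)).
    `alpha` is an illustrative default, not a recommendation."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    # Hypothetical one-dimensional feature: a reference window and a
    # current window whose values are all shifted upward by 1.0.
    reference_window = [i / 100 for i in range(200)]
    shifted_window = [x + 1.0 for x in reference_window]
    print(change_detected(reference_window, reference_window))
    print(change_detected(reference_window, shifted_window))
```

A test like this only looks at the distribution of a single input feature, so it covers just the detection phase. Handling the change (for example, by retraining or adapting the model) and the extended factors such as label quality fall outside this sketch.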