Abstract: In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance. However, current practices for updating models rely solely on isolated, aggregate performance analyses, overlooking important dependencies, expectations, and needs in real-world deployments. We consider how updates, intended to improve ML models, can introduce new errors that can significantly affect downstream systems and users. For example, updates in models used in cloud-based classification services, such as image recognition, can cause unexpected erroneous behavior in systems that make calls to the services. Prior work has shown the importance of "backward compatibility" for maintaining human trust. We study challenges with backward compatibility across different ML architectures and datasets, focusing on common settings including data shifts with structured noise and ML employed in inferential pipelines. Our results show that (i) compatibility issues arise even without data shift due to optimization stochasticity, (ii) training on large-scale noisy datasets often results in significant decreases in backward compatibility even when model accuracy increases, and (iii) distributions of incompatible points align with noise bias, motivating the need for compatibility-aware de-noising and robustness methods.
### What problems does this paper attempt to solve?
The paper "An Empirical Analysis of Backward Compatibility in Machine Learning Systems" examines how updates to machine learning (ML) models can introduce new errors that affect downstream systems and users. Specifically, it addresses the following issues:
1. **Side effects of model updates**:
- Current model update practices rely mainly on isolated, aggregate performance analyses and ignore the dependencies, expectations, and requirements of real-world deployments.
- An update that improves overall performance can still introduce new errors, which may significantly affect downstream systems and the users who rely on them.
2. **Importance of backward compatibility**:
- Backward compatibility means that an updated model preserves the correct behavior of the previous version: inputs the old model handled correctly should still be handled correctly after the update.
- The paper emphasizes that an update can raise overall accuracy and still introduce new errors that reduce system reliability, especially in high-risk fields such as medicine and transportation.
3. **Impact of data shift and noise**:
- The results show that compatibility problems arise even without data shift, because of stochasticity in the optimization process.
- Training on large-scale noisy datasets often decreases backward compatibility significantly, even when model accuracy increases.
- The distribution of incompatible points aligns with the noise bias, which motivates compatibility-aware de-noising and robustness methods.
4. **Challenges under different ML architectures and datasets**:
- The paper studies backward compatibility across different machine learning architectures (such as linear models, CNNs, ResNet, and BERT) and datasets (tabular, visual, and language data).
- It focuses on two common settings: data shifts with structured noise, and ML models employed in inferential pipelines.
5. **Methods and contributions**:
- It expands the empirical understanding of when and how backward compatibility problems occur in machine learning systems.
- It proposes two metrics for measuring backward compatibility: Backward Trust Compatibility (BTC) and Backward Error Compatibility (BEC).
- It emphasizes that backward compatibility should be considered when updating and diagnosing learning models to build and maintain more reliable systems.
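The side effect described in items 1 and 2 above can be made concrete with a toy example. The sketch below is not from the paper: the labels, the predictions, and the helper names `accuracy` and `new_errors` are all illustrative assumptions. It shows an update that raises aggregate accuracy while still breaking predictions that the old model got right.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def new_errors(old_preds, new_preds, labels):
    """Indices where the old model was correct and the new model is wrong."""
    return [
        i
        for i, (o, n, y) in enumerate(zip(old_preds, new_preds, labels))
        if o == y and n != y
    ]


# Made-up data for illustration only (hypothetical, not from the paper).
labels = [0, 1, 1, 0, 1, 0, 1, 0]
old_preds = [0, 1, 0, 1, 1, 1, 0, 0]  # old model: 4 of 8 correct
new_preds = [0, 0, 1, 0, 1, 0, 1, 1]  # new model: 6 of 8 correct

old_acc = accuracy(old_preds, labels)
new_acc = accuracy(new_preds, labels)
broken = new_errors(old_preds, new_preds, labels)

print(f"old accuracy: {old_acc:.2f}, new accuracy: {new_acc:.2f}")
print(f"examples broken by the update: {broken}")
```

An aggregate comparison would report only the accuracy gain; a downstream system that depended on the old model's correct answers at the listed indices would see new failures after the update.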
### Formula representation
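- **Backward Trust Compatibility (BTC)**: the fraction of examples classified correctly by the old model \(h_1\) that the updated model \(h_2\) also classifies correctly, computed over a labeled dataset \(D\) of pairs \((x_i, y_i)\):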
\[
\text{BTC}=\frac{\sum_{i = 1}^{|D|}\mathbf{1}[h_1(x_i)=y_i,h_2(x_i)=y_i]}{\sum_{i = 1}^{|D|}\mathbf{1}[h_1(x_i)=y_i]}
\]
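- **Backward Error Compatibility (BEC)**: the fraction of the updated model \(h_2\)'s errors that the old model \(h_1\) also made, so a low value means that many of the errors are new: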
\[
\text{BEC}=\frac{\sum_{i = 1}^{|D|}\mathbf{1}[h_1(x_i)\neq y_i,h_2(x_i)\neq y_i]}{\sum_{i = 1}^{|D|}\mathbf{1}[h_2(x_i)\neq y_i]}
\]
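The two scores can be computed directly from paired predictions. The following is a minimal sketch under stated assumptions: the function name `backward_compatibility`, the toy data, and the choice to return NaN when a denominator is zero are illustrative and not taken from the paper.

```python
def backward_compatibility(old_preds, new_preds, labels):
    """Return (BTC, BEC) for an old model h1 and an updated model h2.

    BTC = |h1 correct and h2 correct| / |h1 correct|
    BEC = |h1 wrong and h2 wrong|     / |h2 wrong|
    A score is NaN when its denominator is zero (an assumption of this sketch).
    """
    old_right = both_right = new_wrong = both_wrong = 0
    for o, n, y in zip(old_preds, new_preds, labels):
        if o == y:
            old_right += 1
            if n == y:
                both_right += 1
        if n != y:
            new_wrong += 1
            if o != y:
                both_wrong += 1
    btc = both_right / old_right if old_right else float("nan")
    bec = both_wrong / new_wrong if new_wrong else float("nan")
    return btc, bec


# Made-up predictions for illustration only (hypothetical data).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
old_preds = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]  # h1: 6 of 10 correct
new_preds = [1, 1, 1, 1, 0, 1, 0, 0, 1, 1]  # h2: 8 of 10 correct

btc, bec = backward_compatibility(old_preds, new_preds, labels)
print(f"BTC = {btc:.3f}, BEC = {bec:.3f}")
```

In this toy case the update raises accuracy from 0.6 to 0.8, yet BTC is 5/6 because one previously correct example now fails, and BEC is 1/2 because one of the two remaining errors is new.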
Together, these analyses give machine learning practitioners metrics and empirical evidence for understanding and diagnosing backward compatibility problems, so that model updates do not undermine the reliability and stability of the systems that depend on them.