Classification via score-based generative modelling

Yongchao Huang
DOI: https://doi.org/10.48550/arXiv.2207.11091
2022-07-22
Abstract: In this work, we investigated the application of score-based gradient learning in discriminative and generative classification settings. The score function can be used to characterize a data distribution as an alternative to the density. It can be efficiently learned via score matching, and used to flexibly generate credible samples to enhance discriminative classification quality, to recover the density and to build generative classifiers. We analysed the decision theories involving score-based representations, and performed experiments on simulated and real-world datasets, demonstrating the effectiveness of score-based learning in achieving and improving binary classification performance, and its robustness to perturbations, particularly in high-dimensional and imbalanced situations.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores the application of score-based gradient learning in both discriminative and generative classification settings. Specifically, the paper attempts to address the following issues:

1. **Representation of Data Distribution**: How to use the score function to represent the data distribution as an alternative to density estimation.
2. **Learning the Score Function**: How to efficiently learn the score function through methods such as score matching.
3. **Generating Credible Samples**: How to use the learned score function to generate credible samples that enhance the quality of discriminative classification.
4. **Recovering Density**: How to recover the data density from the score function.
5. **Building Generative Classifiers**: How to use the score function to build generative classifiers.
6. **Improving Classification Performance**: How to use the score function to improve the performance and robustness of binary classification in high-dimensional and imbalanced data scenarios.

### Keywords

- Score-based modelling
- Discriminative classification
- Generative classification
- Imbalanced learning

### Research Background

Traditional classification methods are usually divided into two categories: generative methods and discriminative methods. Generative methods model the class-conditional density \( p(x|y) \) and combine it with the class prior through Bayes' rule to infer the posterior probability \( p(y|x) \), while discriminative methods model the posterior probability \( p(y|x) \) directly. Generative methods have advantages in handling missing values and outliers, but density estimation is often very difficult in high-dimensional data. Discriminative methods are more direct and avoid density estimation altogether, but may not perform as well as generative methods in some cases.

### Main Contributions

1. **Application of the Score Function**: The paper proposes using the score function to represent the data distribution and learning the score function through methods such as score matching.
2. **Combination of Generative and Discriminative Classification**: The paper demonstrates how the learned score function can be used to build generative classifiers and to assist discriminative classifiers, especially in imbalanced data scenarios.
3. **Experimental Validation**: Through experiments on simulated and real-world datasets, the paper validates the effectiveness of score-based methods in improving classification performance and robustness.

### Conclusion

Through theoretical analysis and experimental validation, the paper demonstrates the effectiveness and potential of score-based gradient learning in generative and discriminative classification tasks, especially in high-dimensional and imbalanced data scenarios. Score-based sample generation, density recovery, and generative classification give practitioners additional tools for improving classification performance.
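
The sample-generation idea can be illustrated with a minimal sketch. It assumes the score is available in closed form (a hypothetical 2-D Gaussian minority class with made-up parameters) rather than learned by score matching as in the paper. It uses unadjusted Langevin dynamics as the sampler, which is a common choice for score-based sampling but is an assumption here, not something this summary confirms. The names `gaussian_score`, `langevin_sample`, and `minority_score` are illustrative only.

```python
import numpy as np


def gaussian_score(x, mean, cov_inv):
    """Score of N(mean, cov): grad_x log p(x) = -cov_inv @ (x - mean).

    x has shape (n, d); the result has the same shape.
    """
    return -(x - mean) @ cov_inv  # cov_inv is symmetric, so no transpose needed


def langevin_sample(score_fn, x0, step=0.01, n_steps=2000, seed=0):
    """Unadjusted Langevin dynamics driven only by a score function.

    Update: x <- x + (step / 2) * score(x) + sqrt(step) * z, with z ~ N(0, I).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + 0.5 * step * score_fn(x) + np.sqrt(step) * z
    return x


# Hypothetical minority class: a 2-D Gaussian (illustrative values only).
minority_mean = np.array([2.0, -1.0])
minority_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
minority_cov_inv = np.linalg.inv(minority_cov)


def minority_score(x):
    # Stand-in for a learned score model; here the score is known exactly.
    return gaussian_score(x, minority_mean, minority_cov_inv)


# Start 2000 particles at the origin and let the score pull them to the class.
synthetic = langevin_sample(minority_score, np.zeros((2000, 2)))
print("sample mean:", synthetic.mean(axis=0))
print("sample cov:\n", np.cov(synthetic, rowvar=False))
```

In an imbalanced setting, samples like `synthetic` could be appended to the minority class before training a discriminative classifier. With a learned score model, `minority_score` would be replaced by the model's output, and the sampler itself would not need to change because it only ever calls the score function.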