Finding Key Training Data by Calculating Influence Score.

Jiahao Xu, Fan Zhang, Samee U. Khan
DOI: https://doi.org/10.1145/3565387.3565403
2022-01-01
Abstract: Decision models are complex and opaque, and their data volume requirements keep increasing, which makes it attractive to reduce data volume and improve model interpretability by selecting key data. In this paper, we propose InfSort, an influence-function-based method for data sorting and pruning, and demonstrate that the key data selected by this method outperform an equal amount of other data. We also find that the importance of the data is positively correlated with the speed and stability of the loss, and that key data are more conducive to faster model convergence. In addition, we develop CGT, a method that guards against overfitting by controlling for the worst-case distribution of the data. Experimental results show that our method is effective and efficient in emotion recognition tasks.
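The abstract does not spell out how InfSort computes its scores, so the following is only a minimal sketch of the general idea of ranking training points by an influence score. It assumes the standard influence-function approximation (the change in test loss when a training point is upweighted, estimated as the negative product of the test gradient, the inverse Hessian, and the training gradient) applied to a tiny L2-regularized logistic regression. The toy data, the function names (`fit`, `hessian_inverse`, `influence_scores`), and the choice to rank by absolute score are all illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad(w, x, y):
    # Gradient of the log loss for one example (x has 2 features, y is 0 or 1).
    p = sigmoid(w[0] * x[0] + w[1] * x[1])
    return [(p - y) * x[0], (p - y) * x[1]]

def fit(X, Y, l2=0.1, lr=0.5, steps=2000):
    # Plain gradient descent on the mean log loss plus (l2 / 2) * ||w||^2.
    w = [0.0, 0.0]
    n = len(X)
    for _ in range(steps):
        g = [l2 * w[0], l2 * w[1]]
        for x, y in zip(X, Y):
            gi = grad(w, x, y)
            g[0] += gi[0] / n
            g[1] += gi[1] / n
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w

def hessian_inverse(w, X, l2=0.1):
    # Closed-form inverse of the 2x2 Hessian of the regularized training loss.
    n = len(X)
    a = b = d = 0.0
    for x in X:
        p = sigmoid(w[0] * x[0] + w[1] * x[1])
        s = p * (1.0 - p) / n
        a += s * x[0] * x[0]
        b += s * x[0] * x[1]
        d += s * x[1] * x[1]
    a += l2
    d += l2
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]

def influence_scores(w, X, Y, x_test, y_test, l2=0.1):
    # score_i = -grad_test^T H^{-1} grad_i; a negative score means that
    # upweighting training point i is estimated to lower the test loss.
    h = hessian_inverse(w, X, l2)
    gt = grad(w, x_test, y_test)
    v = [h[0][0] * gt[0] + h[0][1] * gt[1],
         h[1][0] * gt[0] + h[1][1] * gt[1]]
    return [-(v[0] * g[0] + v[1] * g[1])
            for g in (grad(w, x, y) for x, y in zip(X, Y))]

# Hypothetical toy data: one real feature plus a constant bias feature.
X = [[2.0, 1.0], [1.5, 1.0], [-1.0, 1.0], [-2.0, 1.0], [0.2, 1.0]]
Y = [1, 1, 0, 0, 0]

w = fit(X, Y)
scores = influence_scores(w, X, Y, x_test=[1.0, 1.0], y_test=1)

# Sort training indices from most to least influential (by magnitude);
# pruning would keep only a prefix of this ranking.
ranking = sorted(range(len(X)), key=lambda i: abs(scores[i]), reverse=True)
```

In this sketch, pruning amounts to keeping the first k indices of `ranking`. The approximation assumes the weights are close to the optimum of the regularized objective, and for larger models the explicit Hessian inverse would normally be replaced by an approximate inverse-Hessian-vector product.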