When a Classifier Meets More Data

Zhicheng Liao, Yangyong Zhu
DOI: https://doi.org/10.1016/j.procs.2014.05.380
2014-01-01
Procedia Computer Science
Abstract: Studies of generalization error offer possible approaches to estimating the performance of a classifier, but they remain expensive and difficult to apply to large-scale data. In this paper, we find that the accuracy of a classifier is regionally convergent with respect to the size of the training data set, and we propose a Bounded Accuracy Conjecture. We also find that training a classifier on a slightly noisy training data set does not affect its accuracy. Finally, we give an easy but effective experimental approach to building a good enough training data set for a given large-scale problem.