Learning a Model with the Most Generality for Small-Sample Problems

Jianjie Jin, Fuyi Yin, Yuxiang Xu, Junying Zhang
DOI: https://doi.org/10.1145/3579731.3579814
2022-01-01
Abstract: In machine learning, a small-sample problem arises when the number of available samples is very limited, and it makes learning a good model extremely challenging. This is the first study on building the most reliable and generalizable model and on providing the most reliable measure of a model's generalization performance. It creates a large number of random sampling sets from the observation dataset, without or with added noise, to simulate unseen training sets or unseen test sets, respectively. The former make it possible to construct a model with the greatest generalization ability, and the latter make it possible to measure a model's generalization performance in the most reliable manner. In the modeling process, each training set is used to train a model, and the models are then combined to form the global (final) model. In the evaluation process, a performance metric (such as mean squared error) is calculated on each test set, and the model's generalization ability is most reliably represented by the average value of these metrics. Because the method does not require the data splitting used in cross-validation approaches, it is especially suitable for small or even ultra-small sample problems. Comparative experiments on different datasets show the effectiveness of the proposed method on ultra-small-sample problems: for simulated data, the generalization error is approximately 80% lower than that of the conventional method, and for a typical ultra-small-sample problem, glomerular filtration rate prediction on real data, it is 46.36% to 83.02% lower than those of many popular models.
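The modeling and evaluation procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the random sampling sets are drawn, what noise is added, which base model is trained, or how the models are combined. The sketch therefore assumes bootstrap resampling (sampling with replacement), additive Gaussian noise on both features and targets, ordinary least-squares regression as the base model, and averaging of member predictions as the combination rule. All function names (`make_sampling_sets`, `fit_linear`, `predict_linear`, `global_predict`, `generalization_mse`) and the synthetic data are hypothetical.

```python
import numpy as np


def make_sampling_sets(X, y, n_sets, rng, noise_std=0.0):
    """Draw n_sets random resamples of (X, y), optionally adding noise.

    Assumption: bootstrap resampling and additive Gaussian noise are
    illustrative choices; the abstract does not specify either.
    """
    n = len(y)
    sets = []
    for _ in range(n_sets):
        idx = rng.integers(0, n, size=n)  # bootstrap indices, with replacement
        Xs, ys = X[idx], y[idx]
        if noise_std > 0:
            # Perturb features and targets to simulate unseen samples.
            Xs = Xs + rng.normal(0.0, noise_std, size=Xs.shape)
            ys = ys + rng.normal(0.0, noise_std, size=ys.shape)
        sets.append((Xs, ys))
    return sets


def fit_linear(X, y):
    """Hypothetical base model: least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def global_predict(models, X):
    """Global model: average of the member models' predictions (assumed rule)."""
    return np.mean([predict_linear(c, X) for c in models], axis=0)


def generalization_mse(models, test_sets):
    """MSE of the global model on each simulated test set, then averaged."""
    mses = [np.mean((global_predict(models, Xt) - yt) ** 2) for Xt, yt in test_sets]
    return float(np.mean(mses)), mses


# Usage on a hypothetical ultra-small synthetic dataset (12 samples).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(12, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=12)

train_sets = make_sampling_sets(X, y, n_sets=200, rng=rng)                 # without noise
test_sets = make_sampling_sets(X, y, n_sets=200, rng=rng, noise_std=0.05)  # with noise
models = [fit_linear(Xs, ys) for Xs, ys in train_sets]
score, per_set = generalization_mse(models, test_sets)
print(f"average MSE over {len(per_set)} simulated test sets: {score:.4f}")
```

Averaging the per-set MSE values mirrors the evaluation step in the abstract. With a linear base model, averaging the members' predictions is equivalent to averaging their fitted coefficients, so the global model here is itself a single linear model.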