Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples

Henrik Imberg, Xiaomi Yang, Carol Flannagan, Jonas Bärgman
DOI: https://doi.org/10.1080/00401706.2024.2374554
2024-07-04
Abstract: Data subsampling has become widely recognized as a tool to overcome computational and economic bottlenecks in analyzing massive datasets. We contribute to the development of adaptive design for estimation of finite population characteristics, using active learning and adaptive importance sampling. We propose an active sampling strategy that iterates between estimation and data collection with optimal subsamples, guided by machine learning predictions on yet unseen data. The method is illustrated on virtual simulation-based safety assessment of advanced driver assistance systems. Substantial performance improvements are demonstrated compared to traditional sampling methods.
Methodology, Applications, Computation
What problem does this paper attempt to address?
The paper addresses how to estimate the characteristics of a finite population efficiently when the full dataset is too large, or each observation too costly, to process exhaustively. It proposes a machine-learning-assisted active sampling strategy that iterates between estimation and data collection: at each step, an optimal subsample is drawn, guided by machine learning predictions on yet unseen data. The approach targets settings where obtaining each observation is computationally or economically expensive, such as virtual simulation-based safety assessment of advanced driver assistance systems (ADAS). The authors report substantial performance improvements over traditional sampling methods.
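The iterate-between-estimation-and-sampling idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it swaps in a simpler scheme, sampling with probability proportional to the predicted outcome (mixed with a uniform component) and a Hansen-Hurwitz estimator of the population total, whereas the paper derives optimal sampling schemes. The function name `active_sampling_total`, the quadratic surrogate model, the 0.9/0.1 mixing weights, and the synthetic population are all assumptions made for illustration.

```python
import numpy as np


def active_sampling_total(x, evaluate, n_iter=5, batch_size=20, seed=0):
    """Estimate sum_i y_i over a finite population by adaptive importance sampling.

    x        : 1-D array of cheap covariates, known for every population unit
    evaluate : function i -> y_i, the expensive observation (hypothetical stand-in
               for, e.g., running one simulation)
    """
    rng = np.random.default_rng(seed)
    N = len(x)
    # Assumed surrogate model: quadratic regression on x.
    features = np.column_stack([np.ones(N), x, x**2])
    probs = np.full(N, 1.0 / N)  # first batch: uniform sampling
    seen_idx, seen_y, batch_estimates = [], [], []

    for _ in range(n_iter):
        # Data collection: draw a batch with replacement using current probabilities.
        idx = rng.choice(N, size=batch_size, replace=True, p=probs)
        y = np.array([evaluate(i) for i in idx])

        # Estimation: Hansen-Hurwitz estimator of the total for this batch.
        # It is unbiased given the history because probs was fixed before drawing.
        batch_estimates.append(np.mean(y / probs[idx]))

        # Learning: refit the surrogate on everything observed so far.
        seen_idx.extend(idx)
        seen_y.extend(y)
        coef, *_ = np.linalg.lstsq(features[seen_idx], np.array(seen_y), rcond=None)

        # Next sampling scheme: proportional to predicted size, mixed with a
        # uniform component so that every unit keeps a positive probability.
        pred = np.clip(features @ coef, 1e-6, None)
        probs = 0.9 * pred / pred.sum() + 0.1 / N

    # Combine the per-batch estimates with equal weights.
    return float(np.mean(batch_estimates)), probs


# Synthetic population (assumption, for demonstration only).
pop_rng = np.random.default_rng(1)
x = pop_rng.uniform(0.0, 1.0, 2000)
y_all = 10.0 * x**2 + pop_rng.uniform(0.0, 0.5, 2000)

estimate, final_probs = active_sampling_total(x, lambda i: y_all[i])
true_total = float(y_all.sum())
print(f"estimate={estimate:.1f}, true total={true_total:.1f}")
```

In this sketch only 100 of the 2000 units are ever evaluated. After the first uniform batch, the sampling probabilities concentrate on units the surrogate predicts to contribute most to the total, which is the mechanism by which guided subsampling can reduce variance relative to simple random sampling.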