Exploring a new machine learning based probabilistic model for high-resolution indoor radon mapping, using the German indoor radon survey data

Eric Petermann, Peter Bossew, Joachim Kemski, Valeria Gruber, Nils Suhr, Bernd Hoffmann
2024-03-01
Abstract: Radon is a carcinogenic, radioactive gas that can accumulate indoors. Therefore, accurate knowledge of indoor radon concentration is crucial for assessing radon-related health effects or identifying radon-prone areas. Indoor radon concentration at the national scale is usually estimated on the basis of extensive measurement campaigns. However, characteristics of the sample often differ from the characteristics of the population due to the large number of relevant factors that control the indoor radon concentration, such as the availability of geogenic radon or floor level. Furthermore, the sample size usually does not allow estimation with high spatial resolution. We propose a model-based approach that allows a more realistic estimation of indoor radon distribution with a higher spatial resolution than a purely data-based approach. A two-stage modelling approach was applied: (1) a quantile regression forest using environmental and building data as predictors was applied to estimate the probability distribution function of indoor radon for each floor level of each residential building in Germany; (2) a probabilistic Monte Carlo sampling technique enabled the combination and population weighting of floor-level predictions. In this way, the uncertainty of the individual predictions is effectively propagated into the estimate of variability at the aggregated level. The results show an approximate lognormal distribution with an arithmetic mean of 63 Bq/m³, a geometric mean of 41 Bq/m³ and a 95th percentile of 180 Bq/m³. The exceedance probabilities for 100 Bq/m³ and 300 Bq/m³ are 12.5 % (10.5 million people) and 2.2 % (1.9 million people), respectively.
Machine Learning, Data Analysis, Statistics and Probability
What problem does this paper attempt to address?
The paper addresses how machine learning can be used to build a probabilistic model for high-resolution mapping of indoor radon in Germany. Radon is a carcinogenic radioactive gas associated with lung cancer risk. National indoor radon concentrations are typically estimated from large-scale measurement campaigns, but the measured sample may not be representative of the population with respect to the main controlling factors, which biases the estimates. Using the German indoor radon survey data, the authors apply a two-stage modelling approach: first, a quantile regression forest with environmental and building data as predictors estimates the probability distribution function of indoor radon for each floor level of each residential building; second, a probabilistic Monte Carlo sampling technique combines the floor-level predictions and weights them by population. This makes the estimate more robust when the sample does not fully represent the main controlling factors, in particular floor level and population distribution, and yields a higher spatial resolution than purely descriptive statistics of the measurements.
The results show that indoor radon concentrations in Germany follow an approximately lognormal distribution, with an arithmetic mean of about 63 Bq/m³, a geometric mean of 41 Bq/m³, and a 95th percentile of 180 Bq/m³. The study also found that indoor radon concentrations in urban areas are generally lower than in rural areas, which is related to how the population is distributed across floor levels.
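As a rough plausibility check on the reported statistics, the sketch below assumes an exactly lognormal distribution fitted to the reported arithmetic mean (63 Bq/m³) and geometric mean (41 Bq/m³), and derives the 95th percentile and exceedance probabilities that such a distribution would imply. This is not the authors' calculation; the variable and function names are illustrative, and because the paper describes the distribution only as approximately lognormal, the implied values are not expected to match the reported ones exactly.

```python
import math

# Reported summary statistics (Bq/m^3), taken from the abstract.
arithmetic_mean = 63.0
geometric_mean = 41.0

# Assumption: an exact lognormal distribution. For a lognormal,
#   AM = GM * exp(sigma^2 / 2)   =>   sigma = sqrt(2 * ln(AM / GM))
mu = math.log(geometric_mean)
sigma = math.sqrt(2.0 * math.log(arithmetic_mean / geometric_mean))


def exceedance(threshold):
    """Probability that a lognormal(mu, sigma) variable exceeds `threshold`."""
    z = (math.log(threshold) - mu) / sigma
    # Survival function of the standard normal, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# 95th percentile: exp(mu + z_0.95 * sigma), with z_0.95 ~= 1.645.
p95 = math.exp(mu + 1.6448536269514722 * sigma)

print(f"sigma (log scale): {sigma:.3f}")
print(f"implied 95th percentile: {p95:.0f} Bq/m^3")
print(f"implied P(> 100 Bq/m^3): {exceedance(100.0):.3f}")
print(f"implied P(> 300 Bq/m^3): {exceedance(300.0):.3f}")
```

Under this assumption the implied 95th percentile is roughly 188 Bq/m³ and the implied exceedance probabilities are roughly 17 % and 1.6 %, close to but not identical with the reported 180 Bq/m³, 12.5 % and 2.2 %, which is consistent with the distribution being only approximately lognormal.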
With this model, exceedance probabilities for specific thresholds (such as 100 Bq/m³ and 300 Bq/m³) can be estimated more reliably, which gives health risk assessment and the planning of protective measures a better basis.
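The second modelling stage can be illustrated with a minimal sketch. It assumes that the first stage has already produced, for each floor-level unit, a set of predicted quantiles (as a quantile regression forest would), and then draws population-weighted Monte Carlo samples to estimate exceedance probabilities. The function name `aggregate_exceedance`, the quantile levels, the three example units and their resident counts are all invented for illustration and are not taken from the paper.

```python
import numpy as np


def aggregate_exceedance(quantile_levels, unit_quantiles, residents,
                         thresholds, n_draws=20_000, seed=0):
    """Population-weighted Monte Carlo aggregation of per-unit predictive quantiles.

    quantile_levels : increasing probability levels, e.g. [0.05, ..., 0.95]
    unit_quantiles  : array (n_units, n_levels) of predicted radon quantiles
    residents       : number of residents per unit (used as sampling weights)
    """
    rng = np.random.default_rng(seed)
    levels = np.asarray(quantile_levels, dtype=float)
    quantiles = np.asarray(unit_quantiles, dtype=float)
    weights = np.asarray(residents, dtype=float)
    weights = weights / weights.sum()

    # Each draw represents one person: pick a unit proportional to its residents.
    unit = rng.choice(len(weights), size=n_draws, p=weights)

    # Inverse-transform sampling from the unit's predictive distribution:
    # draw a probability level and interpolate the predicted quantile function.
    # Simplification: levels outside [levels[0], levels[-1]] are not sampled.
    u = rng.uniform(levels[0], levels[-1], size=n_draws)
    draws = np.empty(n_draws)
    for i in np.unique(unit):
        mask = unit == i
        draws[mask] = np.interp(u[mask], levels, quantiles[i])

    exceed = {t: float(np.mean(draws > t)) for t in thresholds}
    return draws, exceed


# Hypothetical predictions for three floor-level units (values in Bq/m^3).
levels = [0.05, 0.25, 0.50, 0.75, 0.95]
unit_quantiles = [
    [20.0, 45.0, 80.0, 140.0, 350.0],   # ground floor
    [15.0, 30.0, 50.0, 85.0, 200.0],    # first floor
    [10.0, 20.0, 35.0, 55.0, 120.0],    # third floor
]
residents = [2, 3, 5]

draws, exceed = aggregate_exceedance(levels, unit_quantiles, residents,
                                     thresholds=(100.0, 300.0))
print("population-weighted mean:", draws.mean())
print("exceedance probabilities:", exceed)
```

Because each draw samples from a unit's full predictive distribution rather than using a single point estimate, the spread of the individual predictions carries over into the aggregated distribution, which is the uncertainty propagation the abstract describes. Truncating the sampled probability levels to the outermost predicted quantiles is a simplification of this sketch and would underestimate the tails in practice.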