Scalable Estimation for Structured Additive Distributional Regression

Nikolaus Umlauf, Johannes Seiler, Mattias Wetscher, Thorsten Simon, Stefan Lang, Nadja Klein
DOI: https://doi.org/10.1080/10618600.2024.2388604
2024-10-11
Journal of Computational and Graphical Statistics
Abstract: Obtaining probabilistic models is of high relevance in many recent applications. However, estimating such distributional models on very large datasets remains a difficult task. In particular, the use of rather complex models can easily lead to memory-related efficiency problems and thereby make estimation infeasible even on high-performance computers. We address these challenges and propose a novel backfitting algorithm, which is based on the ideas of stochastic gradient descent and can deal with virtually any amount of data on a conventional laptop. The algorithm performs automatic selection of variables and determination of smoothing parameters. Its performance is superior or at least equivalent to other implementations for structured additive distributional regression, such as gradient boosting, while maintaining lower computation time. Performance is evaluated using an extensive simulation study and an exceptionally challenging example of lightning count prediction across Austria with over 9 million observations and 80 covariates. Supplementary materials for this article are available online.
statistics & probability