Distributed Soft Bayesian Additive Regression Trees

Hao Ran, Yang Bai
DOI: https://doi.org/10.48550/arXiv.2108.11600
2021-08-26
Abstract: Bayesian Additive Regression Trees (BART) is a Bayesian nonparametric approach that has been shown to be competitive with the best modern predictive methods, such as random forest and Gradient Boosting Decision Tree. The sum-of-trees structure combined with a Bayesian inferential framework provides an accurate and robust statistical method. A BART variant named SBART, which uses randomized decision trees, has been developed and shows practical benefits compared to BART. The primary bottleneck of SBART is the speed of computing the sufficient statistics, and the publicly available implementation of the SBART algorithm in the R package is very slow. In this paper we show how the SBART algorithm can be modified and computed using single program, multiple data (SPMD) distributed computation with the Message Passing Interface (MPI) library. This approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository. We have modified the algorithm so that it can handle classification problems, which cannot be done with the original R package. With data experiments we show the advantage of distributed SBART for classification problems compared to BART.
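The SPMD idea in the abstract rests on sufficient statistics that are sums over observations, so each worker can compute them on its own shard and a single sum-reduction combines them. The following is a minimal sketch of that pattern, not the paper's implementation. It assumes a hypothetical one-split soft tree with logistic leaf weights, illustrative names (`leaf_weights`, `local_stats`, `combine`, `distributed_stats`), and a plain Python `reduce` standing in for an MPI sum-reduction.

```python
import math
from functools import reduce


def leaf_weights(x, cut=0.5, bandwidth=0.1):
    """Soft assignment of x to the (left, right) leaves of a one-split tree.

    The logistic gate, cut point, and bandwidth are illustrative assumptions.
    """
    go_right = 1.0 / (1.0 + math.exp(-(x - cut) / bandwidth))
    return (1.0 - go_right, go_right)


def local_stats(shard):
    """Statistics one worker computes on its shard of (x, residual) pairs.

    Returns the sum of w w^T (a 2x2 matrix) and the sum of w * residual.
    """
    gram = [[0.0, 0.0], [0.0, 0.0]]
    moment = [0.0, 0.0]
    for x, r in shard:
        w = leaf_weights(x)
        for a in range(2):
            moment[a] += w[a] * r
            for b in range(2):
                gram[a][b] += w[a] * w[b]
    return gram, moment


def combine(s, t):
    """Elementwise sum of two workers' statistics (stand-in for an MPI sum-reduction)."""
    gram = [[s[0][a][b] + t[0][a][b] for b in range(2)] for a in range(2)]
    moment = [s[1][a] + t[1][a] for a in range(2)]
    return gram, moment


def distributed_stats(data, n_workers):
    """Split data into shards, compute per-shard statistics, then reduce."""
    shards = [data[k::n_workers] for k in range(n_workers)]
    return reduce(combine, (local_stats(s) for s in shards))
```

Because the statistics are additive, `distributed_stats(data, n_workers)` should agree with `local_stats(data)` up to floating-point rounding, while each worker only ever touches its own shard.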
Applications