Random Bits Regression: a Strong General Predictor for Big Data

Yi Wang, Yi Li, Momiao Xiong, Li Jin
DOI: https://doi.org/10.1186/s41044-016-0010-4
2015-01-13
Abstract: To improve the accuracy and speed of regression and classification, we present a data-based prediction method, Random Bits Regression (RBR). This method first generates a large number of random binary intermediate/derived features based on the original input matrix, and then performs regularized linear/logistic regression on those intermediate/derived features to predict the outcome. Benchmark analyses on a simulated dataset, UCI machine learning repository datasets and a GWAS dataset showed that RBR outperforms other popular methods in accuracy and robustness. RBR (available at https://sourceforge.net/projects/rbr/) is very fast and requires a reasonable amount of memory, and therefore provides a strong, robust and fast predictor in the big data era.
Machine Learning
What problem does this paper attempt to address?
This paper aims to improve the accuracy and speed of regression and classification in the big data era. The authors propose a data-based prediction method, Random Bits Regression (RBR). RBR first generates a large number of random binary intermediate/derived features from the original input matrix, and then performs regularized linear/logistic regression on these features to predict the outcome. In benchmarks on a simulated dataset, UCI Machine Learning Repository datasets, and a GWAS dataset, the paper reports that RBR outperforms other popular methods in accuracy and robustness, while running fast with reasonable memory requirements, which makes it a strong, robust, and fast predictor for the big data era.
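
The two-stage idea described here (random binary features, then regularized linear regression) can be illustrated with a minimal sketch. The text does not say how the random bits are constructed, so this sketch assumes one plausible scheme: each bit thresholds a random weighted sum of a small random subset of input columns, with the threshold taken from a randomly chosen training sample. The function names (make_random_bits, fit_ridge, predict) and the parameters (n_bits, subset_size, lam) are hypothetical and are not taken from the RBR software; ridge regression stands in for the unspecified regularized linear model.

```python
import numpy as np

def make_random_bits(X, n_bits, subset_size=3, rng=None):
    """Build a transform mapping an (m, p) matrix to an (m, n_bits) 0/1 matrix.

    Assumed scheme (not confirmed by the source): each bit is a thresholded
    random projection of a small random subset of the input columns.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    k = min(subset_size, p)
    idx = np.empty((n_bits, k), dtype=int)   # columns used by each bit
    w = rng.standard_normal((n_bits, k))     # random weights for each bit
    thr = np.empty(n_bits)                   # one threshold per bit
    for j in range(n_bits):
        idx[j] = rng.choice(p, size=k, replace=False)
        z = X[:, idx[j]] @ w[j]
        thr[j] = z[rng.integers(n)]          # projection of a random training row

    def transform(M):
        # M[:, idx] has shape (m, n_bits, k); contract over the subset axis.
        Z = np.einsum("mjk,jk->mj", M[:, idx], w)
        return (Z > thr).astype(np.float64)

    return transform

def fit_ridge(B, y, lam=1.0):
    """Closed-form ridge regression on the bits, with an unpenalized intercept."""
    B1 = np.hstack([np.ones((B.shape[0], 1)), B])
    penalty = lam * np.eye(B1.shape[1])
    penalty[0, 0] = 0.0                      # do not shrink the intercept
    return np.linalg.solve(B1.T @ B1 + penalty, B1.T @ y)

def predict(beta, B):
    return beta[0] + B @ beta[1:]

# Small synthetic demo with a nonlinear target (illustrative data only).
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 5))
y_train = np.sin(X_train[:, 0]) * X_train[:, 1] + 0.1 * rng.standard_normal(200)

to_bits = make_random_bits(X_train, n_bits=500, rng=1)
B_train = to_bits(X_train)
beta = fit_ridge(B_train, y_train, lam=1.0)
train_mse = np.mean((predict(beta, B_train) - y_train) ** 2)
print(f"training MSE: {train_mse:.4f}")
```

The same transform returned by make_random_bits must be applied to new data before calling predict, since the column subsets, weights, and thresholds are fixed at construction time. For classification, the ridge step would be replaced by regularized logistic regression on the same bits.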