Certifiably optimal sparse principal component analysis

Lauren Berk, Dimitris Bertsimas
DOI: https://doi.org/10.1007/s12532-018-0153-6
2019-01-01
Mathematical Programming Computation
Abstract: This paper addresses the sparse principal component analysis (SPCA) problem for covariance matrices in dimension n, aiming to find solutions with sparsity k using mixed integer optimization. We propose a tailored branch-and-bound algorithm, Optimal-SPCA, that enables us to solve SPCA to certifiable optimality in seconds for n in the 100s and k in the 10s. This same algorithm can be applied to problems with n in the 10,000s or higher to find high-quality feasible solutions in seconds while taking several hours to prove optimality. We apply our methods to a number of real data sets to demonstrate that our approach scales to the same problem sizes attempted by other methods, while providing superior solutions compared to those methods, explaining a higher portion of variance and permitting complete control over the desired sparsity. The software that was reviewed as part of this submission has been given the DOI (digital object identifier) https://doi.org/10.5281/zenodo.2027898.
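To make the problem statement concrete, the sketch below assumes the standard cardinality-constrained formulation of SPCA: maximize x' Sigma x subject to ||x||_2 = 1 and at most k nonzero entries in x. It solves that problem by exhaustive enumeration of supports, which is only practical for very small n. It is not the Optimal-SPCA algorithm from the paper, whose branch-and-bound is designed to avoid this enumeration. The function name `sparse_pca_bruteforce` and the random test data are hypothetical, chosen for illustration only.

```python
from itertools import combinations

import numpy as np


def sparse_pca_bruteforce(sigma, k):
    """Exhaustively solve max x' Sigma x  s.t. ||x||_2 = 1, ||x||_0 <= k.

    Illustrative only: the cost grows as C(n, k), so this is a reference
    point for tiny instances, not a scalable method.
    """
    n = sigma.shape[0]
    best_val, best_x = -np.inf, None
    # Supports of size exactly k suffice: by eigenvalue interlacing, enlarging
    # a support never decreases the largest eigenvalue of the submatrix.
    for support in combinations(range(n), k):
        idx = np.array(support)
        sub = sigma[np.ix_(idx, idx)]
        vals, vecs = np.linalg.eigh(sub)  # eigenvalues in ascending order
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_x = np.zeros(n)
            best_x[idx] = vecs[:, -1]  # leading eigenvector on this support
    return best_val, best_x


# Hypothetical toy data: sample covariance of 50 observations in dimension 6.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 6))
sigma = np.cov(data, rowvar=False)

value, x = sparse_pca_bruteforce(sigma, k=2)
print("variance explained by the sparse component:", value)
print("support:", np.flatnonzero(x))
```

Because every support is checked, the returned value is optimal by construction for the toy instance. That is the kind of guarantee the abstract calls a certificate of optimality, obtained here by brute force rather than by bounding.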