A Smoothing Algorithm for l1 Support Vector Machines

Ibrahim Emirahmetoglu, Jeffrey Hajewski, Suely Oliveira, David E. Stewart
2023-12-17
Abstract: A smoothing algorithm is presented for solving the soft-margin Support Vector Machine (SVM) optimization problem with an $\ell^{1}$ penalty. This algorithm is designed to require a modest number of passes over the data, which is an important measure of its cost for very large datasets. The algorithm uses smoothing for the hinge-loss function, and an active set approach for the $\ell^{1}$ penalty. The smoothing parameter $\alpha$ is initially large, but is typically halved when the smoothed problem is solved to sufficient accuracy. Convergence theory is presented that shows $\mathcal{O}(1+\log(1+\log_+(1/\alpha)))$ guarded Newton steps for each value of $\alpha$ except for asymptotic bands $\alpha=\Theta(1)$ and $\alpha=\Theta(1/N)$, with only one Newton step provided $\eta\alpha\gg1/N$, where $N$ is the number of data points and the stopping criterion is that the predicted reduction is less than $\eta\alpha$. The experimental results show that our algorithm is capable of strong test accuracy without sacrificing training speed.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
The paper addresses the following issues:

1. **Optimization Problem**: It proposes a smoothing algorithm for the soft-margin support vector machine (SVM) optimization problem with an `ℓ1` penalty term. For very large datasets, the number of passes over the data is an important measure of cost. The algorithm aims to keep that number modest by smoothing the hinge-loss function and using an active set method for the `ℓ1` penalty term.
2. **Handling Non-Smoothness**: Both the hinge loss and the `ℓ1` penalty term are non-smooth, which makes the objective function difficult to optimize with Newton-type methods. The algorithm smooths only the hinge loss; the `ℓ1` penalty term is left unsmoothed and is handled by the active set method.
3. **Efficient Solving**: For very large datasets, each Newton step is expensive because it requires passing over the data. The smoothing parameter `α` starts large and is typically halved once the smoothed problem is solved to sufficient accuracy. The theory shows that only a small number of guarded Newton steps is needed for each value of `α`, and only one step when `ηα ≫ 1/N`, where `N` is the number of data points and `ηα` is the threshold on the predicted reduction used as the stopping criterion.
4. **Sparsity**: The `ℓ1` penalty term yields a sparse solution, that is, one with few non-zero elements, which can further reduce the computational cost.
5. **Convergence Analysis**: The paper provides a theoretical convergence analysis showing that `O(1 + log(1 + log₊(1/α)))` guarded Newton steps suffice for each value of the smoothing parameter `α`, except in the asymptotic bands `α = Θ(1)` and `α = Θ(1/N)`.

In summary, the paper focuses on efficiently solving the support vector machine optimization problem with an `ℓ1` penalty term on large-scale datasets, while maintaining solution sparsity, strong test accuracy, and good training speed.
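The idea of smoothing the hinge loss and repeatedly halving `α` can be illustrated with a minimal sketch. The summary above does not state which smoothing function the paper uses, so the softplus smoothing `α·log(1 + exp(u/α))` below is an assumed stand-in, and the names `smoothed_hinge` and `alpha_schedule` are hypothetical. The sketch only shows that the smoothed loss approaches the hinge loss `max(0, u)` as `α` is halved; it does not implement the guarded Newton steps or the active set method.

```python
import numpy as np

def hinge(u):
    # Standard hinge loss applied to u = 1 - y * f(x).
    return np.maximum(0.0, u)

def smoothed_hinge(u, alpha):
    # Assumed softplus smoothing (not necessarily the paper's choice):
    # alpha * log(1 + exp(u / alpha)), computed stably with logaddexp.
    return alpha * np.logaddexp(0.0, u / alpha)

def alpha_schedule(alpha0, alpha_min):
    # Halve the smoothing parameter until it drops below alpha_min.
    values = []
    alpha = alpha0
    while alpha >= alpha_min:
        values.append(alpha)
        alpha *= 0.5
    return values

u = np.linspace(-2.0, 2.0, 401)
for alpha in alpha_schedule(1.0, 1.0 / 64):
    # Largest distance between the smoothed loss and the true hinge loss.
    gap = np.max(smoothed_hinge(u, alpha) - hinge(u))
    print(f"alpha={alpha:.6f}  max gap={gap:.6f}")
```

For this particular smoothing, the gap to the hinge loss is at most `α·log 2`, so each halving of `α` halves the worst-case approximation error. In a continuation scheme of the kind the abstract describes, the solution for one value of `α` would presumably serve as the starting point for the next.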