Abstract: LogitBoost is a popular Boosting variant that can be applied to either binary or multi-class classification. From a statistical viewpoint, LogitBoost can be seen as additive tree regression that minimizes the Logistic loss. In this setting, devising a sound multi-class LogitBoost remains non-trivial compared with devising its binary counterpart. The difficulties stem from two factors arising in the multi-class Logistic loss. The first is the invariance property implied by the Logistic loss, which makes the optimal classifier output non-unique: adding the same constant to every component of the output vector does not change the loss value. The second is the density of the Hessian matrices that arise when computing tree node split gains and fitting node values. Oversimplifying this learning problem can degrade performance. For example, the original LogitBoost algorithm is outperformed by ABC-LogitBoost thanks to the latter’s more careful treatment of these two factors. In this paper we propose new techniques to address the two main difficulties in the multi-class LogitBoost setting: (1) we adopt a vector tree model (i.e. each node value is a vector) in which a unique classifier output is guaranteed by adding a sum-to-zero constraint, and (2) we use an adaptive block coordinate descent that exploits the dense Hessian when computing tree split gains and node values. Compared with both the original LogitBoost and ABC-LogitBoost implementations, we observe higher classification accuracy and faster convergence rates on a range of public data sets. We also discuss another way to cope with LogitBoost’s dense Hessian matrix: we derive a loss similar to the multi-class Logistic loss that guarantees a diagonal Hessian matrix. While this makes the optimization (by Newton descent) easier, we unfortunately observe degraded performance with this modification. We argue that working with the dense Hessian is likely unavoidable, which makes techniques like those proposed in this paper necessary for efficient implementations.
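
The two difficulties named in the abstract can be checked numerically on a single example. The following is a minimal sketch, not the paper's implementation: it assumes the usual softmax link between the score vector and the class probabilities, and the function names and the score values are hypothetical, chosen only for illustration. It shows that shifting every score by the same constant leaves the Logistic loss unchanged, that centering the scores to sum to zero picks one representative without changing the loss, and that the Hessian with respect to the scores has non-zero off-diagonal entries.

```python
import math

def softmax(scores):
    """Class probabilities from a raw score vector (assumed softmax link)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic_loss(scores, label):
    """Multi-class Logistic loss: negative log-probability of the true class."""
    return -math.log(softmax(scores)[label])

def hessian(scores):
    """Hessian of the loss w.r.t. the scores: diag(p) - p p^T."""
    p = softmax(scores)
    k = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]

# Hypothetical classifier output for K = 3 classes; the true class is 0.
scores = [1.5, -0.3, 0.2]
label = 0

# Invariance: adding the same constant to every component keeps the loss.
shifted = [s + 7.0 for s in scores]
loss_original = logistic_loss(scores, label)
loss_shifted = logistic_loss(shifted, label)

# A sum-to-zero constraint selects one representative of each equivalent family.
mean = sum(scores) / len(scores)
centered = [s - mean for s in scores]
loss_centered = logistic_loss(centered, label)

# Density: every off-diagonal Hessian entry is -p_i * p_j, hence non-zero.
H = hessian(scores)
row_sums = [sum(row) for row in H]
off_diagonal = [H[i][j] for i in range(3) for j in range(3) if i != j]

print(loss_original, loss_shifted, loss_centered)
print(off_diagonal)
```

Each row of this Hessian sums to zero, so the matrix is singular along the all-ones direction. That is the same invariance seen in the loss, which is why the two difficulties are closely related.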

A Direct Approach to Multi-class Boosting and Extensions

A Direct Formulation for Totally-Corrective Multi-Class Boosting

A scalable direct formulation for multi-class boosting

A scalable stage-wise approach to large-margin multi-class loss based boosting

Multi-class AdaBoost ELM

Totally-Corrective Multi-Class Boosting

Totally Corrective Multiclass Boosting with Binary Weak Learners

Sharing Features in Multi-Class Boosting Via Group Sparsity

A Multi-Class Large Margin Classifier

A Scalable Stagewise Approach to Large-Margin Multiclass Loss-Based Boosting

Fully corrective boosting with arbitrary loss and regularization

RandomBoost: Simplified Multiclass Boosting Through Randomization

Fast training of effective multi-class boosting using coordinate descent optimization

Two-Stage Multi-Class Adaboost for Facial Expression Recognition

Online Multiclass Boosting

AOSO-LogitBoost: Adaptive One-Vs-One LogitBoost for Multi-Class Problem

Implicit JointBoost for Multiclass Object Detection under High Intra-Class Variation

An improved multiclass LogitBoost using adaptive-one-vs-one

A Method to Boost Naïve Bayesian Classifiers

Totally Corrective Boosting for Regularized Risk Minimization

StructBoost: Boosting Methods for Predicting Structured Output Variables