Abstract: LogitBoost is a popular Boosting variant that can be applied to either binary or multi-class classification. From a statistical viewpoint, LogitBoost can be seen as additive tree regression that minimizes the Logistic loss. Even in this setting, devising a sound multi-class LogitBoost remains non-trivial compared with devising its binary counterpart. The difficulties are due to two important factors arising in the multi-class Logistic loss. The first is the invariance property of the Logistic loss, which makes the optimal classifier output non-unique: adding the same constant to each component of the output vector does not change the loss value. The second is the density of the Hessian matrices that arise when computing tree node split gains and fitting node values. Oversimplifying this learning problem can lead to degraded performance. For example, the original LogitBoost algorithm is outperformed by ABC-LogitBoost thanks to the latter's more careful treatment of these two factors. In this paper we propose new techniques to address the two main difficulties in the multi-class LogitBoost setting: (1) we adopt a vector tree model (i.e. each node value is a vector) in which a unique classifier output is guaranteed by adding a sum-to-zero constraint, and (2) we use an adaptive block coordinate descent that exploits the dense Hessian when computing tree split gains and node values. Higher classification accuracy and faster convergence rates are observed on a range of public data sets when compared with both the original LogitBoost and ABC-LogitBoost implementations. We also discuss another way to cope with LogitBoost's dense Hessian matrix: we derive a loss similar to the multi-class Logistic loss but which guarantees a diagonal Hessian matrix. While this makes the optimization (by Newton descent) easier, we unfortunately observe degraded performance for this modification. We argue that working with the dense Hessian is likely unavoidable, which makes techniques like those proposed in this paper necessary for efficient implementations.
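The two difficulties named in the abstract can be checked numerically on a single example. The sketch below is illustrative only and is not the paper's implementation: it assumes the multi-class Logistic loss is the negative log of a softmax probability, and the function names (`softmax`, `logistic_loss`, `hessian`) and the sample score vector are hypothetical. It shows that shifting every score by the same constant leaves the loss unchanged, that a sum-to-zero constraint picks one representative among the equivalent outputs, and that the per-example Hessian has non-zero off-diagonal entries.

```python
import math

def softmax(F):
    """Class probabilities from a score vector F (assumed softmax link)."""
    m = max(F)  # subtract the max for numerical stability
    exps = [math.exp(f - m) for f in F]
    total = sum(exps)
    return [e / total for e in exps]

def logistic_loss(F, label):
    """Multi-class Logistic loss for one example: -log p_label."""
    return -math.log(softmax(F)[label])

def hessian(F):
    """Hessian of the loss w.r.t. F: H[i][j] = p_i * (delta_ij - p_j)."""
    p = softmax(F)
    K = len(F)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(K)]
            for i in range(K)]

# Hypothetical 3-class score vector and true label.
F = [1.0, -0.5, 2.0]
label = 0

# Invariance: adding the same constant to every component keeps the loss.
shifted = [f + 3.7 for f in F]
loss_original = logistic_loss(F, label)
loss_shifted = logistic_loss(shifted, label)

# A sum-to-zero constraint selects one representative of the equivalent outputs.
mean_F = sum(F) / len(F)
centered = [f - mean_F for f in F]
loss_centered = logistic_loss(centered, label)

# Dense Hessian: off-diagonal entries are -p_i * p_j, so none of them vanish,
# and each row sums to zero (the all-ones direction carries no curvature).
H = hessian(F)
off_diagonal = [H[i][j] for i in range(3) for j in range(3) if i != j]
row_sums = [sum(row) for row in H]

print(loss_original, loss_shifted, loss_centered)
print(off_diagonal)
print(row_sums)
```

The zero row sums are the second-order counterpart of the invariance: the Hessian is singular along the all-ones direction, which is one reason a plain Newton step on the unconstrained vector output is ill-posed.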
