Abstract: The amount of training data is one of the key factors that determine the generalization capacity of learning algorithms. Intuitively, one expects the error rate to decrease as the amount of training data increases. Perhaps surprisingly, natural attempts to formalize this intuition give rise to interesting and challenging mathematical questions. For example, in their classical book on pattern recognition, Devroye, Györfi, and Lugosi (1996) ask whether there exists a monotone Bayes-consistent algorithm. This question remained open for over 25 years, until recently Pestov (2021) resolved it for binary classification, using an intricate construction of a monotone Bayes-consistent algorithm. We derive a general result in multiclass classification, showing that every learning algorithm A can be transformed into a monotone one with similar performance. Further, the transformation is efficient and only uses black-box oracle access to A. This demonstrates that one can provably avoid non-monotonic behaviour without compromising performance, thus answering questions asked by Devroye et al. (1996), Viering, Mey, and Loog (2019), Viering and Loog (2021), and by Mhammedi (2021). Our transformation readily implies monotone learners in a variety of contexts: for example, it extends Pestov's result to classification tasks with an arbitrary number of labels. This is in contrast with Pestov's work, which is tailored to binary classification. In addition, we provide uniform bounds on the error of the monotone algorithm. This makes our transformation applicable in distribution-free settings. For example, in PAC learning it implies that every learnable class admits a monotone PAC learner. This resolves questions by Viering, Mey, and Loog (2019); Viering and Loog (2021); Mhammedi (2021).
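
One common way to make the intuition in the abstract precise is sketched below. It is an illustration, not necessarily the paper's exact definition. The symbols are assumed for this sketch: \(S_n\) is a sample of \(n\) i.i.d. examples drawn from a distribution \(D\), and \(L_D(f)\) is the error of a predictor \(f\) under \(D\). An algorithm \(A\) would then be called monotone if, for every distribution \(D\) and every sample size \(n\),

\[
\mathbb{E}_{S_{n+1} \sim D^{n+1}}\bigl[L_D(A(S_{n+1}))\bigr] \;\le\; \mathbb{E}_{S_n \sim D^{n}}\bigl[L_D(A(S_n))\bigr].
\]

In words: one additional training example never increases the expected error, whatever the underlying distribution.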

What problem does this paper attempt to address?

The core question this paper addresses is whether a learning algorithm's error can be guaranteed not to increase as the amount of training data grows. Specifically, the paper asks whether there exists a learning algorithm whose error rate (loss) never increases when more training data is obtained, that is, a monotone learner. The question was posed by Devroye, Györfi, and Lugosi (1996) and remained open for over 25 years, until Pestov (2021) resolved it for binary classification.

The main contributions of the paper are as follows:

1. **Proposing a general result**: The paper proves that, for multiclass classification, any learning algorithm \(A\) can be converted into a monotone learning algorithm \(M\) with similar performance. The conversion is efficient and requires only black-box access to \(A\). This shows that non-monotonic behavior can be avoided without sacrificing performance, answering the questions raised by Devroye et al. (1996), Viering, Mey, and Loog (2019), Viering and Loog (2021), and Mhammedi (2021).
2. **Extending Pestov's result**: Applying this conversion generalizes Pestov's result from binary classification to classification tasks with an arbitrary number of labels.
3. **Providing a uniform bound on the error**: The paper also gives a uniform bound on the error of the monotone algorithm, so the conversion applies in distribution-free settings. For example, in the PAC learning framework it implies that every learnable class admits a monotone PAC learner.

### Main technical contributions of the paper

1. **Constructing a general framework**: The paper develops a general axiomatic framework for a conversion that turns any learner into a monotone learner with similar guarantees. The core of the framework is to construct, for each hypothesis \(h\), a small and symmetric hypothesis class \(B_h\) such that \(h\in B_h\) and \(B_h\) can be learned by a monotone learner. For example, in binary classification \(B_h = \{h, 1 - h\}\), while in multiclass classification \(B_h=\{s_i\circ h:i\in[k]\}\), where \(s_i\) is a cyclic permutation of the labels.
2. **Proving the main theorem**: The paper uses this framework to prove the main theorem (Theorem 1.2) in Sections 3 and 4. Section 3 treats binary classification as a warm-up for the more general multiclass setting, which is discussed in Section 4. The most involved part of the proof is Proposition 4.1, and in particular Lemma 4.2, which asserts that the randomized empirical risk minimizer (ERM) is monotone on \(B_h\).

### Related work

- **The concept of monotone learning curves**: It was originally proposed by Devroye, Györfi, and Lugosi (1996), but attracted wide attention only in recent years.
- **Other research**: Viering, Mey, and Loog (2019, 2020) and Mhammedi (2021) studied methods for converting a given learner into a monotone learner and proposed some weak forms of monotonicity.
- **Pestov's work**: Pestov (2021) solved the problem for binary classification, and this paper extends his result to multiclass classification.

### Conclusion

By providing a general conversion method, this paper proves that any learning algorithm can be converted into a monotone learning algorithm without sacrificing performance. This not only answers long-standing theoretical questions but also provides new tools and methods for practical applications.
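
The class \(B_h\) and the randomized ERM mentioned under the main technical contributions can be illustrated with a minimal Python sketch. It is not the paper's construction. It assumes labels are the integers `0..k-1`, that the cyclic permutation takes the form `s_i(y) = (y + i) mod k` (indexing from 0, so `s_0` is the identity), and that "randomized ERM" means picking uniformly at random among the hypotheses with the fewest training mistakes. The function names `cyclic_class`, `empirical_error`, and `randomized_erm` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[int], int]
Sample = Sequence[Tuple[int, int]]  # (input, label) pairs


def cyclic_class(h: Hypothesis, k: int) -> List[Hypothesis]:
    """Build B_h = {s_i o h}, assuming s_i(y) = (y + i) mod k for i = 0..k-1."""
    def shifted(i: int) -> Hypothesis:
        return lambda x: (h(x) + i) % k
    return [shifted(i) for i in range(k)]


def empirical_error(g: Hypothesis, sample: Sample) -> int:
    """Count the mistakes g makes on the sample."""
    return sum(1 for x, y in sample if g(x) != y)


def randomized_erm(candidates: List[Hypothesis], sample: Sample,
                   rng: random.Random) -> Hypothesis:
    """Return a uniformly random minimizer of the empirical error (assumed tie-breaking rule)."""
    errors = [empirical_error(g, sample) for g in candidates]
    best = min(errors)
    minimizers = [g for g, e in zip(candidates, errors) if e == best]
    return rng.choice(minimizers)


if __name__ == "__main__":
    k = 3
    h = lambda x: x % k                              # base hypothesis
    B_h = cyclic_class(h, k)                         # k hypotheses, h itself is B_h[0]
    data = [(x, (x + 1) % k) for x in range(6)]      # labels produced by s_1 o h
    chosen = randomized_erm(B_h, data, random.Random(0))
    print(empirical_error(chosen, data))
```

With `k = 2` the same construction yields `{h, 1 - h}`, matching the binary case described above. On an empty sample every member of `B_h` ties, so the choice is uniform over the whole class.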

Monotone Learning

Metric learning for monotonic classification: turning the space up to the limits of monotonicity

Monotonic Learning with Hypothesis Evolution

A UNIFIED STUDY OF NONPARAMETRIC INFERENCE FOR MONOTONE FUNCTIONS

Monotonic classification: an overview on algorithms, performance measures and data sets

Large-margin Feature Selection for Monotonic Classification

Local Generalization Error Based Monotonic Classification Extreme Learning Machine

Agnostic proper learning of monotone functions: beyond the black-box correction barrier

Monotone probability distributions over the Boolean cube can be learned with sublinear samples

Expressive Monotonic Neural Networks

Feature Selection for Monotonic Classification

Monotone Individual Fairness

Fraudulent Firm Classification Using Monotonic Classification Techniques

Monotonic classification extreme learning machine.

Induction of Monotonic Decision Trees

Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures

How to address monotonicity for model risk management?

Prediction, Learning, Uniform Convergence, and Scale-sensitive Dimensions

Regularization and Optimal Multiclass Learning

Monotonicity for AI ethics and society: An empirical study of the monotonic neural additive model in criminology, education, health care, and finance

Efficient supervised learning in networks with binary synapses