Unimodal Distributions for Ordinal Regression

Jaime S. Cardoso, Ricardo Cruz, Tomé Albuquerque
2023-03-08
Abstract: In many real-world prediction tasks, class labels contain information about the relative order between labels that is not captured by commonly used loss functions such as multicategory cross-entropy. Recently, the preference for unimodal distributions in the output space has been incorporated into models and loss functions to account for such ordering information. However, current approaches rely on heuristics that lack a theoretical foundation. Here, we propose two new approaches to incorporate the preference for unimodal distributions into the predictive model. We analyse the set of unimodal distributions in the probability simplex and establish fundamental properties. We then propose a new architecture that imposes unimodal distributions and a new loss term that relies on the notion of projection onto a set to promote unimodality. Experiments show that the new architecture achieves top-2 performance, and that the proposed loss term is very competitive while maintaining high unimodality.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the following problem: in ordinal regression, existing methods fail to fully exploit the order information between class labels and, in particular, do not reliably produce unimodal output distributions. Specifically:

1. **Limitations of Existing Methods**:
   - Commonly used loss functions, such as multicategory cross-entropy, cannot capture the relative order between class labels.
   - Most current methods that rely on unimodal distributions are heuristic-based and lack a theoretical foundation.
2. **Goals of the Paper**:
   - Propose two new methods that incorporate a preference for unimodal distributions into the predictive model, so that the order information between class labels is better used.
   - Enforce or promote unimodality of the output probability distribution through a new neural network architecture and a new loss term.
3. **Specific Problem Description**:
   - In ordinal regression, each input is associated with one of a set of ordered class labels \( C=\{c_1, c_2,\ldots, c_K\} \), where \( c_1 < c_2 <\cdots < c_K \).
   - The goal is to find a reliable regression function \( h:X\rightarrow C \) that maps the feature domain \( X \) to the ordered label domain \( C \).
   - Multicategory cross-entropy is limited on ordinal data because it only maximizes the probability of the true class, ignores how the remaining probability mass is distributed, and does not constrain the model to produce a unimodal probability distribution.
4. **Solutions**:
   - A new neural network architecture whose output is constrained to be a unimodal distribution.
   - A new loss term that promotes unimodality through the notion of projection onto the set of unimodal distributions, thereby making better use of the order information between class labels.
With these contributions, the paper aims to improve performance on ordinal regression tasks, particularly in terms of accuracy and the unimodality of the predicted distributions.
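To make the limitation of cross-entropy concrete, the following minimal Python sketch checks whether a probability vector is unimodal and shows that cross-entropy assigns the same loss to a unimodal and a non-unimodal prediction when both give the true class the same probability. It assumes the usual definition of unimodality (probabilities are non-decreasing up to a mode and non-increasing afterwards); the helper names `is_unimodal` and `cross_entropy` and the example vectors are illustrative and are not taken from the paper.

```python
import math


def is_unimodal(p, tol=1e-12):
    """Return True if p rises (weakly) to a single mode and then falls (weakly).

    Assumed definition: there is an index m such that
    p[0] <= ... <= p[m] >= ... >= p[K-1].
    """
    k, n = 0, len(p)
    # Climb while the next value does not decrease.
    while k + 1 < n and p[k + 1] >= p[k] - tol:
        k += 1
    # Descend while the next value does not increase.
    while k + 1 < n and p[k + 1] <= p[k] + tol:
        k += 1
    # Unimodal only if the climb followed by the descent covers the whole vector.
    return k == n - 1


def cross_entropy(p, true_index):
    """Cross-entropy for a one-hot target: depends only on p[true_index]."""
    return -math.log(p[true_index])


# Two hypothetical predictions over K = 5 ordered classes, true class index 2.
unimodal_pred = [0.05, 0.15, 0.60, 0.15, 0.05]
scattered_pred = [0.20, 0.00, 0.60, 0.00, 0.20]

print(is_unimodal(unimodal_pred), is_unimodal(scattered_pred))
print(cross_entropy(unimodal_pred, 2), cross_entropy(scattered_pred, 2))
```

Both predictions receive an identical cross-entropy loss because only the probability of the true class enters the loss, even though the second one places mass on the two classes farthest from the truth. This is the gap that an architecture constrained to unimodal outputs, or a loss term penalizing departure from unimodality, is meant to close.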