Derivative-based regularization for regression

Enrico Lopedoto, Maksim Shekhunov, Vitaly Aksenov, Kizito Salako, Tillman Weyde
2024-05-01
Abstract: In this work, we introduce a novel approach to regularization in multivariable regression problems. Our regularizer, called DLoss, penalises differences between the model's derivatives and derivatives of the data generating function as estimated from the training data. We call these estimated derivatives data derivatives. The goal of our method is to align the model to the data, not only in terms of target values but also in terms of the derivatives involved. To estimate data derivatives, we select (from the training data) 2-tuples of input-value pairs, using either nearest neighbour or random selection. On synthetic and real datasets, we evaluate the effectiveness of adding DLoss, with different weights, to the standard mean squared error loss. The experimental results show that with DLoss (using nearest neighbour selection) we obtain, on average, the best rank with respect to MSE on validation data sets, compared to no regularization, L2 regularization, and Dropout.
Machine Learning
What problem does this paper attempt to address?
The paper addresses overfitting in regression tasks by proposing a new regularization method, DLoss (Derivative Loss), which minimizes the difference between the model's derivatives and the derivatives of the target function as estimated from the training data. Conventional regularizers, such as L1 or L2 penalties and Dropout, constrain the model through its parameters or activations without directly using information about the target function. In contrast, DLoss adjusts the model's behavior based on characteristics inferred from the training data.
The core idea is that the model should match the data not only in its predicted values but also in the rate of change of those predictions (i.e., its derivatives). Because the true derivatives of the target function are unknown, they are estimated from pairs of training points. The authors propose two ways of selecting these pairs: nearest neighbor selection and random selection. The resulting estimates are referred to as "data derivatives." The difference between the model derivatives and the data derivatives is then added, with a weight, to the standard mean squared error loss, and the model is trained to minimize the combined loss.
The paper evaluates DLoss on multiple real-world and synthetic datasets and compares it with no regularization, L2 regularization, and Dropout. The reported results show that DLoss achieves better generalization performance in most cases, especially on real-world data; with nearest neighbor selection it obtains, on average, the best rank with respect to MSE on validation data.
In summary, the main contribution of this paper is the proposal of a novel regularization method, DLoss, which improves the generalization ability of regression models by minimizing the difference between the model derivatives and the data derivatives.
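The mechanism described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the finite-difference slope used as the data derivative, the central-difference approximation of the model's directional derivative, and the step size eps are all assumptions made for illustration. The sketch also assumes that all training inputs are distinct, so that no pair has zero distance.

```python
import numpy as np

def nearest_neighbour_pairs(X):
    """For each point i, return the index of its nearest neighbour j != i."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point may not be paired with itself
    return np.argmin(dist, axis=1)

def random_pairs(n, rng):
    """For each point i, pick a uniformly random partner j != i."""
    offsets = rng.integers(1, n, size=n)  # offsets lie in [1, n-1]
    return (np.arange(n) + offsets) % n

def data_derivatives(X, y, partner):
    """Estimate a directional derivative of the target from each pair.

    Assumption: the data derivative is the slope (y_j - y_i) / ||x_j - x_i||
    along the unit vector pointing from x_i to x_j.
    """
    delta = X[partner] - X
    dist = np.linalg.norm(delta, axis=1)
    direction = delta / dist[:, None]
    slope = (y[partner] - y) / dist
    return direction, slope

def model_directional_derivative(model, X, direction, eps=1e-4):
    """Central-difference estimate of the model's derivative along `direction`."""
    return (model(X + eps * direction) - model(X - eps * direction)) / (2 * eps)

def mse_with_dloss(model, X, y, partner, weight):
    """Return (MSE + weight * DLoss, MSE, DLoss) for a batch of training data."""
    direction, slope = data_derivatives(X, y, partner)
    mse = np.mean((model(X) - y) ** 2)
    model_slope = model_directional_derivative(model, X, direction)
    dloss = np.mean((model_slope - slope) ** 2)
    return mse + weight * dloss, mse, dloss

# Hypothetical toy data: a noiseless linear target in two dimensions.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.5

partner_nn = nearest_neighbour_pairs(X)
partner_rand = random_pairs(len(X), rng)

# A model equal to the target matches both values and derivatives.
exact_model = lambda Z: Z @ true_w + 0.5
total_exact, mse_exact, dloss_exact = mse_with_dloss(
    exact_model, X, y, partner_nn, weight=0.1)

# A constant model is penalised by both terms.
constant_model = lambda Z: np.zeros(len(Z))
total_bad, mse_bad, dloss_bad = mse_with_dloss(
    constant_model, X, y, partner_nn, weight=0.1)
```

In an actual training loop the model's derivative would more likely come from automatic differentiation than from finite differences; the finite-difference form is used here only to keep the sketch dependency-free. The weight argument plays the role of the DLoss weight that the paper varies in its experiments.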