Abstract: In this work, we introduce a novel approach to regularization in multivariable regression problems. Our regularizer, called DLoss, penalises differences between the model's derivatives and derivatives of the data generating function as estimated from the training data. We call these estimated derivatives data derivatives. The goal of our method is to align the model to the data, not only in terms of target values but also in terms of the derivatives involved. To estimate data derivatives, we select (from the training data) 2-tuples of input-value pairs, using either nearest neighbour or random selection. On synthetic and real datasets, we evaluate the effectiveness of adding DLoss, with different weights, to the standard mean squared error loss. The experimental results show that with DLoss (using nearest neighbour selection) we obtain, on average, the best rank with respect to MSE on validation data sets, compared to no regularization, L2 regularization, and Dropout.
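The pair selection and data-derivative estimation described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each training point gets exactly one partner, that a data derivative is the finite-difference slope (y_j - y_i) / ||x_j - x_i|| along the unit direction from x_i to x_j, and that all inputs are distinct. The function names `select_pairs` and `data_derivatives` are hypothetical.

```python
import numpy as np


def select_pairs(X, method="nearest", rng=None):
    """Return an (n, 2) array of index pairs (i, j) with i != j (requires n >= 2).

    Hypothetical helper. method="nearest": j is the Euclidean nearest neighbour of i.
    method="random": j is drawn uniformly from the other n - 1 points.
    """
    n = X.shape[0]
    idx = np.arange(n)
    if method == "nearest":
        # Pairwise squared Euclidean distances; set the diagonal to inf to exclude self-matches.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)
        partners = d2.argmin(axis=1)
    elif method == "random":
        rng = np.random.default_rng() if rng is None else rng
        # Draw from n - 1 candidates, then shift by one to skip index i itself.
        partners = rng.integers(0, n - 1, size=n)
        partners = partners + (partners >= idx)
    else:
        raise ValueError(f"unknown method: {method}")
    return np.stack([idx, partners], axis=1)


def data_derivatives(X, y, pairs):
    """Estimate directional slopes of the data along each pair (assumed formulation).

    Returns the unit directions u = (x_j - x_i) / ||x_j - x_i|| and the
    finite-difference slopes (y_j - y_i) / ||x_j - x_i||.
    """
    i, j = pairs[:, 0], pairs[:, 1]
    delta = X[j] - X[i]
    dist = np.linalg.norm(delta, axis=1)  # assumes distinct inputs (dist > 0)
    directions = delta / dist[:, None]
    slopes = (y[j] - y[i]) / dist
    return directions, slopes
```

For targets y = 2x on the inputs 0, 1 and 3, nearest neighbour selection pairs point 0 with 1, point 1 with 0, and point 3 with 1, and each estimated slope equals 2 times its direction, which is the exact directional derivative of the linear target.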

What problem does this paper attempt to address?

The paper addresses overfitting in machine-learning regression tasks by proposing a new regularization method, DLoss (Derivative Loss), which minimizes the difference between the model's derivatives and the derivatives of the target function as estimated from the training data. Traditional regularization methods, such as L1 and L2 regularization or Dropout, constrain model complexity (by penalizing the magnitude of the model parameters or by randomly dropping units during training) without directly using information about the target function. DLoss, in contrast, adjusts the model's behavior based on characteristics inferred from the training data: the model should match the data not only in its predicted values but also in the rate of change of those predictions, i.e., its derivatives.
To estimate the target function's derivatives, the authors propose two ways of selecting pairs of training points: nearest neighbor selection and random selection. The resulting estimates are called "data derivatives." The difference between the model derivatives and the data derivatives is then incorporated into the loss function, so that training also minimizes this difference.
The paper evaluates DLoss through experiments on multiple real-world and synthetic datasets and compares it with no regularization, L2 regularization, and Dropout. The results show that in most cases DLoss achieves better generalization, especially on real-world data; with nearest neighbor selection, it obtains the best average rank with respect to validation MSE.
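Adding the derivative mismatch to the mean squared error, as described above, might look like the following sketch. It assumes the DLoss term is the mean squared difference between the model's directional derivative at x_i (approximated here by a central finite difference with a hypothetical step `h`) and the data derivative of each pair; `dloss_objective` and `model_directional_derivative` are illustrative names, not taken from the paper, and `weight` stands for the DLoss weight that the experiments vary.

```python
import numpy as np


def model_directional_derivative(f, X, directions, h=1e-5):
    """Central finite difference of f at each row X[k] along directions[k].

    `h` is a hypothetical step size chosen for this sketch.
    """
    return (f(X + h * directions) - f(X - h * directions)) / (2.0 * h)


def dloss_objective(f, X, y, pairs, weight):
    """Return (MSE + weight * DLoss, MSE, DLoss) under the assumed formulation.

    f     : callable mapping an (n, d) array to n predictions
    pairs : (m, 2) integer array of index pairs (i, j), i != j, distinct inputs
    """
    i, j = pairs[:, 0], pairs[:, 1]
    delta = X[j] - X[i]
    dist = np.linalg.norm(delta, axis=1)
    directions = delta / dist[:, None]
    data_slopes = (y[j] - y[i]) / dist  # data derivatives from the pairs
    model_slopes = model_directional_derivative(f, X[i], directions)
    mse = np.mean((f(X) - y) ** 2)  # standard target-fitting term
    dloss = np.mean((model_slopes - data_slopes) ** 2)  # derivative mismatch
    return mse + weight * dloss, mse, dloss
```

This function only evaluates the objective. To train with it, the model derivative would normally be obtained through automatic differentiation (for example in PyTorch or JAX) so that the combined loss can be minimized by gradient descent; the finite difference simply keeps the sketch dependency-free.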
In summary, the main contribution of this paper is the proposal of a novel regularization method, DLoss, which improves the generalization ability of regression models by minimizing the difference between the model derivatives and the data derivatives.

Derivative-based regularization for regression

Wordreg: Mitigating the Gap Between Training and Inference with Worst-Case Drop Regularization

DL-Reg: A deep learning regularization technique using linear regression

Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus

High-Dimensional Linear Regression via Implicit Regularization

Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization

An algorithmic view of $\ell_2$ regularization and some path-following algorithms

Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model

Large-Margin Regularized Softmax Cross-Entropy Loss

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

Lai Loss: A Novel Loss for Gradient Control

Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint

High-dimensional Penalty Selection via Minimum Description Length Principle

Regularized Deep Linear Discriminant Analysis

Regularization properties of adversarially-trained linear regression

Explicit Regularization via Regularizer Mirror Descent

Leverage Domain-invariant assumption for regularization

A Distributionally Robust Optimization Approach for Multivariate Linear Regression under the Wasserstein Metric

Distributionally Robust Groupwise Regularization Estimator

Improving Data Analytics with Fast and Adaptive Regularization

Gradient Aligned Regression via Pairwise Losses