Abstract: Distributional regression aims to estimate the full conditional distribution of a target variable, given covariates. Popular methods include linear and tree-ensemble based quantile regression. We propose a neural network-based distributional regression methodology called 'engression'. An engression model is generative in the sense that we can sample from the fitted conditional distribution and is also suitable for high-dimensional outcomes. Furthermore, we find that modelling the conditional distribution on training data can constrain the fitted function outside of the training support, which offers a new perspective to the challenging extrapolation problem in nonlinear regression. In particular, for 'pre-additive noise' models, where noise is added to the covariates before applying a nonlinear transformation, we show that engression can successfully perform extrapolation under some assumptions such as monotonicity, whereas traditional regression approaches such as least-squares or quantile regression fall short under the same assumptions. Our empirical results, from both simulated and real data, validate the effectiveness of the engression method and indicate that the pre-additive noise model is typically suitable for many real-world scenarios. The software implementations of engression are available in both R and Python.
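Because an engression model is generative, it is trained on samples rather than on a point prediction, and quantiles or other summaries of the conditional distribution are read off from draws. The abstract does not name the training loss. The sketch below assumes an energy-score objective, E||Y - S|| - 0.5 E||S - S'||, where S and S' are two independent model draws for the same input. The functions `energy_loss` and `sample_quantiles` and the `toy` generator are hypothetical illustrations, not the API of the R or Python engression packages.

```python
import numpy as np


def energy_loss(y, s1, s2):
    """Two-sample empirical energy-score loss (lower is better).

    y, s1, s2: arrays of shape (n, d). s1[i] and s2[i] are two independent
    draws from the model's conditional distribution for input i.
    Estimates E||Y - S|| - 0.5 * E||S - S'||, averaged over the n inputs.
    """
    # Fit term: average distance from the observation to each model draw
    fit = 0.5 * (np.linalg.norm(y - s1, axis=1) + np.linalg.norm(y - s2, axis=1))
    # Spread term: rewards the model for producing genuinely random draws
    spread = np.linalg.norm(s1 - s2, axis=1)
    return float(np.mean(fit - 0.5 * spread))


def sample_quantiles(generator, x, qs, noise_dim=1, n_samples=1000, rng=None):
    """Monte Carlo estimate of conditional quantiles from a generative model.

    generator(x, eps) is a hypothetical fitted model mapping inputs of shape
    (n, p) and standard-normal noise of shape (n, noise_dim) to outcomes of
    shape (n, d). Returns an array of shape (len(qs), n, d).
    """
    rng = np.random.default_rng() if rng is None else rng
    draws = np.stack([
        generator(x, rng.standard_normal((x.shape[0], noise_dim)))
        for _ in range(n_samples)
    ])  # shape (n_samples, n, d)
    return np.quantile(draws, qs, axis=0)


# Usage with a hypothetical toy generator Y = x + eps, eps ~ N(0, 1)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
toy = lambda x, eps: x + eps
medians = sample_quantiles(toy, x, [0.5], n_samples=5000, rng=rng)[0]
```

When both draws collapse to the same point prediction, the spread term vanishes and the loss reduces to the mean absolute error. The spread term is what makes a model that outputs a whole distribution preferable to one that outputs a single value.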

What problem does this paper attempt to address?

The main problem this paper addresses is extrapolation in regression analysis, in particular how to extrapolate effectively in nonlinear regression. The authors propose a neural network-based distributional regression method called "engression", which estimates the full conditional distribution of the target variable given the covariates. The aim is to obtain reliable predictions beyond the support of the training data.

### Problems Addressed by the Paper:

1. **Extrapolation in Nonlinear Regression**: Traditional regression methods such as least squares or quantile regression perform poorly when test inputs lie outside the support of the training data. By modelling the full conditional distribution, engression constrains the behavior of the fitted function outside the training support and thereby extrapolates better.
2. **Distributional Regression**: Unlike methods that estimate only the conditional mean, engression estimates the full conditional distribution, from which the mean, quantiles and other summaries can be derived. This carries more information than a single point prediction, which is particularly useful in extrapolation tasks.
3. **Pre-Additive Noise Model**: The paper considers a pre-additive noise model (pre-ANM), in which noise is added to the covariates before the nonlinear transformation is applied, in contrast to the usual post-additive noise model (post-ANM), in which noise is added to the output. Under the pre-ANM and assumptions such as monotonicity, the authors show that engression can extrapolate where least squares and quantile regression fall short, with the advantage most pronounced when the noise level is high.

Through theoretical analysis and empirical studies on simulated and real data, the paper validates the effectiveness and applicability of engression and indicates that the pre-ANM is suitable for many real-world datasets.
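The contrast between the pre-ANM, Y = g(X + η), and the post-ANM, Y = g(X) + η, can be made concrete with a small simulation. The sketch below is illustrative only. The function `simulate`, the uniform covariate on [0, 1], the Gaussian noise and the softplus-type nonlinearity `g` are assumptions, not taken from the paper. The simulation records the points at which g is actually evaluated. Under the pre-ANM, the noisy input X + η falls outside the covariate support for a sizeable fraction of samples, so the observed outcomes carry information about g beyond the training support. Under the post-ANM, g is only ever evaluated on [0, 1]. This offers one intuition for why modelling the conditional distribution can constrain the fit outside the training support.

```python
import numpy as np


def simulate(g, n, sigma, model="pre", rng=None):
    """Draw (x, y) pairs from a pre- or post-additive noise model.

    pre-ANM:  Y = g(X + eta)
    post-ANM: Y = g(X) + eta
    with X ~ Uniform(0, 1) and eta ~ N(0, sigma^2) (illustrative choices).
    Also returns z, the points at which g was actually evaluated.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(0.0, 1.0, size=n)
    eta = rng.normal(0.0, sigma, size=n)
    if model == "pre":
        z = x + eta          # g sees the noisy covariate, possibly outside [0, 1]
        y = g(z)
    elif model == "post":
        z = x                # g is evaluated only on the covariate support
        y = g(z) + eta
    else:
        raise ValueError("model must be 'pre' or 'post'")
    return x, y, z


g = lambda z: np.logaddexp(0.0, 3.0 * z)  # monotone, convex nonlinearity
rng = np.random.default_rng(0)
for model in ("pre", "post"):
    x, y, z = simulate(g, 10_000, sigma=0.5, model=model, rng=rng)
    outside = np.mean((z < 0.0) | (z > 1.0))
    print(f"{model}-ANM: fraction of g-evaluations outside [0, 1] = {outside:.3f}")
```

In the post-ANM run, the printed fraction is zero by construction. Because this g is strictly convex, Jensen's inequality gives E[Y | X = x] = E[g(x + η)] > g(x) under the pre-ANM for σ > 0. A least-squares fit, which targets the conditional mean, therefore recovers a different function from g.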
Additionally, the paper provides software implementations in R and Python, making it convenient for researchers and practitioners to use this method.

Engression: Extrapolation through the Lens of Distributional Regression

A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models

How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Distributional regression: CRPS-error bounds for model fitting, model selection and convex aggregation

Regression via Arbitrary Quantile Modeling

Theoretical and Experimental Analysis on the Generalizability of Distribution Regression Network

Ensemble Deep Learning-Based Non-Crossing Quantile Regression for Nonparametric Probabilistic Forecasting of Wind Power Generation

Prediction of Extremal Expectile Based on Regression Models With Heteroscedastic Extremes

DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting

Distributional Regression for Data Analysis

Marginally-calibrated deep distributional regression

Ensemble Multi-Quantiles: Adaptively Flexible Distribution Prediction for Uncertainty Quantification

Sublinear expectation linear regression

Expectile Neural Networks for Genetic Data Analysis of Complex Diseases

Learning and Interpreting Complex Distributions in Empirical Data

Getting more from your regression model: A free lunch?

Distribution Regression for Sequential Data

Distributional Refinement Network: Distributional Forecasting via Deep Learning

Generative Conditional Distributions by Neural (Entropic) Optimal Transport

Online Distributional Regression