Abstract: High-dimensional heterogeneous data have recently attracted considerable attention. Under heterogeneity, semiparametric regression is a popular modeling choice in statistics. In this paper, we take advantage of the computational convenience of expectile regression and its ability to capture heterogeneity, and propose regularized partially linear additive expectile regression with a nonconvex penalty, such as SCAD or MCP, for high-dimensional heterogeneous data. We focus on a more realistic scenario in which the regression error is heavy-tailed and has only finite moments; this violates the classical sub-Gaussian assumption but is more common in practice. Under some regularity conditions, we show that, with probability tending to one, the oracle estimator is one of the local minima of our optimization problem. The theoretical study indicates that the number of linear covariates our procedure can handle is essentially restricted by the moment condition on the regression error. For computation, since the corresponding optimization problem is nonconvex and nonsmooth, we derive a two-step algorithm to solve it. Finally, we demonstrate through Monte Carlo simulation studies and a real data example that the proposed method performs well in estimation accuracy and model selection. Moreover, by taking different expectile weights $\alpha$, we are able to detect heterogeneity and explore the entire conditional distribution of the response variable, which indicates the usefulness of the proposed method for analyzing high-dimensional heterogeneous data.
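
To make the two ingredients named in the abstract concrete, the following minimal sketch uses the standard textbook definitions of the asymmetric squared (expectile) loss with weight $\alpha$ and of the SCAD penalty. It is not the paper's two-step algorithm and does not fit the partially linear additive model. The function names, the fixed-point iteration for a sample expectile, and the default SCAD constant `a = 3.7` are assumptions made for illustration only.

```python
def expectile_loss(residual, alpha):
    """Asymmetric squared loss: weight alpha on nonnegative residuals, 1 - alpha otherwise."""
    weight = alpha if residual >= 0 else 1.0 - alpha
    return weight * residual * residual


def sample_expectile(y, alpha, tol=1e-10, max_iter=1000):
    """Sample alpha-expectile via a reweighted-mean fixed-point iteration (illustrative)."""
    m = sum(y) / len(y)  # start from the mean, which is the 0.5-expectile
    for _ in range(max_iter):
        # Observations at or above the current estimate get weight alpha.
        w = [alpha if yi >= m else 1.0 - alpha for yi in y]
        new_m = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic transition, constant beyond a * lam."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 10.0]
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, sample_expectile(y, alpha))
```

With $\alpha = 0.5$ the loss is symmetric and the expectile reduces to the sample mean; moving $\alpha$ toward 0 or 1 shifts the estimate toward the lower or upper part of the distribution, which is the mechanism the abstract relies on to explore the conditional distribution of the response.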

Semiparametric Expectile Regression for High-dimensional Heavy-tailed and Heterogeneous Data

Expectile regression for analyzing heteroscedasticity in high dimension

Robust Estimation and Shrinkage in Ultrahigh Dimensional Expectile Regression with Heavy Tails and Variance Heterogeneity

An Improved Algorithm for High-Dimensional Continuous Threshold Expectile Model with Variance Heterogeneity

How Data Heterogeneity Affects Innovating Knowledge and Information in Gene Identification: A Statistical Learning Perspective

Variable Selection in Expectile Regression

Inference for High-Dimensional Linear Expectile Regression with De-Biasing Method

Prediction of Extremal Expectile Based on Regression Models With Heteroscedastic Extremes

Retire: Robust expectile regression in high dimensions

Distributed optimization and statistical learning for large-scale penalized expectile regression

Ultra high-dimensional semiparametric longitudinal data analysis

High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality

Semiparametric efficient estimation in high‐dimensional partial linear regression models

Variable Screening and Model Averaging for Expectile Regressions

Poisson subsampling-based estimation for growing-dimensional expectile regression in massive data

High Dimensional Binary Choice Model with Unknown Heteroskedasticity or Instrumental Variables

Extreme expectile estimation for short-tailed data

Transformation of Rhododendron through microprojectile bombardment

Role of prostaglandins in initiating cardiovascular reflexes originating from the pancreas and the gall bladder.

High-Dimensional Sparse Additive Hazards Regression

A Copula-Based Approach to Modelling and Testing for Heavy-tailed Data with Bivariate Heteroscedastic Extremes