Abstract: Transfer learning algorithms have been developed in various application contexts, but only a few of them offer statistical guarantees in high dimensions. Among these works, the differences between the target and the sources, a.k.a. the contrasts, are typically modeled as, or at least assumed close to, vectors with a certain low-dimensional structure (e.g., sparsity), which leads to a separate debiasing step after a preceding pooling estimation procedure. Under such an intuitive yet powerful framework, additional homogeneity conditions on the Hessian matrices of the population loss functions are often imposed to preserve the delicate low-dimensional structure of the contrasts during pooling. These conditions are either unrealistic in practice or easily destroyed by basic data transformations such as standardization. In this article, under the general framework of M-estimators with decomposable regularizers, we highlight the role of fine-tuning underlying the conspicuous gain of the debiasing step in transfer learning. Namely, we find that it is possible to enhance estimation accuracy by fine-tuning a primal estimator that is sufficiently close to the true target one. Our theory suggests slightly enlarging the pooling regularization strength when either the contrasts' low-dimensional structure or the homogeneity of the Hessian matrices is violated. Traditional linear regression and generalized low-rank trace regression in high dimensions are discussed as two specific examples under our framework. When the informative source datasets are unknown, a novel truncated-penalized algorithm is proposed that directly outputs the primal estimator while simultaneously selecting the useful sources, and its oracle property is proved. Extensive numerical experiments are conducted to validate the theoretical assertions. A case study on air quality regulation in China by transfer learning is also provided for illustration.
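
The pooling-then-fine-tuning idea described in the abstract can be illustrated, for the sparse linear regression example only, with a minimal sketch. It is not the authors' algorithm: the squared loss with an l1 penalty, the coordinate-descent solver, and all names (`soft_threshold`, `lasso_cd`, `pool_then_fine_tune`, `lam_pool`, `lam_tune`) are assumptions introduced here for illustration, and the source selection step mentioned in the abstract is not covered.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # running residual y - X b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            d = new - b[j]
            if d != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * d
                b[j] = new
    return b


def pool_then_fine_tune(X_t, y_t, sources, lam_pool, lam_tune):
    """Step 1: pooled lasso on target + sources (primal estimator w).
    Step 2: lasso on the target residuals to estimate the contrast delta.
    Returns (w + delta, w)."""
    X_pool, y_pool = list(X_t), list(y_t)
    for X_s, y_s in sources:
        X_pool.extend(X_s)
        y_pool.extend(y_s)
    w = lasso_cd(X_pool, y_pool, lam_pool)
    resid = [yi - sum(a * c for a, c in zip(xi, w)) for xi, yi in zip(X_t, y_t)]
    delta = lasso_cd(X_t, resid, lam_tune)
    return [wi + di for wi, di in zip(w, delta)], w


if __name__ == "__main__":
    rng = random.Random(0)
    p = 15
    beta_true = [1.0] * 3 + [0.0] * (p - 3)

    def simulate(n, coef):
        # Hypothetical Gaussian design with noise level 0.5.
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [sum(a * c for a, c in zip(x, coef)) + 0.5 * rng.gauss(0, 1) for x in X]
        return X, y

    X_t, y_t = simulate(30, beta_true)
    shifted = list(beta_true)
    shifted[0] += 0.3  # sparse contrast between sources and target
    sources = [simulate(100, shifted) for _ in range(2)]
    beta_hat, w_hat = pool_then_fine_tune(X_t, y_t, sources, 0.05, 0.1)
    print("pooled:    ", [round(v, 2) for v in w_hat[:4]])
    print("fine-tuned:", [round(v, 2) for v in beta_hat[:4]])
```

In this sketch, `lam_pool` plays the role of the pooling regularization strength and `lam_tune` controls how far the fine-tuned estimate may move away from the pooled one. Setting `lam_tune` very large returns the pooled estimator unchanged.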

A Comparative Study on Regularization Strategies for Embedding-based Neural Networks

Wordreg: Mitigating the Gap Between Training and Inference with Worst-Case Drop Regularization

Regularized Structured Perceptron: A Case Study on Chinese Word Segmentation, POS Tagging and Parsing

Information Guided Regularization for Fine-tuning Language Models

Penetrating the influence of regularizations on neural network based on information bottleneck theory

Network as Regularization for Training Deep Neural Networks: Framework, Model and Performance

Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models

LDA-Reg: Knowledge Driven Regularization using External Corpora

Recurrent Neural Network Regularization

Consistency of Neural Networks with Regularization

Learning Regularized Noise Contrastive Estimation for Robust Network Embedding

Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Regressing Word and Sentence Embeddings for Regularization of Neural Machine Translation

A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings

Layer-wise Regularized Dropout for Neural Language Models

Delta Embedding Learning

The Efficacy of Regularization in Two Layer Neural Networks

The Role of Fine-tuning: Transfer Learning for High-dimensional M-estimators with Decomposable Regularizers

Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer

Analysing Dropout and Compounding Errors in Neural Language Models

Effective Neural Network $L_0$ Regularization With BinMask