DT-LET: Deep Transfer Learning by Exploring where to Transfer

Jianzhe Lin, Qi Wang, Rabab Ward, Z. Jane Wang
DOI: https://doi.org/10.48550/arXiv.1809.08541
2018-09-23
Abstract: Previous transfer learning methods based on deep networks assume that knowledge should be transferred between the same hidden layers of the source domain and the target domain. This assumption doesn't always hold true, especially when the data from the two domains are heterogeneous with different resolutions. In such cases, the most suitable numbers of layers for the source domain data and the target domain data would differ. As a result, the high-level knowledge from the source domain would be transferred to the wrong layer of the target domain. Based on this observation, "where to transfer", proposed in this paper, should be a novel research frontier. We propose a new mathematical model named DT-LET to solve this heterogeneous transfer learning problem. To select the best matching of layers for transferring knowledge, we define a specific loss function to estimate the corresponding relationship between high-level features of data in the source domain and the target domain. To verify the proposed cross-layer model, experiments on two cross-domain recognition/classification tasks are conducted, and the superior results demonstrate the necessity of layer correspondence searching.
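The layer-matching idea in the abstract can be illustrated with a toy search over layer pairs. This is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes paired source/target samples, uses randomly initialised sigmoid layers as a stand-in for trained stacked auto-encoders, and scores each pair by its top canonical correlation. The function names (`hidden_layers`, `top_canonical_correlation`, `best_layer_pair`), the layer widths, and the synthetic data are all hypothetical.

```python
import numpy as np

def hidden_layers(C, widths, rng):
    """Activations of a randomly initialised sigmoid stack (stand-in for a trained SAE)."""
    H, activations = C, []
    for width in widths:
        W = rng.standard_normal((H.shape[1], width)) / np.sqrt(H.shape[1])
        H = 1.0 / (1.0 + np.exp(-(H @ W)))  # sigmoid; bias omitted for brevity
        activations.append(H)
    return activations

def top_canonical_correlation(A, B):
    """Largest canonical correlation between two feature matrices with paired rows.

    Assumes both centred matrices have full column rank.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    Qa, _ = np.linalg.qr(A)  # orthonormal basis of each centred column space
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))

def best_layer_pair(source_layers, target_layers):
    """Score every (source layer, target layer) pair; return the best indices and the score grid."""
    scores = np.array([[top_canonical_correlation(hs, ht) for ht in target_layers]
                       for hs in source_layers])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(i), int(j)), scores

# Synthetic paired data with different input dimensions (an assumption of this sketch).
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 4))  # shared latent factors
CS = Z @ rng.standard_normal((4, 20)) + 0.1 * rng.standard_normal((200, 20))
CT = Z @ rng.standard_normal((4, 10)) + 0.1 * rng.standard_normal((200, 10))

source_layers = hidden_layers(CS, [16, 12, 8], rng)  # three source hidden layers
target_layers = hidden_layers(CT, [8, 6], rng)       # two target hidden layers
pair, scores = best_layer_pair(source_layers, target_layers)
print("best (source, target) layer indices:", pair)
print(np.round(scores, 3))
```

The sketch only compares fixed features across networks of different depths. A full treatment would also train both networks and fold the matching score into a joint objective alongside the reconstruction errors.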
Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to solve a key problem in deep transfer learning: **How to determine between which layers of the source domain and the target domain knowledge should be transferred**. Traditional methods usually assume that knowledge transfer between the source domain and the target domain should occur between the same hidden layers, but this assumption is not always valid in the case of heterogeneous data (such as data with different resolutions).

Specifically, when the data in the source domain and the target domain have different features or resolutions, the most appropriate transfer layers may not be the same. If the high-level knowledge of the source domain is directly transferred to the wrong layer in the target domain, it may lead to performance degradation or the introduction of incorrect information. Therefore, this paper proposes a new research direction, "**where to transfer**", that is, exploring the problem of the best knowledge transfer layer matching.

### Main contributions

1. **Introducing the "where to transfer" problem**:
   - It is proposed that the deep networks of the source domain and the target domain do not need to have the same parameter settings, allowing cross-layer transfer learning.
2. **Proposing the DT-LET framework**:
   - Based on Stacked Auto-Encoders (SAE), by defining a new unified objective loss function, the best correspondence between the source-domain and target-domain neural networks is found.
   - Optimize this objective function to determine the best settings of the two deep networks and their correspondence.
3. **Experimental verification**:
   - Experiments were carried out on two cross-domain recognition/classification tasks (handwritten digit recognition and text-to-image classification) to verify the effectiveness of the proposed cross-layer model.
   - The experimental results show that finding the best layer correspondence is crucial for improving the performance of the learning task in the target domain.

### Formula representation

The main formulas involved in the paper are as follows:

- **Hidden layer representation**:

  \[ H^{S(n + 1)} = f\left(W^{S(n)} H^{S(n)} + b^{S(n)}\right),\quad n > 1 \]
  \[ H^{S(n)} = f\left(W^{S(n)} C^{S} + b^{S(n)}\right),\quad n = 1 \]
  \[ H^{T(n + 1)} = f\left(W^{T(n)} H^{T(n)} + b^{T(n)}\right),\quad n > 1 \]
  \[ H^{T(n)} = f\left(W^{T(n)} C^{T} + b^{T(n)}\right),\quad n = 1 \]

- **Objective function**:

  \[ L(R_{s,t}) = L_S(\theta_S) + L_T(\theta_T) + P(V^{S}, V^{T}) \]

  where:
  - \( L_S(\theta_S) \) and \( L_T(\theta_T) \) respectively represent the reconstruction errors of the source domain and the target domain:

    \[ L_S(\theta_S)=\left[\frac{1}{n_s}\sum_{i = 1}^{n_s}\left(\frac{1}{2}\left\|h_{W^{S},b^{S}}(C^{S}_i)-C^{S}_i\right\|^2\right)\right]+\frac{\lambda}{2}\sum_{l = 1}^{n^{S} - 1}\sum_{j = 1}^{n^{S}_{l}}\sum_{k = 1}^{n^{S}_{l + 1}}\left(W^{S(l)}_{kj}\right)^2 \]
    \[ L_T(\theta_T)=\left[\frac{1}{n_t}\sum_{i = 1}^{n_t}\left(\frac{1}{2}\left\|h_{W^{T},b^{T}}(C^{T}_i)-C^{T}_i\right\|^2\right)\right]+\frac{\lambda}{2}\sum_{l = 1}^{n^{T} - 1}\sum_{j = 1}^{n^{T}_{l}}\sum_{k = 1}^{n^{T}_{l + 1}}