Homework 11 - Transfer Learning (Domain Adversarial Training)
Tags: Deep Learning, notebook, python
goujiaxin · Published 2024-04-11
Recommended image: Basic Image: bohrium-notebook:2023-04-07
Recommended machine type: c12_m46_1 * NVIDIA GPU B
Homework 11 - Transfer Learning (Domain Adversarial Training)
Readme
Scenario and Why Domain Adversarial Training
Domain Adversarial Training of Neural Networks (DaNN)
Data Introduction
Special Domain Knowledge
Canny Edge Detection
Data Process
Model
Pre-processing
Start Training
DaNN Implementation
Reminder
Inference
Visualization
Step 1: Load checkpoint and evaluate to get extracted features
Step 2: Apply t-SNE and normalize
Step 3: Visualization with matplotlib
Training Statistics
Learning Curve (Strong Baseline)
Accuracy Curve (Strong Baseline)
Q&A

Homework 11 - Transfer Learning (Domain Adversarial Training)

Author: Arvin Liu (r09922071@ntu.edu.tw)

If there are any questions, please contact mlta-2022-spring@googlegroups.com


Readme

In homework 11, you will need to implement Domain Adversarial Training in Transfer Learning, as shown in the bottom left part of the figure.

Scenario and Why Domain Adversarial Training

Now we have labeled source data and unlabeled target data, where the source data might be relevant to the target data. We want to train a model with the source data only and test it on the target data.

What problem might occur if we do so? From what we learned in Anomaly Detection, we know that if we test a model on abnormal data that never appeared in the source data, the trained model is likely to perform poorly, since it is not familiar with the abnormal data.

For example, suppose we have a model that consists of a Feature Extractor and a Classifier:

When the model is trained on the source data, the Feature Extractor extracts meaningful features because it is familiar with their distribution. As shown in the following figure, the blue dots, which represent the distribution of the source data, have already been clustered into distinct groups, so the Classifier can predict labels based on these clusters.

However, when tested on the target data, the Feature Extractor cannot extract meaningful features that follow the source feature distribution, and as a result the classifier learned for the source domain cannot be applied to the target domain.

Domain Adversarial Training of Neural Networks (DaNN)

To address this problem, DaNN approaches build mappings between the source (training-time) and the target (test-time) domains, so that the classifier learned for the source domain can also be applied to the target domain when composed with the learned mapping between the domains.

In DaNN, the authors add a Domain Classifier, a deep, discriminatively trained classifier, to the training framework to distinguish data from different domains by the features extracted by the feature extractor. As training progresses, the approach promotes a domain classifier that discriminates between the source and the target domains, and a feature extractor that extracts features that are discriminative for the main learning task on the source domain yet indiscriminate with respect to the shift between the domains.

The feature extractor is likely to outperform the domain classifier, since the domain classifier's input is generated by the feature extractor itself, and the tasks of domain classification and label classification are not in conflict.

This method leads to the emergence of features that are domain-invariant and lie on the same feature distribution.
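
For reference, the objective from the DaNN paper can be written as a minimax problem (a sketch of the standard formulation; $\theta_f$, $\theta_y$, $\theta_d$ denote the parameters of the feature extractor, label predictor, and domain classifier):

$$E(\theta_f, \theta_y, \theta_d) = \mathcal{L}_y(\theta_f, \theta_y) - \lambda\,\mathcal{L}_d(\theta_f, \theta_d)$$

$$(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d), \qquad \hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)$$

where $\mathcal{L}_y$ is the label-prediction loss on the source data, $\mathcal{L}_d$ is the domain-classification loss on both domains, and $\lambda$ trades off the two. This subtraction is exactly what the training loop below implements.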


Data Introduction

Our task uses real photos as the source data and hand-drawn graffiti as the target data.

We are going to train the model with the photos and their labels, and try to predict the labels of the hand-drawn graffiti.

The data can be downloaded here. The code below downloads and visualizes the data.

Note that the source and target data are both balanced; you can make use of this information.

# Download dataset
import os
os.environ['HTTP_PROXY'] = 'http://ga.dp.tech:8118'
os.environ['HTTPS_PROXY'] = 'http://ga.dp.tech:8118'
#!wget "https://github.com/redxouls/ml2020spring-hw11-dataset/releases/download/v1.0.0/real_or_drawing.zip" -O real_or_drawing.zip

# Download from mirrored dataset link
!wget "https://github.com/redxouls/ml2020spring-hw11-dataset/releases/download/v1.0.1/real_or_drawing.zip" -O real_or_drawing.zip
# !wget "https://github.com/redxouls/ml2020spring-hw11-dataset/releases/download/v1.0.2/real_or_drawing.zip" -O real_or_drawing.zip

# Unzip the files
!unzip real_or_drawing.zip
import matplotlib.pyplot as plt

def no_axis_show(img, title='', cmap=None):
    # imshow, with the interpolation mode set to "nearest".
    fig = plt.imshow(img, interpolation='nearest', cmap=cmap)
    # Do not show the axes in the images.
    fig.axes.get_xaxis().set_visible(False)
    fig.axes.get_yaxis().set_visible(False)
    plt.title(title)

titles = ['horse', 'bed', 'clock', 'apple', 'cat', 'plane', 'television', 'dog', 'dolphin', 'spider']
plt.figure(figsize=(18, 18))
for i in range(10):
    plt.subplot(1, 10, i+1)
    fig = no_axis_show(plt.imread(f'real_or_drawing/train_data/{i}/{500*i}.bmp'), title=titles[i])
plt.figure(figsize=(18, 18))
for i in range(10):
    plt.subplot(1, 10, i+1)
    fig = no_axis_show(plt.imread(f'real_or_drawing/test_data/0/' + str(i).rjust(5, '0') + '.bmp'))

Special Domain Knowledge

When we draw graffiti, we usually draw only the outline; therefore, we can perform edge detection on the source data to make it more similar to the target data.

Canny Edge Detection

The implementation of Canny edge detection is as follows. The algorithm will not be described thoroughly here; if you are interested, please refer to the wiki or here.

We only need two parameters to implement Canny edge detection with cv2: low_threshold and high_threshold.

cv2.Canny(image, low_threshold, high_threshold)

Simply put, pixels whose gradient value exceeds high_threshold are marked as edges, and pixels whose gradient value falls between low_threshold and high_threshold are kept as edges only if they are connected to a strong edge.

Let's implement it on the source data.

import cv2
import matplotlib.pyplot as plt
titles = ['horse', 'bed', 'clock', 'apple', 'cat', 'plane', 'television', 'dog', 'dolphin', 'spider']
plt.figure(figsize=(18, 18))

original_img = plt.imread(f'real_or_drawing/train_data/0/0.bmp')
plt.subplot(1, 5, 1)
no_axis_show(original_img, title='original')

gray_img = cv2.cvtColor(original_img, cv2.COLOR_RGB2GRAY)
plt.subplot(1, 5, 2)
no_axis_show(gray_img, title='gray scale', cmap='gray')

canny_50100 = cv2.Canny(gray_img, 50, 100)
plt.subplot(1, 5, 3)
no_axis_show(canny_50100, title='Canny(50, 100)', cmap='gray')

canny_150200 = cv2.Canny(gray_img, 150, 200)
plt.subplot(1, 5, 4)
no_axis_show(canny_150200, title='Canny(150, 200)', cmap='gray')

canny_250300 = cv2.Canny(gray_img, 250, 300)
plt.subplot(1, 5, 5)
no_axis_show(canny_250300, title='Canny(250, 300)', cmap='gray')

Data Process

The data is suitable for torchvision.ImageFolder, so you can create a dataset with torchvision.ImageFolder. For details on image augmentation, please refer to the comments in the code below.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Function
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
source_transform = transforms.Compose([
    # Turn RGB into grayscale. (Because cv2.Canny does not support RGB images.)
    transforms.Grayscale(),
    # cv2 does not accept PIL Images, so we transform them to np.array
    # and then apply the cv2.Canny algorithm.
    transforms.Lambda(lambda x: cv2.Canny(np.array(x), 170, 300)),
    # Transform the np.array back to a PIL Image.
    transforms.ToPILImage(),
    # 50% horizontal flip (for augmentation).
    transforms.RandomHorizontalFlip(),
    # Rotate +- 15 degrees (for augmentation), filling with zero
    # where there are empty pixels after rotation.
    transforms.RandomRotation(15, fill=(0,)),
    # Transform to tensor for model inputs.
    transforms.ToTensor(),
])
target_transform = transforms.Compose([
    # Turn RGB into grayscale.
    transforms.Grayscale(),
    # Resize: the size of the source data is 32x32, so we need to
    # enlarge the target data from 28x28 to 32x32.
    transforms.Resize((32, 32)),
    # 50% horizontal flip (for augmentation).
    transforms.RandomHorizontalFlip(),
    # Rotate +- 15 degrees (for augmentation), filling with zero
    # where there are empty pixels after rotation.
    transforms.RandomRotation(15, fill=(0,)),
    # Transform to tensor for model inputs.
    transforms.ToTensor(),
])
source_dataset = ImageFolder('real_or_drawing/train_data', transform=source_transform)
target_dataset = ImageFolder('real_or_drawing/test_data', transform=target_transform)
source_dataloader = DataLoader(source_dataset, batch_size=32, shuffle=True)
target_dataloader = DataLoader(target_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(target_dataset, batch_size=128, shuffle=False)

Model

Feature Extractor: Classic VGG-like architecture

Label Predictor / Domain Classifier: Linear models.

class FeatureExtractor(nn.Module):

    def __init__(self):
        super(FeatureExtractor, self).__init__()

        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(64, 128, 3, 1, 1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(128, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(256, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(256, 512, 3, 1, 1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

    def forward(self, x):
        x = self.conv(x).squeeze()
        return x

class LabelPredictor(nn.Module):

    def __init__(self):
        super(LabelPredictor, self).__init__()

        self.layer = nn.Sequential(
            nn.Linear(512, 512),
            nn.ReLU(),

            nn.Linear(512, 512),
            nn.ReLU(),

            nn.Linear(512, 10),
        )

    def forward(self, h):
        c = self.layer(h)
        return c

class DomainClassifier(nn.Module):

    def __init__(self):
        super(DomainClassifier, self).__init__()

        self.layer = nn.Sequential(
            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),

            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),

            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),

            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),

            nn.Linear(512, 1),
        )

    def forward(self, h):
        y = self.layer(h)
        return y

Pre-processing

Here we use Adam as our optimizer.

feature_extractor = FeatureExtractor().cuda()
label_predictor = LabelPredictor().cuda()
domain_classifier = DomainClassifier().cuda()

class_criterion = nn.CrossEntropyLoss()
domain_criterion = nn.BCEWithLogitsLoss()

optimizer_F = optim.Adam(feature_extractor.parameters())
optimizer_C = optim.Adam(label_predictor.parameters())
optimizer_D = optim.Adam(domain_classifier.parameters())

Start Training

DaNN Implementation

In the original paper, a Gradient Reversal Layer is used, and the Feature Extractor, Label Predictor, and Domain Classifier are all trained at the same time. In this code, we train the Domain Classifier first, and then train our Feature Extractor (the same concept as the Generator and Discriminator training process in a GAN).
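
For reference, here is a minimal sketch of the Gradient Reversal Layer used in the original paper, written with torch.autograd.Function (the class name GradReverse and the usage below are our own illustration, not part of this homework's code; Function is already imported in the Data Process cell above):

class GradReverse(Function):
    # Identity in the forward pass; multiplies the incoming gradient by -lamb
    # in the backward pass, so that minimizing the domain loss w.r.t. the
    # feature extractor actually maximizes it (the adversarial effect).
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient; lamb itself gets no gradient.
        return -ctx.lamb * grad_output, None

# Usage sketch: insert the layer between the feature extractor and the
# domain classifier, then a single backward pass trains all three networks:
# feature = feature_extractor(mixed_data)
# domain_logits = domain_classifier(GradReverse.apply(feature, 0.1))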

Reminder

  • Lambda, which controls the weight of the domain adversarial loss, is adaptive in the original paper. You can refer to the original work. Here lambda is set to 0.1.
  • We do not have labels for the target data; you can only evaluate your model by uploading your results to Kaggle. :)
def train_epoch(source_dataloader, target_dataloader, lamb):
    '''
    Args:
        source_dataloader: dataloader of the source data
        target_dataloader: dataloader of the target data
        lamb: controls the balance between domain adaptation and classification.
    '''

    # D loss: loss of the Domain Classifier
    # F loss: loss of the Feature Extractor & Label Predictor
    running_D_loss, running_F_loss = 0.0, 0.0
    total_hit, total_num = 0.0, 0.0

    for i, ((source_data, source_label), (target_data, _)) in enumerate(zip(source_dataloader, target_dataloader)):

        source_data = source_data.cuda()
        source_label = source_label.cuda()
        target_data = target_data.cuda()

        # Mix the source data and target data, otherwise it will mislead the running params
        # of batch_norm. (The running mean/var of the source and target data are different.)
        mixed_data = torch.cat([source_data, target_data], dim=0)
        domain_label = torch.zeros([source_data.shape[0] + target_data.shape[0], 1]).cuda()
        # Set the domain label of the source data to 1.
        domain_label[:source_data.shape[0]] = 1

        # Step 1: train the domain classifier.
        feature = feature_extractor(mixed_data)
        # We don't need to train the feature extractor in step 1.
        # Thus we detach the features to avoid backpropagation into the extractor.
        domain_logits = domain_classifier(feature.detach())
        loss = domain_criterion(domain_logits, domain_label)
        running_D_loss += loss.item()
        loss.backward()
        optimizer_D.step()

        # Step 2: train the feature extractor and label classifier.
        class_logits = label_predictor(feature[:source_data.shape[0]])
        domain_logits = domain_classifier(feature)
        # loss = cross entropy of classification - lamb * domain binary cross entropy.
        # The reason for the subtraction is similar to the generator loss against
        # the discriminator in a GAN.
        loss = class_criterion(class_logits, source_label) - lamb * domain_criterion(domain_logits, domain_label)
        running_F_loss += loss.item()
        loss.backward()
        optimizer_F.step()
        optimizer_C.step()

        optimizer_D.zero_grad()
        optimizer_F.zero_grad()
        optimizer_C.zero_grad()

        total_hit += torch.sum(torch.argmax(class_logits, dim=1) == source_label).item()
        total_num += source_data.shape[0]
        print(i, end='\r')

    return running_D_loss / (i+1), running_F_loss / (i+1), total_hit / total_num

# Train for 200 epochs.
for epoch in range(200):
    train_D_loss, train_F_loss, train_acc = train_epoch(source_dataloader, target_dataloader, lamb=0.1)

    torch.save(feature_extractor.state_dict(), f'extractor_model.bin')
    torch.save(label_predictor.state_dict(), f'predictor_model.bin')

    print('epoch {:>3d}: train D loss: {:6.4f}, train F loss: {:6.4f}, acc {:6.4f}'.format(epoch, train_D_loss, train_F_loss, train_acc))


Inference

We use pandas to generate our csv file.

By the way, the performance of a model trained for 200 epochs might be unstable; you can train for more epochs to get more stable performance.

result = []
label_predictor.eval()
feature_extractor.eval()
for i, (test_data, _) in enumerate(test_dataloader):
    test_data = test_data.cuda()

    class_logits = label_predictor(feature_extractor(test_data))

    x = torch.argmax(class_logits, dim=1).cpu().detach().numpy()
    result.append(x)

import pandas as pd
result = np.concatenate(result)

# Generate your submission
df = pd.DataFrame({'id': np.arange(0, len(result)), 'label': result})
df.to_csv('DaNN_submission.csv', index=False)

Visualization

We use a t-SNE plot to observe the distribution of the extracted features.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import manifold

Step 1: Load checkpoint and evaluate to get extracted features

# Hints:
# Set feature_extractor to eval mode
# Start evaluation and collect features and labels
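
Below is a minimal sketch of this step, assuming the models, checkpoints, and dataloaders defined above; the variable names X and y are our own. Here we collect features from the source dataloader, which has labels; you can do the same with the target dataloader to inspect domain alignment.

# Load the trained feature extractor and switch to eval mode.
feature_extractor.load_state_dict(torch.load('extractor_model.bin'))
feature_extractor.eval()

feats, labels = [], []
with torch.no_grad():
    for data, label in source_dataloader:
        feat = feature_extractor(data.cuda())
        feats.append(feat.cpu().numpy())
        labels.append(label.numpy())

X = np.concatenate(feats)   # extracted features, shape (N, 512)
y = np.concatenate(labels)  # class labels, shape (N,)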

Step 2: Apply t-SNE and normalize

# Process the extracted features with t-SNE
# X_tsne = manifold.TSNE(n_components=2, init='random', random_state=5, verbose=1).fit_transform(X)

# Normalize the processed features
# x_min, x_max = X_tsne.min(0), X_tsne.max(0)
# X_norm = (X_tsne - x_min) / (x_max - x_min)
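
Continuing the sketch from Step 1 (X is the feature matrix collected there):

# Project the 512-d features down to 2-d with t-SNE.
X_tsne = manifold.TSNE(n_components=2, init='random', random_state=5, verbose=1).fit_transform(X)

# Normalize each dimension of the processed features into [0, 1].
x_min, x_max = X_tsne.min(0), X_tsne.max(0)
X_norm = (X_tsne - x_min) / (x_max - x_min)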

Step 3: Visualization with matplotlib

# Data Visualization
# Use matplotlib to plot the distribution
# The shape of X_norm is (N,2)
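
A minimal sketch, using X_norm and the labels y from the earlier steps:

# Scatter-plot the normalized 2-d features, colored by class label.
plt.figure(figsize=(8, 8))
plt.scatter(X_norm[:, 0], X_norm[:, 1], c=y, cmap='tab10', s=5)
plt.colorbar()
plt.show()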

Training Statistics

  • Number of parameters:

    • Feature Extractor: 2,142,336
    • Label Predictor: 530,442
    • Domain Classifier: 1,055,233

  • Simple baseline

    • Training time on Colab: ~ 1 hr

  • Medium baseline

    • Training time on Colab: 2 ~ 4 hrs

  • Strong baseline

    • Training time on Colab: 5 ~ 6 hrs

  • Boss baseline

    • Unmeasurable
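
If you want to verify these parameter counts yourself, a short snippet like the following works (a sketch; it counts all trainable parameters of the three models defined above):

for name, model in [('Feature Extractor', feature_extractor),
                    ('Label Predictor', label_predictor),
                    ('Domain Classifier', domain_classifier)]:
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f'{name}: {n_params:,}')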


Learning Curve (Strong Baseline)

  • The method used here is slightly different from the one on Colab.

[Figure: loss curve (strong baseline)]

Accuracy Curve (Strong Baseline)

  • Note that you cannot access the testing accuracy. But this plot tells you that even though the model overfits the training data, the testing accuracy is still improving, which is why you need to train for more epochs.

[Figure: accuracy curve (strong baseline)]


Q&A

If there is any problem related to Domain Adaptation, please email b08901058@ntu.edu.tw / mlta-2022-spring@googlegroups.com
