Homework 1: COVID-19 Cases Prediction (Regression)
Author: Heng-Jui Chang
Slides: https://github.com/ga642381/ML2021-Spring/blob/main/HW01/HW01.pdf
Video: TBA
Objectives:
- Solve a regression problem with deep neural networks (DNNs).
- Understand basic DNN training tips.
- Get familiar with PyTorch.
If you have any questions, please contact the TAs via TA hours, NTU COOL, or email.
Download Data
If the Google Drive links are dead, you can download the data from Kaggle and upload it manually to the workspace.
Downloading...
From: https://drive.google.com/uc?id=19CCyCgJrUxtvgZF53vnctJiOJ23T5mqF
To: /covid.train.csv
100%|██████████| 2.00M/2.00M [00:00<00:00, 6.39MB/s]
Downloading...
From: https://drive.google.com/uc?id=1CE240jLm2npU-tdz81-oVKEF3T2yfT1O
To: /covid.test.csv
100%|██████████| 651k/651k [00:00<00:00, 9.27MB/s]
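gdown's `--id` option was deprecated in version 4.3.1, so here is a minimal sketch of an equivalent download using gdown's Python API (URLs taken from the log above; the local output filenames are assumptions matching the original paths):

```python
import gdown

# Download the training and testing data (URLs from the download log above).
# Passing the full uc?id=... URL avoids the deprecated --id CLI flag.
gdown.download('https://drive.google.com/uc?id=19CCyCgJrUxtvgZF53vnctJiOJ23T5mqF', 'covid.train.csv')
gdown.download('https://drive.google.com/uc?id=1CE240jLm2npU-tdz81-oVKEF3T2yfT1O', 'covid.test.csv')
```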
Import Some Packages
Some Utilities
You do not need to modify this part.
Preprocess
We have three kinds of datasets:
- `train`: for training
- `dev`: for validation
- `test`: for testing (w/o target value)
Dataset
The `COVID19Dataset` below does:
- read `.csv` files
- extract features
- split `covid.train.csv` into train/dev sets
- normalize features
Finishing the `TODO` below might let you pass the medium baseline.
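For reference, a minimal sketch of what such a Dataset could look like, assuming the last column of covid.train.csv is the target, sending every 10th sample to the dev set (consistent with the 2430/270 split printed later), and simply z-scoring every feature column; treat it as an illustration, not the assignment's exact preprocessing:

```python
import csv
import numpy as np
import torch
from torch.utils.data import Dataset

class COVID19Dataset(Dataset):
    ''' Dataset for loading and preprocessing the COVID-19 data (illustrative sketch). '''
    def __init__(self, path, mode='train'):
        self.mode = mode
        # Read the .csv file, drop the header row and the id column.
        with open(path, 'r') as fp:
            rows = list(csv.reader(fp))
        data = np.array(rows[1:])[:, 1:].astype(float)

        if mode == 'test':
            # Testing data has no target value: keep all 93 feature columns.
            self.data = torch.FloatTensor(data)
        else:
            # The last column is the target; the rest are features.
            target = data[:, -1]
            feats = data[:, :-1]
            # Split covid.train.csv into train/dev sets
            # (every 10th sample -> dev, an assumption matching the log below).
            if mode == 'train':
                indices = [i for i in range(len(feats)) if i % 10 != 0]
            else:  # 'dev'
                indices = [i for i in range(len(feats)) if i % 10 == 0]
            self.data = torch.FloatTensor(feats[indices])
            self.target = torch.FloatTensor(target[indices])

        # Normalize features (simplified here: z-score every column).
        self.data = (self.data - self.data.mean(dim=0, keepdim=True)) \
                    / (self.data.std(dim=0, keepdim=True) + 1e-8)
        self.dim = self.data.shape[1]
        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        if self.mode == 'test':
            return self.data[index]                      # features only
        return self.data[index], self.target[index]      # features and target

    def __len__(self):
        return len(self.data)
```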
DataLoader
A `DataLoader` loads data from a given `Dataset` into batches.
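A possible helper for this step, built on the dataset sketch above (the function name `prep_dataloader` and its arguments are illustrative assumptions):

```python
from torch.utils.data import DataLoader

def prep_dataloader(path, mode, batch_size, n_jobs=0):
    ''' Construct a COVID19Dataset, then wrap it in a DataLoader. '''
    dataset = COVID19Dataset(path, mode=mode)
    return DataLoader(dataset, batch_size,
                      shuffle=(mode == 'train'),   # reshuffle only the training set
                      drop_last=False,
                      num_workers=n_jobs,
                      pin_memory=True)
```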
Deep Neural Network
`NeuralNet` is an `nn.Module` designed for regression.
The DNN consists of 2 fully-connected layers with ReLU activation.
This module also includes a function `cal_loss` for calculating loss.
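A minimal sketch matching that description (the hidden width of 64 and the plain MSE loss are assumptions for illustration):

```python
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network for regression (sketch). '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()
        # 2 fully-connected layers with ReLU activation in between
        # (hidden width 64 is an assumed value).
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        self.criterion = nn.MSELoss(reduction='mean')  # mean squared error for regression

    def forward(self, x):
        ''' Map a batch of features (batch, input_dim) to predictions (batch,). '''
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        ''' Calculate loss (L2 regularization could be added here, see the hints). '''
        return self.criterion(pred, target)
```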
Train/Dev/Test
Training
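As a rough reference for this step, a minimal sketch of a training loop that saves the model whenever the dev loss improves and stops early when it stalls (the behaviour suggested by the log further down); the config keys used here (`n_epochs`, `optimizer`, `optim_hparas`, `early_stop`, `save_path`) are assumptions consistent with the hyper-parameter sketch later in this notebook:

```python
import torch

def train(tr_set, dv_set, model, config, device):
    ''' Minimal sketch of DNN training with early stopping on the dev loss. '''
    # Build the optimizer from its name and hyper-parameters in config.
    optimizer = getattr(torch.optim, config['optimizer'])(
        model.parameters(), **config['optim_hparas'])

    min_loss, early_stop_cnt, epoch = float('inf'), 0, 0
    while epoch < config['n_epochs']:
        model.train()
        for x, y in tr_set:                          # iterate over mini-batches
            optimizer.zero_grad()
            x, y = x.to(device), y.to(device)
            loss = model.cal_loss(model(x), y)       # forward pass + loss
            loss.backward()                          # back-propagation
            optimizer.step()                         # parameter update

        # Evaluate on the dev set after each epoch.
        model.eval()
        dev_loss = 0
        with torch.no_grad():
            for x, y in dv_set:
                x, y = x.to(device), y.to(device)
                dev_loss += model.cal_loss(model(x), y).item() * len(x)
        dev_loss /= len(dv_set.dataset)

        if dev_loss < min_loss:                      # save only when the dev loss improves
            min_loss = dev_loss
            torch.save(model.state_dict(), config['save_path'])
            print('Saving model (epoch = {:4d}, loss = {:.4f})'.format(epoch + 1, min_loss))
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        if early_stop_cnt > config['early_stop']:    # stop when the dev loss stops improving
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_loss
```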
Validation
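A matching sketch of the validation step, which just averages the loss over the dev set without tracking gradients (the function name `dev` and its signature are assumptions):

```python
import torch

def dev(dv_set, model, device):
    ''' Compute the average loss over the dev set (no gradients needed). '''
    model.eval()
    total_loss = 0
    with torch.no_grad():
        for x, y in dv_set:
            x, y = x.to(device), y.to(device)
            pred = model(x)
            total_loss += model.cal_loss(pred, y).item() * len(x)  # sum of per-sample losses
    return total_loss / len(dv_set.dataset)                        # mean over all samples
```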
Testing
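And a sketch of the testing step, which only collects predictions because the testing set has no target values (names and signature are again assumptions):

```python
import torch

def test(tt_set, model, device):
    ''' Collect predictions on the (target-free) testing set. '''
    model.eval()
    preds = []
    with torch.no_grad():
        for x in tt_set:                     # test batches contain features only
            x = x.to(device)
            preds.append(model(x).detach().cpu())
    return torch.cat(preds, dim=0).numpy()   # concatenate all batch predictions
```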
Setup Hyper-parameters
`config` contains the hyper-parameters for training and the path to save your model.
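For example, a config of roughly this shape would work with the training sketch above; every concrete value below is an illustrative assumption to be tuned against the baselines:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'  # use GPU if available

config = {
    'n_epochs': 3000,                 # maximum number of epochs
    'batch_size': 270,                # mini-batch size for the DataLoader
    'optimizer': 'SGD',               # optimization algorithm (class name in torch.optim)
    'optim_hparas': {                 # hyper-parameters forwarded to the optimizer
        'lr': 0.001,                  # learning rate
        'momentum': 0.9,              # momentum for SGD
    },
    'early_stop': 200,                # stop if the dev loss hasn't improved for this many epochs
    'save_path': 'models/model.pth',  # where the best checkpoint is stored (create the folder first)
}
```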
Load data and model
Finished reading the train set of COVID19 Dataset (2430 samples found, each dim = 93)
Finished reading the dev set of COVID19 Dataset (270 samples found, each dim = 93)
Finished reading the test set of COVID19 Dataset (893 samples found, each dim = 93)
Start Training!
Saving model (epoch = 1, loss = 78.8524)
Saving model (epoch = 2, loss = 37.6170)
Saving model (epoch = 3, loss = 26.1203)
Saving model (epoch = 4, loss = 16.1862)
Saving model (epoch = 5, loss = 9.7153)
...
Saving model (epoch = 1234, loss = 0.7611)
Saving model (epoch = 1243, loss = 0.7580)
Saving model (epoch = 1323, loss = 0.7571)
Finished training after 1524 epochs
Testing
The predictions of your model on the testing set will be stored in `pred.csv`.
Saving results to pred.csv
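A minimal sketch of how those predictions might be written out (the id / tested_positive header is an assumption about the expected submission format):

```python
import csv

def save_pred(preds, file='pred.csv'):
    ''' Save predictions to a CSV file for submission. '''
    print('Saving results to {}'.format(file))
    with open(file, 'w', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])   # header row
        for i, p in enumerate(preds):
            writer.writerow([i, p])                  # one row per test sample
```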
Hints
Simple Baseline
- Run sample code
Medium Baseline
- Feature selection: 40 states + 2 `tested_positive` (the `TODO` in the dataset)
Strong Baseline
- Feature selection (what other features are useful?)
- DNN architecture (layers? dimension? activation function?)
- Training (mini-batch? optimizer? learning rate?)
- L2 regularization (see the sketch after this list)
- There are some mistakes in the sample code, can you find them?
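For the L2-regularization hint, one common approach is to pass a weight_decay term to the optimizer; a minimal sketch (the value 1e-3 is an arbitrary assumption to be tuned on the dev set):

```python
import torch

# L2 regularization via the optimizer's weight_decay argument.
# `model` is assumed to be a NeuralNet instance as defined above;
# 1e-3 is an arbitrary assumed value, tune it on the dev set.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-3)
```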
Reference
This code was written entirely by Heng-Jui Chang @ NTUEE.
If you copy or reuse this code, please credit the original author, e.g.:
Source: Heng-Jui Chang @ NTUEE (https://github.com/ga642381/ML2021-Spring/blob/main/HW01/HW01.ipynb)