CNN
- Default stride is 1
- According to Wikipedia, the output height/width of a convolutional layer (per channel) is given by
$$\left\lfloor\frac{W-K+2P}{S}+1\right\rfloor$$where \(W\) is the input size, \(K\) is the filter size, \(P\) is the padding size and \(S\) is the stride.
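The formula above is easy to sanity-check in plain Python; `conv_output_size` below is a small helper I wrote for these notes (not a library function):

```python
def conv_output_size(w, k, p=0, s=1):
    """Output height/width (per channel) of a conv layer:
    floor((W - K + 2P) / S + 1), with default stride 1 and no padding."""
    return (w - k + 2 * p) // s + 1

# A 3x3 filter with padding 1 and stride 1 preserves spatial size ("same" conv):
print(conv_output_size(32, 3, p=1, s=1))  # -> 32
# With no padding and stride 2, a 32-wide input shrinks:
print(conv_output_size(32, 3, p=0, s=2))  # -> 15
```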
Batch normalisation
- In an FC layer, normalise each feature over the mini-batch
- In a conv layer, normalise per channel (over the batch and spatial dimensions). Normalising per sample instead is called layer normalisation.
- Batch normalisation (BN) layer is inserted after FC/Conv layer and before activation layer.
- Batch size becomes an important tunable parameter
Freezing layers
- In addition to setting `param.requires_grad = False`, I need to put any batch-norm layers into `eval()` mode so their running statistics stop updating. Details: https://discuss.pytorch.org/t/should-i-use-model-eval-when-i-freeze-batchnorm-layers-to-finetune/39495
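A minimal sketch of both steps, on a hypothetical toy model (the model and `freeze` helper are illustrative, not from the forum thread). Note that a later `model.train()` call would flip the batch-norm layers back into training mode, so `freeze` may need to be re-applied:

```python
import torch
from torch import nn

# Hypothetical model: a conv "backbone" followed by a linear head.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8, 2),
)

def freeze(module):
    # 1) Stop gradient updates for the frozen parameters.
    for p in module.parameters():
        p.requires_grad = False
    # 2) Put batch-norm layers in eval() so running stats are not updated.
    for m in module.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()

freeze(model[:3])  # freeze conv + batchnorm; the head stays trainable
```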
PyTorch
- `len(dataloader)`: number of batches
- `len(dataloader.dataset)`: number of samples
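A quick check of the two lengths on a toy dataset (with the default `drop_last=False`, the number of batches is `ceil(n_samples / batch_size)`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 samples, batch size 4.
dataset = TensorDataset(torch.arange(10).float())
loader = DataLoader(dataset, batch_size=4)  # drop_last=False by default

print(len(loader))          # number of batches: ceil(10 / 4) = 3
print(len(loader.dataset))  # number of samples: 10
```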