Personal note on deep learning

Published: Sun 19 November 2023


  • Default stride is 1
  • According to Wikipedia, the output height/width of a convolutional layer (per channel) is given by
    \[ \frac{W - K + 2P}{S} + 1 \]
    where \(W\) is the input size, \(K\) is the filter size, \(P\) is the padding size and \(S\) is the stride.
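
As a quick check of the output-size relation, commonly written as (W - K + 2P)/S + 1, here is a minimal sketch. The helper name conv_output_size is my own, and the floor division is an assumption that matches how PyTorch rounds when the stride does not divide evenly.

```python
def conv_output_size(w: int, k: int, p: int = 0, s: int = 1) -> int:
    """Output height/width of a convolution: floor((W - K + 2P) / S) + 1.

    w: input size, k: filter size, p: padding, s: stride (default 1).
    """
    return (w - k + 2 * p) // s + 1


# 3x3 filter with padding 1 and stride 1 keeps the size unchanged.
print(conv_output_size(32, 3, p=1, s=1))   # 32
# 7x7 filter, padding 3, stride 2 halves a 224-pixel input.
print(conv_output_size(224, 7, p=3, s=2))  # 112
```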

Batch normalisation

  • In an FC layer, normalise each feature over the mini-batch.
  • In a Conv layer, normalise each channel over the mini-batch and all spatial positions. Normalising per sample instead is called layer normalisation.
  • A batch normalisation (BN) layer is inserted after the FC/Conv layer and before the activation layer.
  • Batch size becomes an important tunable parameter, because the statistics are estimated from each mini-batch.
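
To make the normalisation axes concrete, here is a small sketch that recomputes the statistics by hand and compares them with PyTorch's modules. The tensor shapes and variable names are arbitrary choices of mine; the learnable scale and shift are switched off so that only the normalisation itself is compared.

```python
import torch
from torch import nn

torch.manual_seed(0)
eps = 1e-5  # same default epsilon as the PyTorch modules

# FC case: input is (N, F). Statistics are per feature, over the batch (dim 0).
x_fc = torch.randn(8, 4)
fc_manual = (x_fc - x_fc.mean(dim=0)) / torch.sqrt(
    x_fc.var(dim=0, unbiased=False) + eps
)
fc_bn = nn.BatchNorm1d(4, affine=False)(x_fc)  # modules start in train mode

# Conv case: input is (N, C, H, W). Statistics are per channel, over N, H, W.
x_conv = torch.randn(8, 3, 5, 5)
bn_mean = x_conv.mean(dim=(0, 2, 3), keepdim=True)
bn_var = x_conv.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
conv_manual = (x_conv - bn_mean) / torch.sqrt(bn_var + eps)
conv_bn = nn.BatchNorm2d(3, affine=False)(x_conv)

# Layer normalisation: statistics are per sample, over C, H, W.
ln_mean = x_conv.mean(dim=(1, 2, 3), keepdim=True)
ln_var = x_conv.var(dim=(1, 2, 3), unbiased=False, keepdim=True)
ln_manual = (x_conv - ln_mean) / torch.sqrt(ln_var + eps)
ln = nn.LayerNorm([3, 5, 5], elementwise_affine=False)(x_conv)

fc_match = torch.allclose(fc_manual, fc_bn, atol=1e-5)
conv_match = torch.allclose(conv_manual, conv_bn, atol=1e-5)
ln_match = torch.allclose(ln_manual, ln, atol=1e-5)
print(fc_match, conv_match, ln_match)
```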

Freezing layers

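In addition to setting param.requires_grad = False, I need to put any batchnorm layers in the frozen part into eval() mode. Otherwise their running mean and variance keep being updated during training, even though their weights are frozen. Details: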
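
A minimal sketch of how this might look. The toy model, the choice of which layers count as the frozen "backbone", and the helper freeze_batchnorm are all hypothetical and only for illustration. Note that model.train() switches every submodule back to training mode, so the batchnorm layers have to be set to eval() again afterwards.

```python
import torch
from torch import nn

# Toy model (hypothetical): a conv + batchnorm "backbone" and a linear head.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 2),
)
backbone = model[:2]  # slicing shares the same layer objects with `model`

# Step 1: stop gradient updates for the frozen parameters.
for param in backbone.parameters():
    param.requires_grad = False


def freeze_batchnorm(module: nn.Module) -> None:
    """Put every batchnorm layer inside `module` into eval mode."""
    for m in module.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()


# Step 2: train() resets all submodules, so freeze batchnorm after it.
model.train()
freeze_batchnorm(backbone)

# The running statistics should not change during a forward pass now.
bn = model[1]
before = bn.running_mean.clone()
model(torch.randn(16, 3, 4, 4))
stats_unchanged = torch.equal(before, bn.running_mean)
print(stats_unchanged)
```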


  • len(dataloader): number of batches
  • len(dataloader.dataset): number of samples
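
A small sketch of the difference, using a made-up dataset of 10 samples and a batch size of 4 (both arbitrary choices of mine). With the default drop_last=False the last, smaller batch is counted too.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up dataset: 10 samples with 3 features each, plus dummy labels.
dataset = TensorDataset(torch.randn(10, 3), torch.zeros(10))

loader = DataLoader(dataset, batch_size=4)
n_batches = len(loader)          # ceil(10 / 4) = 3 batches
n_samples = len(loader.dataset)  # 10 samples

# With drop_last=True the incomplete final batch is discarded.
loader_drop = DataLoader(dataset, batch_size=4, drop_last=True)
n_batches_drop = len(loader_drop)  # floor(10 / 4) = 2 batches

print(n_batches, n_samples, n_batches_drop)  # 3 10 2
```

This matters when averaging a loss: dividing a summed per-sample loss by len(dataloader) instead of len(dataloader.dataset) gives a different number.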
