PyTorch Mental Map

PyTorch code usually separates into two big zones: model specification, which defines the computation, and training setup, which measures error and updates trainable weights.

Big Picture

Wrappers: organize layers
Layers: transform data
Activations: add nonlinearity
Normalization: stabilize training
Dropout: regularize
Losses: measure error
Optimizers: update weights

Tiny Examples

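import torch
from torch import nn

model = nn.Sequential(          # wrapper: organizes layers
    nn.Linear(16, 32),          # layer: transforms data
    nn.ReLU(),                  # activation: adds nonlinearity
    nn.Dropout(0.1),            # dropout: regularizes
    nn.Linear(32, 1),           # layer: final transform
)

# Placeholder batch for illustration: 8 samples, 16 features, binary labels.
inputs = torch.randn(8, 16)
labels = torch.randint(0, 2, (8, 1)).float()

loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
)

logits = model(inputs)          # forward pass: raw scores, shape (8, 1)
loss = loss_fn(logits, labels)  # loss: measures error

optimizer.zero_grad()           # clear stale gradients
loss.backward()                 # compute gradients of the loss
optimizer.step()                # optimizer: updates weights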
  • Wrapper: nn.Sequential chains modules in order.
  • Layers: nn.Linear applies a learned affine map, here 16 to 32 features and then 32 to 1.
  • Activation: nn.ReLU lets the model learn nonlinear patterns.
  • Dropout: nn.Dropout randomly zeroes values during training and does nothing in eval mode.
  • Loss: loss_fn(logits, labels) computes how wrong predictions are.
  • Optimizer: optimizer.step() applies the weight update.

Per-Epoch Training Loop

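for epoch in range(num_epochs):
    model.train()                 # put model in training mode
    total_loss = 0.0              # running sum of per-sample losses

    for xb, yb in loader:          # get one mini-batch
        optimizer.zero_grad()      # clear old gradients

        logits = model(xb)         # forward pass
        loss = loss_fn(logits, yb) # measure error (mean over the batch)

        loss.backward()            # compute gradients
        optimizer.step()           # update trainable weights

        total_loss += loss.item() * len(xb)  # undo the batch mean

    # mean loss per sample; assumes loader.dataset supports len()
    epoch_loss = total_loss / len(loader.dataset)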
  • Epoch: one full pass over the training data.
  • Batch: a small chunk of samples from the loader.
  • Forward: model(xb) runs the architecture.
  • Loss: loss_fn(...) turns predictions into an error value.
  • Backward: loss.backward() computes gradients.
  • Step: optimizer.step() updates trainable weights.
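
The loop above leaves the data, model, and optimizer undefined. A minimal end-to-end sketch could look like the following; the toy dataset (label is 1 when the feature sum is positive), the batch size, the learning rate, and the epoch count are all assumptions chosen for illustration, not recommendations.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 256 samples, 16 features, label = 1 when the feature sum > 0.
X = torch.randn(256, 16)
y = (X.sum(dim=1, keepdim=True) > 0).float()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

history = []                                   # mean training loss per epoch
for epoch in range(20):
    model.train()
    total_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * len(xb)
    history.append(total_loss / len(loader.dataset))

# Evaluation: switch to eval mode and disable gradient tracking.
model.eval()
with torch.no_grad():
    accuracy = ((model(X) > 0).float() == y).float().mean().item()
```

Here accuracy is measured on the training data only, which is enough to check that the loop runs but says nothing about generalization.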

Model Spec / Architecture

Wrappers

organize layers and parameters
  • nn.Sequential
  • nn.ModuleList
  • nn.ModuleDict
  • nn.ParameterList
  • nn.ParameterDict
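
One reason to use these containers instead of plain Python lists or dicts is parameter registration. The sketch below uses a hypothetical Stack module (not part of PyTorch) to show that layers held in nn.ModuleList show up in parameters(), while layers held in an ordinary list do not.

```python
import torch
from torch import nn

class Stack(nn.Module):
    """Hypothetical example module: a stack of equally sized linear layers."""

    def __init__(self, width: int, depth: int):
        super().__init__()
        # nn.ModuleList registers each layer, so its weights appear in parameters().
        self.layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])
        # A plain Python list is invisible to nn.Module bookkeeping.
        self.unregistered = [nn.Linear(width, width) for _ in range(depth)]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

stack = Stack(width=8, depth=3)
num_tensors = len(list(stack.parameters()))   # 3 registered layers x (weight, bias)
out = stack(torch.randn(4, 8))
```

An optimizer built from stack.parameters() would therefore never update the layers in the plain list.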

Layers

transform data
  • nn.Linear
  • nn.Conv1d
  • nn.Conv2d
  • nn.Conv3d
  • nn.Embedding
  • nn.Bilinear
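
What "transform data" means in practice is easiest to see from tensor shapes. The sketch below runs three of the listed layers on random inputs; the sizes are arbitrary assumptions picked for illustration.

```python
import torch
from torch import nn

# Linear: maps the last dimension from 16 to 32 features.
linear_out = nn.Linear(16, 32)(torch.randn(4, 16))

# Conv2d: input is (batch, channels, height, width).
# A 3x3 kernel with padding=1 keeps height and width unchanged.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
conv_out = conv(torch.randn(4, 3, 28, 28))

# Embedding: looks up a 5-dimensional vector for each integer id.
token_ids = torch.tensor([[1, 2, 3], [4, 5, 6]])
embed_out = nn.Embedding(num_embeddings=10, embedding_dim=5)(token_ids)

print(linear_out.shape, conv_out.shape, embed_out.shape)
```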

Activations

add nonlinearity
  • nn.ReLU
  • nn.LeakyReLU
  • nn.GELU
  • nn.SiLU
  • nn.Tanh
  • nn.Sigmoid
  • nn.Softmax
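
A quick way to compare activations is to apply them to the same small tensor. In this sketch the input values and the LeakyReLU slope of 0.1 are assumptions for illustration. Note that nn.Softmax differs from the others: it normalizes across a dimension instead of acting on each element independently.

```python
import torch
from torch import nn

x = torch.tensor([[-2.0, 0.0, 3.0]])

relu_out = nn.ReLU()(x)                 # negatives clipped to 0
leaky_out = nn.LeakyReLU(0.1)(x)        # negatives scaled by 0.1
sigmoid_out = nn.Sigmoid()(x)           # each value squashed into (0, 1)
softmax_out = nn.Softmax(dim=-1)(x)     # each row becomes a probability distribution
```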

Normalization

stabilize training
  • nn.BatchNorm1d
  • nn.BatchNorm2d
  • nn.LayerNorm
  • nn.GroupNorm
  • nn.InstanceNorm1d
  • nn.InstanceNorm2d
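
The main difference between these layers is which dimension the statistics are computed over. As a rough sketch with freshly initialized layers (so the learned scale and shift are still 1 and 0), nn.LayerNorm standardizes each sample across its features, while nn.BatchNorm1d in training mode standardizes each feature across the batch. The input scale and offset are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 8) * 5.0 + 10.0        # features with a large mean and spread

layer_norm = nn.LayerNorm(8)               # statistics over the last dimension
y = layer_norm(x)
row_means = y.mean(dim=-1)                 # close to 0 for every sample
row_stds = y.std(dim=-1, unbiased=False)   # close to 1 for every sample

batch_norm = nn.BatchNorm1d(8)             # statistics over the batch dimension
z = batch_norm(x)                          # a new module starts in training mode
col_means = z.mean(dim=0)                  # close to 0 for every feature
```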

Dropout

regularize
  • nn.Dropout
  • nn.Dropout1d
  • nn.Dropout2d
  • nn.Dropout3d
  • nn.AlphaDropout
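
Dropout behaves differently in training and evaluation mode, which is why model.train() and model.eval() matter. The sketch below, with p=0.5 and an all-ones input chosen for illustration, shows that training mode zeroes a random subset and rescales the survivors by 1 / (1 - p), while eval mode passes the input through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)   # roughly half the entries are 0, survivors are scaled to 2.0

drop.eval()
eval_out = drop(x)    # identity: dropout is disabled in eval mode
```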

Training Setup

Losses

measure error
  • nn.MSELoss
  • nn.L1Loss
  • nn.CrossEntropyLoss
  • nn.BCEWithLogitsLoss
  • nn.BCELoss
  • nn.NLLLoss
  • nn.KLDivLoss
  • nn.MarginRankingLoss
  • nn.TripletMarginLoss

Optimizers

update weights
  • torch.optim.SGD
  • torch.optim.Adam
  • torch.optim.AdamW
  • torch.optim.RMSprop
  • torch.optim.Adagrad
  • torch.optim.Adadelta
  • torch.optim.Adamax
  • torch.optim.NAdam
  • torch.optim.RAdam
  • torch.optim.LBFGS
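
To make "update weights" concrete, the following sketch compares one step of plain torch.optim.SGD (no momentum or weight decay) with the rule w - lr * grad computed by hand. The two-element parameter and the squared-sum loss are assumptions for illustration; the other optimizers in the list use more elaborate update rules.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()                   # gradient is 2 * w = [2, -4]
optimizer.zero_grad()
loss.backward()

expected = w.detach() - 0.1 * w.grad    # w - lr * grad, computed before the step
optimizer.step()                        # w is now updated in place
```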