PyTorch Mental Map

PyTorch code usually separates into two big zones: model specification, which defines the computation, and training setup, which measures error and updates trainable weights.

Big Picture

Wrappers: organize layers

Layers: transform data

Activations: add nonlinearity

Normalization: stabilizes training

Dropout: regularizes

Losses: measure error

Optimizers: update weights

Tiny Examples

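import torch
from torch import nn

model = nn.Sequential(          # wrapper: organizes layers
    nn.Linear(16, 32),          # layer: transforms data
    nn.ReLU(),                  # activation: adds nonlinearity
    nn.Dropout(0.1),            # dropout: regularizes
    nn.Linear(32, 1),           # layer: final transform
)

# toy inputs for illustration: 8 samples with 16 features, binary float labels
xb = torch.randn(8, 16)
labels = torch.randint(0, 2, (8, 1)).float()

logits = model(xb)              # forward pass: raw scores, shape (8, 1)
loss_fn = nn.BCEWithLogitsLoss()
loss = loss_fn(logits, labels)  # loss: measures error

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
)
optimizer.zero_grad()           # clear any old gradients
loss.backward()                 # compute gradients of the loss
optimizer.step()                # optimizer: updates weights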

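Wrapper: nn.Sequential chains modules in order, feeding each output into the next.
Layer: nn.Linear applies a learned affine map; here it changes the feature dimension from 16 to 32, then from 32 to 1.
Activation: nn.ReLU lets the model learn nonlinear patterns.
Dropout: nn.Dropout randomly zeroes values during training and does nothing in eval mode.
Loss: loss_fn(logits, labels) computes how wrong the predictions are; nn.BCEWithLogitsLoss takes raw logits, so the model has no final sigmoid.
Optimizer: optimizer.step() applies the weight update using the gradients computed by loss.backward().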
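One practical detail behind the Dropout line: nn.Dropout only drops values while the module is in training mode. The minimal sketch below (the tensor of ones, p=0.5 and the seed are arbitrary choices for illustration) shows that surviving values are rescaled by 1 / (1 - p) in training mode, while eval mode passes the input through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)                  # make the random dropout mask reproducible

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                          # training mode: dropout is active
y_train = drop(x)
# kept elements are scaled by 1 / (1 - p) = 2.0, so the expected value stays 1.0
print(sorted(set(y_train.tolist())))  # → [0.0, 2.0]
print((y_train == 0).float().mean())  # fraction dropped, roughly 0.5 (random)

drop.eval()                           # eval mode: dropout becomes the identity
y_eval = drop(x)
print(torch.equal(y_eval, x))         # → True
```

This is why the training loop calls model.train(), and why evaluation code would call model.eval() before measuring accuracy.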

Per-Epoch Training Loop

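import torch
from torch.utils.data import DataLoader, TensorDataset

# reuses model, loss_fn and optimizer from the tiny example; the toy data is illustrative
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64, 1)).float())
loader = DataLoader(dataset, batch_size=8, shuffle=True)
num_epochs = 3

for epoch in range(num_epochs):
    model.train()                  # put model in training mode (enables dropout)
    total_loss = 0.0               # reset the running loss each epoch

    for xb, yb in loader:          # get one mini-batch
        optimizer.zero_grad()      # clear old gradients

        logits = model(xb)         # forward pass
        loss = loss_fn(logits, yb) # measure error (mean over the batch)

        loss.backward()            # compute gradients
        optimizer.step()           # update trainable weights

        total_loss += loss.item() * len(xb)  # undo the batch mean so the epoch average is exact

    avg_loss = total_loss / len(loader.dataset)
    print(f"epoch {epoch}: avg loss {avg_loss:.4f}")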

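Epoch: one full pass over the training data.
Batch: a small chunk of samples from the loader.
Zero grad: optimizer.zero_grad() clears the previous step's gradients, because PyTorch accumulates gradients by default.
Forward: model(xb) runs the architecture on the batch.
Loss: loss_fn(...) turns predictions into a single error value.
Backward: loss.backward() computes gradients of the loss with respect to every trainable parameter.
Step: optimizer.step() updates trainable weights.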
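The zero_grad step exists because backward() adds into .grad rather than overwriting it. A minimal sketch with a single scalar "weight" (the value 3.0 and the SGD optimizer are arbitrary choices for illustration):

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # a single trainable "weight"
optimizer = torch.optim.SGD([w], lr=0.1)

(w ** 2).backward()      # d(w^2)/dw = 2 * w = 6
first = w.grad.item()
print(first)             # → 6.0

(w ** 2).backward()      # second backward WITHOUT clearing: gradients add up
second = w.grad.item()
print(second)            # → 12.0

optimizer.zero_grad()    # clear the accumulated gradient
(w ** 2).backward()
third = w.grad.item()
print(third)             # → 6.0
```

Skipping zero_grad in the loop would make each step use the sum of all previous batches' gradients.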

Model Spec / Architecture

Wrappers

organize layers (ParameterList and ParameterDict do the same for raw nn.Parameter tensors)

nn.Sequential
nn.ModuleList
nn.ModuleDict
nn.ParameterList
nn.ParameterDict
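Wrappers matter because they register their contents, so model.parameters() (and therefore the optimizer) can find them. The sketch below contrasts a plain Python list with nn.ModuleList and nn.ModuleDict; the class names PlainList and Registered and the n_params helper are hypothetical, chosen only for illustration.

```python
import torch
from torch import nn

class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = [nn.Linear(4, 4) for _ in range(3)]  # plain list: NOT registered

class Registered(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])  # registered
        self.heads = nn.ModuleDict({"cls": nn.Linear(4, 2), "reg": nn.Linear(4, 1)})

    def forward(self, x, head="cls"):
        for block in self.blocks:       # unlike nn.Sequential, you write the loop yourself
            x = torch.relu(block(x))
        return self.heads[head](x)      # pick an output head by name

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(PlainList()))                  # → 0 (an optimizer would see no weights)
print(n_params(Registered()))                 # → 75: 3 * (16 + 4) + (8 + 2) + (4 + 1)
print(Registered()(torch.randn(5, 4)).shape)  # → torch.Size([5, 2])
```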

Layers

transform data

nn.Linear
nn.Conv1d
nn.Conv2d
nn.Conv3d
nn.Embedding
nn.Bilinear
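Most layer bugs are shape bugs, so it helps to see what each layer expects and returns. In the sketch below all sizes (channels, kernel sizes, vocabulary size) are arbitrary illustrative choices; N is the batch dimension.

```python
import torch
from torch import nn

linear = nn.Linear(16, 32)                            # (N, 16)         -> (N, 32)
conv1d = nn.Conv1d(3, 8, kernel_size=3)               # (N, 3, L)       -> (N, 8, L - 2)
conv2d = nn.Conv2d(3, 8, kernel_size=3, padding=1)    # (N, 3, H, W)    -> (N, 8, H, W)
conv3d = nn.Conv3d(1, 4, kernel_size=3)               # (N, 1, D, H, W) -> (N, 4, D-2, H-2, W-2)
embed = nn.Embedding(num_embeddings=100, embedding_dim=12)  # integer ids (N, T) -> (N, T, 12)
bilinear = nn.Bilinear(5, 6, 7)                       # (N, 5) and (N, 6) -> (N, 7)

print(linear(torch.randn(2, 16)).shape)               # → torch.Size([2, 32])
print(conv1d(torch.randn(2, 3, 10)).shape)            # → torch.Size([2, 8, 8])
print(conv2d(torch.randn(2, 3, 28, 28)).shape)        # → torch.Size([2, 8, 28, 28])
print(conv3d(torch.randn(2, 1, 5, 6, 7)).shape)       # → torch.Size([2, 4, 3, 4, 5])
print(embed(torch.randint(0, 100, (2, 9))).shape)     # → torch.Size([2, 9, 12])
print(bilinear(torch.randn(2, 5), torch.randn(2, 6)).shape)  # → torch.Size([2, 7])
```

Without padding, each convolution with kernel size 3 shrinks every spatial dimension by 2; padding=1 keeps it unchanged.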

Activations

add nonlinearity

nn.ReLU
nn.LeakyReLU
nn.GELU
nn.SiLU
nn.Tanh
nn.Sigmoid
nn.Softmax
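Applying each activation to the same small tensor makes the differences concrete. The input values and the LeakyReLU slope of 0.1 are illustrative choices. nn.Softmax is the odd one out: it normalizes scores into probabilities along a dimension rather than acting as a hidden-layer nonlinearity, and it should not be placed before nn.CrossEntropyLoss, which applies log-softmax internally.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_out = nn.ReLU()(x)             # negatives clipped to 0
leaky_out = nn.LeakyReLU(0.1)(x)    # negatives scaled by 0.1 instead of zeroed
gelu_out = nn.GELU()(x)             # smooth, ReLU-like curve
silu_out = nn.SiLU()(x)             # x * sigmoid(x)
tanh_out = nn.Tanh()(x)             # squashes into (-1, 1)
sigmoid_out = nn.Sigmoid()(x)       # squashes into (0, 1)
print(relu_out, leaky_out, gelu_out, silu_out, tanh_out, sigmoid_out, sep="\n")

logits = torch.tensor([[1.0, 2.0, 3.0]])
probs = nn.Softmax(dim=-1)(logits)  # non-negative values that sum to 1 along dim=-1
print(probs, probs.sum(dim=-1))
```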

Normalization

stabilize training

nn.BatchNorm1d
nn.BatchNorm2d
nn.LayerNorm
nn.GroupNorm
nn.InstanceNorm1d
nn.InstanceNorm2d
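The normalization layers differ mainly in which axis they compute statistics over. A minimal sketch comparing the two most common ones on toy data (the shift of 3, the scale of 5 and the seed are arbitrary): BatchNorm1d normalizes each feature across the batch, while LayerNorm normalizes each sample across its features. GroupNorm and InstanceNorm apply the same idea per sample over groups of channels or single channels, and are not shown here.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(32, 10) * 5 + 3   # 32 samples, 10 features, mean ~3, std ~5

bn = nn.BatchNorm1d(10)           # statistics per feature, over the batch
ln = nn.LayerNorm(10)             # statistics per sample, over its features

bn.train()                        # training mode: uses batch statistics and updates running stats
y_bn = bn(x)
y_ln = ln(x)

print(y_bn.mean(dim=0).abs().max())  # every feature column now has mean close to 0
print(y_ln.mean(dim=1).abs().max())  # every sample row now has mean close to 0
```

Because BatchNorm depends on batch statistics, it behaves differently in model.eval(), where it switches to the running averages; LayerNorm does not.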

Dropout

regularize

nn.Dropout
nn.Dropout1d
nn.Dropout2d
nn.Dropout3d
nn.AlphaDropout
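The numbered dropout variants drop whole channels instead of individual values, which suits convolutional feature maps where neighboring values are strongly correlated. A minimal sketch with nn.Dropout2d (the input of ones, p=0.5 and the seed are illustrative choices):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.ones(1, 8, 4, 4)         # (N, C, H, W): 8 feature maps of size 4x4

drop2d = nn.Dropout2d(p=0.5)       # modules start in training mode by default
y = drop2d(x)

# each channel is either entirely zeroed or entirely kept (and scaled by 1 / (1 - p) = 2)
per_channel = y[0].flatten(1)      # shape (8, 16): one row per channel
print(per_channel.min(dim=1).values)
print(per_channel.max(dim=1).values)  # identical to the mins: no channel is partially dropped
```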

Training Setup

Losses

measure error

nn.MSELoss
nn.L1Loss
nn.CrossEntropyLoss
nn.BCEWithLogitsLoss
nn.BCELoss
nn.NLLLoss
nn.KLDivLoss
nn.MarginRankingLoss
nn.TripletMarginLoss
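The most common loss mistakes are about input conventions: raw logits versus probabilities, and integer class indices versus float targets. The sketch below uses random toy tensors (shapes and the seed are arbitrary) to check two equivalences: CrossEntropyLoss equals NLLLoss on log-softmax outputs, and BCEWithLogitsLoss equals BCELoss on sigmoid outputs.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Multi-class: raw logits (N, C) plus integer class indices (N,)
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)  # NLLLoss expects log-probabilities
print(torch.allclose(ce, nll))                  # → True

# Binary or multi-label: raw logits plus float targets of the same shape
bin_logits = torch.randn(4, 1)
bin_targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
bce_logits = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)
bce = nn.BCELoss()(torch.sigmoid(bin_logits), bin_targets)  # BCELoss expects probabilities
print(torch.allclose(bce_logits, bce, atol=1e-6))  # → True

# Regression: predictions and targets of the same shape
pred, target = torch.randn(4), torch.randn(4)
mse = nn.MSELoss()(pred, target)
mae = nn.L1Loss()(pred, target)
print(mse, mae)
```

The with-logits form is generally preferred because it combines the sigmoid and the log in a numerically stable way.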

Optimizers

update weights

torch.optim.SGD
torch.optim.Adam
torch.optim.AdamW
torch.optim.RMSprop
torch.optim.Adagrad
torch.optim.Adadelta
torch.optim.Adamax
torch.optim.NAdam
torch.optim.RAdam
torch.optim.LBFGS
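Every optimizer above takes model.parameters() and exposes the same zero_grad() and step() methods, so swapping one for another is usually a one-line change. The exception is LBFGS, which re-evaluates the loss several times per step and therefore requires a closure; the other optimizers accept the same closure as an optional argument. In this minimal sketch on a noiseless linear-regression toy problem, the fit helper, the data, the learning rates and the step counts are all illustrative assumptions, not tuned recommendations.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(64, 3)
true_w = torch.tensor([[1.0], [-2.0], [0.5]])
y = x @ true_w                          # noiseless linear targets (toy data)

def fit(make_optimizer, steps=200):     # hypothetical helper for this comparison
    model = nn.Linear(3, 1)
    loss_fn = nn.MSELoss()
    optimizer = make_optimizer(model.parameters())

    def closure():                      # recomputes the loss; LBFGS requires this
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        return loss

    for _ in range(steps):
        optimizer.step(closure)         # optional for SGD/AdamW, required for LBFGS
    return loss_fn(model(x), y).item()

sgd_loss = fit(lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9))
adamw_loss = fit(lambda p: torch.optim.AdamW(p, lr=5e-2))
lbfgs_loss = fit(lambda p: torch.optim.LBFGS(p), steps=20)
print(sgd_loss, adamw_loss, lbfgs_loss)  # final training losses; exact values vary
```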