USAAIO 2025 Round 1 · Problem 4 · CNN on a small image set [reconstructed]
Contest: 2025 USA-NA-AIO Round 1 · Round: Round 1 (online) · Category: Deep learning / convolutional networks.
Official sources: usaaio.org/past-problems · 2025 Round 1 forum.
1. Problem restatement
Given a small labelled image dataset (likely a downsampled CIFAR-style 10-class set, ~5 000 training images), train a CNN from scratch. The problem walks through ~12 parts: load and visualise the data, build the model architecture in PyTorch, train for a few epochs, plot loss curves, evaluate, then discuss the effect of data augmentation, normalisation, and learning-rate scheduling. Everything is meant to run on a CPU or a small GPU.
2. What's being tested
- PyTorch literacy. Datasets, DataLoaders, nn.Module, optimisers.
- Conv arithmetic. Output sizes after stride/padding/kernel, parameter counts.
- Training-loop hygiene. Move tensors to device, zero_grad, backward, step.
- Generalisation reasoning. Why does data augmentation help when the dataset is small?
3. Data exploration / setup
import torch, torchvision
from torchvision import transforms as T
from torch.utils.data import DataLoader

# ToTensor scales pixels to [0, 1]; Normalize then maps each channel to [-1, 1].
tfm = T.Compose([T.ToTensor(),
                 T.Normalize((0.5,) * 3, (0.5,) * 3)])

# ImageFolder expects one sub-directory per class under each split.
train = torchvision.datasets.ImageFolder("data/train", transform=tfm)
val = torchvision.datasets.ImageFolder("data/val", transform=tfm)
print(len(train), len(val), train.classes)

tr = DataLoader(train, batch_size=128, shuffle=True, num_workers=2)
va = DataLoader(val, batch_size=256, shuffle=False)
4. Baseline approach
import torch.nn as nn, torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self, nc=10):
        super().__init__()
        self.c1 = nn.Conv2d(3, 32, 3, padding=1)
        self.c2 = nn.Conv2d(32, 64, 3, padding=1)
        self.c3 = nn.Conv2d(64, 128, 3, padding=1)
        # Three 2x2 max-pools take a 32x32 input down to 4x4.
        self.fc = nn.Linear(128 * 4 * 4, nc)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.c1(x)), 2)
        x = F.max_pool2d(F.relu(self.c2(x)), 2)
        x = F.max_pool2d(F.relu(self.c3(x)), 2)
        return self.fc(x.flatten(1))

# Pick the device once and reuse it for the model and every batch.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
m = SmallCNN().to(device)
opt = torch.optim.AdamW(m.parameters(), lr=3e-3, weight_decay=1e-4)

for epoch in range(10):
    m.train()
    for x, y in tr:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss = F.cross_entropy(m(x), y)
        loss.backward()
        opt.step()
Baseline accuracy: ~60–70% on a CIFAR-10-style problem at 32×32. [illustrative]
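The loop above only trains; the problem also asks for an evaluation step. A minimal sketch of a validation pass follows, assuming a model and loader like m and va from the sections above. The helper name evaluate is hypothetical and not taken from the problem statement.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def evaluate(model, loader, device):
    """Return (mean loss, accuracy) of `model` over all batches in `loader`."""
    model.eval()  # put BatchNorm/Dropout layers (if any) in inference mode
    total_loss, correct, seen = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)
        # Sum rather than average, so a smaller final batch is weighted correctly.
        total_loss += F.cross_entropy(logits, y, reduction="sum").item()
        correct += (logits.argmax(dim=1) == y).sum().item()
        seen += y.numel()
    return total_loss / seen, correct / seen
```

Calling evaluate(m, va, device) once per epoch, with device being wherever the model lives, gives the validation curve to plot next to the training loss.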
5. Improvements that move the needle
5.1 · Data augmentation
Random crop plus horizontal flip reliably adds about 5–10 points. Add T.RandomCrop(32, padding=4) and T.RandomHorizontalFlip() to the train transform only; the validation transform stays deterministic.
5.2 · Batch normalisation between conv and ReLU
Insert nn.BatchNorm2d after every Conv2d, before the ReLU. This stabilises gradients and allows a higher learning rate; expect +3–5 points.
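One way to write this is sketched below as a variant of the SmallCNN from section 4. The helper conv_bn_relu and the class name SmallCNNBN are hypothetical, and dropping the conv bias is an optional choice rather than something the problem requires.

```python
import torch
import torch.nn as nn


def conv_bn_relu(c_in, c_out):
    """3x3 conv -> BatchNorm -> ReLU -> 2x2 max-pool (halves H and W)."""
    return nn.Sequential(
        # bias=False: BatchNorm's learned shift makes a conv bias redundant.
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class SmallCNNBN(nn.Module):
    def __init__(self, nc=10):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(3, 32),     # 32x32 -> 16x16
            conv_bn_relu(32, 64),    # 16x16 -> 8x8
            conv_bn_relu(64, 128),   # 8x8   -> 4x4
        )
        self.fc = nn.Linear(128 * 4 * 4, nc)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))
```

With BatchNorm in the model, calling the model's train() and eval() methods at the right times matters: the layer uses batch statistics in training mode and running averages in evaluation mode.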
5.3 · Cosine LR schedule + warmup
Start lr at 1e-4, ramp to 3e-3 over the first epoch, then cosine-decay to 0 over the remaining epochs. A good schedule delivers much of the gain that is often credited to hyperparameter tuning.
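The schedule can be written as a plain function of the optimiser step. The sketch below assumes a linear ramp during warmup (the text does not specify the ramp shape) and per-step rather than per-epoch updates; lr_at and its parameter names are illustrative.

```python
import math


def lr_at(step, total_steps, warmup_steps, lr_start=1e-4, lr_peak=3e-3):
    """Learning rate at optimiser step `step` (0-indexed)."""
    if step < warmup_steps:
        # Linear ramp from lr_start to lr_peak across the warmup steps.
        return lr_start + (lr_peak - lr_start) * step / warmup_steps
    # Cosine decay from lr_peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * lr_peak * (1.0 + math.cos(math.pi * progress))


# Usage inside the training loop (opt is the optimiser from section 4):
#   for g in opt.param_groups:
#       g["lr"] = lr_at(step, total_steps, warmup_steps)
```

Here warmup_steps would be the number of batches in one epoch and total_steps the number of batches across all epochs.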
5.4 · Test-time augmentation
Average predictions across the original and the horizontally flipped test image. Free +1 point.
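A minimal sketch of flip-based test-time augmentation, assuming image batches of shape (N, C, H, W). Averaging softmax probabilities rather than raw logits is one reasonable choice; the function name predict_tta is hypothetical.

```python
import torch


@torch.no_grad()
def predict_tta(model, x):
    """Average class probabilities over x and its horizontal mirror.

    x has shape (N, C, H, W); flipping dim 3 mirrors the width axis.
    """
    model.eval()
    p = model(x).softmax(dim=1)
    p_flip = model(torch.flip(x, dims=[3])).softmax(dim=1)
    return (p + p_flip) / 2
```

The predicted class is then predict_tta(model, x).argmax(dim=1). The cost is one extra forward pass per batch at test time.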
5.5 · Explicit parameter-count and FLOP report
Round-1 problems often allocate points for "describe your model's complexity". Print parameter count and a rough FLOP estimate; mention it in the write-up.
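As a worked example, the counts for the SmallCNN from section 4 can be derived by hand. The sketch below assumes a 32×32 input and counts multiply-accumulates (MACs) for the conv and linear layers only; ReLU and pooling are ignored. The helper names are illustrative.

```python
def conv_stats(c_in, c_out, k, h_out, w_out):
    """Return (parameters, multiply-accumulates) for one k x k conv layer."""
    params = (k * k * c_in + 1) * c_out          # +1 is the bias of each filter
    macs = k * k * c_in * c_out * h_out * w_out  # one dot product per output value
    return params, macs


def linear_stats(n_in, n_out):
    """Return (parameters, multiply-accumulates) for a fully connected layer."""
    return (n_in + 1) * n_out, n_in * n_out


# Each conv keeps H x W (kernel 3, padding 1); the 2x2 max-pool after it halves both.
layers = {
    "c1": conv_stats(3, 32, 3, 32, 32),
    "c2": conv_stats(32, 64, 3, 16, 16),
    "c3": conv_stats(64, 128, 3, 8, 8),
    "fc": linear_stats(128 * 4 * 4, 10),
}
total_params = sum(p for p, _ in layers.values())
total_macs = sum(m for _, m in layers.values())
print(f"params: {total_params:,}  MACs per image: {total_macs:,}")
# params: 113,738  MACs per image: 10,342,400
```

The parameter total can be cross-checked in PyTorch with sum(p.numel() for p in m.parameters()). If the write-up reports FLOPs, a common convention is two FLOPs per MAC.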
6. Submission format & gotchas
- Post code + final accuracy + training-loss plot for each part on the forum.
- Seed PyTorch (torch.manual_seed(0)) and CUDA (torch.cuda.manual_seed_all(0)) for reproducibility.
- Don't accidentally train on the val set; use the ImageFolder splits the problem provides.
- If running on CPU, use a smaller batch size and fewer epochs, and note this in the write-up.
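The seeding advice above can be wrapped in one helper called at the top of the script. This is a sketch: seed_everything is a hypothetical name, and seeding Python's random module as well is an addition beyond what the list mentions.

```python
import random

import torch


def seed_everything(seed=0):
    """Seed Python, PyTorch on CPU, and every CUDA device if any exist."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable


seed_everything(0)
```

Seeding makes weight initialisation and shuffling repeatable, but it does not by itself guarantee bit-identical results on a GPU, so small run-to-run differences in accuracy can remain.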
7. What top solutions did
[reconstructed — verify against published solutions] The expected full-marks recipe for a CIFAR-style P4: small CNN with BatchNorm, augmentations (random crop + flip), cosine LR with warmup, weight decay 1e-4, ~30 epochs. End on ~85% val accuracy. A more ambitious team would add Mixup or RandAugment for another 2 points. ResNet-18 is overkill at this dataset size and was rarely the winning move on similar problems.
8. Drill
D1 · Your training loss is decreasing but val accuracy plateaus at 50%. What's the first thing to try?
Add data augmentation. A plateau with continuing training-loss decrease is textbook overfitting. Random crop + horizontal flip alone often closes most of the gap; if that's already enabled, reach for stronger augmentation (Mixup, CutMix, RandAugment) and increase weight decay. Reducing model capacity is usually the wrong move — a slightly bigger model with stronger augmentation generalises better than a smaller model with none.
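For reference, Mixup can be sketched in a few lines: blend each image with a randomly chosen partner from the same batch and blend the two losses with the same weight. The value alpha=0.2 is an assumed default, not taken from the problem, and mixup_loss is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def mixup_loss(model, x, y, alpha=0.2):
    """Cross-entropy on a Mixup-blended batch."""
    # Mixing weight lam is drawn from Beta(alpha, alpha), once per batch.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mix = lam * x + (1.0 - lam) * x[perm]  # blend each image with its partner
    logits = model(x_mix)
    # Blend the losses against both sets of labels with the same weight.
    return lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[perm])
```

In the training loop this would replace the plain F.cross_entropy(m(x), y) call; validation still uses unmixed images.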
D2 · Compute the output spatial size of a 3×3 conv with stride 1, padding 1, on a 32×32 input.
Output size = floor((H + 2·pad − kernel) / stride) + 1 = floor((32 + 2 − 3) / 1) + 1 = 32. So kernel=3, stride=1, padding=1 preserves spatial dimensions exactly, which is why this "same" padding is the usual choice in CNNs (nn.Conv2d itself defaults to padding=0, so it has to be set explicitly). Knowing this formula cold saves 10 minutes of debugging when your flatten dimension doesn't match.
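The formula is short enough to keep as a helper and reuse for pooling layers too. A sketch, with conv_out as an illustrative name, that also traces the SmallCNN from section 4 down to its flatten size:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32
for _ in range(3):
    size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
    size = conv_out(size, kernel=2, stride=2)             # 2x2 max-pool halves it
print(size)  # 4, matching the 128 * 4 * 4 input of the final Linear layer
```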