End-to-end Colab notebooks
Each stratum page on this site walks through the theory with short, focused snippets. These four notebooks do the opposite — they're complete pipelines from raw data to a working submission, the way USAAIO problems actually arrive. Open one in Colab, hit Run All, and you have a working baseline you can iterate on.
Notebooks 2 and 3 need a GPU: switch the runtime (Runtime → Change runtime type → T4 GPU) before running. For notebooks 1 and 4 the
free CPU runtime is fine.
The .ipynb files are committed to the repo, so you can also
download them and run them locally (jupyter lab), with no Colab login required.
The four notebooks
01 · Tabular ML on Titanic
Full Kaggle-style pipeline: load → EDA (correlation heatmap, survival bars) → feature engineering
(Title, FamilySize, FareBin) → ColumnTransformer preprocessing → logistic regression baseline → 4-model comparison
(LogReg, RandomForest, GradientBoosting, Stacking) → 5-fold cross-validation → feature importance →
Kaggle submission.csv.
Runtime: ~2 min on Colab CPU. Pairs with Classical ML.
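The preprocessing and cross-validation steps of this pipeline can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the column names follow the usual Kaggle Titanic schema, while `add_features`, `build_pipeline`, and `make_toy_frame` are hypothetical helpers, and the toy frame is synthetic data that stands in for the real CSV.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add FamilySize and Title (assumes Titanic-style Name/SibSp/Parch columns)."""
    out = df.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Title is the token between the comma and the first period: "Last, Title. First"
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    return out


def build_pipeline(numeric_cols, categorical_cols) -> Pipeline:
    """Preprocessing + logistic regression baseline in one estimator."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing Age etc.
        ("scale", StandardScaler()),
    ])
    pre = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])


def make_toy_frame(n: int = 200, seed: int = 0):
    """Synthetic stand-in for train.csv (illustration only, not real data)."""
    rng = np.random.default_rng(seed)
    sex = rng.choice(["male", "female"], size=n)
    title = np.where(sex == "male", "Mr", "Mrs")
    df = pd.DataFrame({
        "Name": [f"Doe, {t}. Pat" for t in title],
        "Sex": sex,
        "Age": rng.uniform(1, 70, size=n),
        "Fare": rng.exponential(30, size=n),
        "SibSp": rng.integers(0, 3, size=n),
        "Parch": rng.integers(0, 3, size=n),
    })
    df.loc[rng.choice(n, size=20, replace=False), "Age"] = np.nan  # missing ages
    # Label mostly follows Sex, with 20% of rows flipped as noise
    survived = ((sex == "female") ^ (rng.random(n) < 0.2)).astype(int)
    return df, survived


if __name__ == "__main__":
    raw, y = make_toy_frame()
    X = add_features(raw)
    pipe = build_pipeline(["Age", "Fare", "FamilySize"], ["Sex", "Title"])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping the imputer, scaler, and encoder inside the `Pipeline` means each cross-validation fold fits them on its own training split only, which avoids leaking validation statistics into preprocessing.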
02 · CIFAR-10 with PyTorch
A small CNN (~200K params) trained end to end: data loaders, model definition, training loop with OneCycleLR, checkpoint save/load, then an augmentation upgrade (RandomCrop + Flip + RandomErasing / Cutout) and a confusion matrix on the test set.
Runtime: ~3–5 min on a T4/L4 GPU. Pairs with Deep Learning.
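A rough sketch of the model, training loop, and checkpoint steps is shown below. The layer widths, `max_lr`, weight decay, and the names `SmallCNN`, `train`, `save_checkpoint`, and `load_checkpoint` are assumptions made for illustration; the notebook's actual architecture and hyperparameters may differ.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Three conv blocks + global average pooling (illustrative sizes)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()

        def block(c_in: int, c_out: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves height and width
            )

        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        self.head = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)    # (N, 128, 4, 4) for 32x32 inputs
        x = x.mean(dim=(2, 3))  # global average pool -> (N, 128)
        return self.head(x)


def train(model, loader, epochs: int, max_lr: float = 0.1, device: str = "cpu"):
    """Train with SGD + OneCycleLR and return the per-batch losses."""
    model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=max_lr / 25,
                          momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        opt, max_lr=max_lr, epochs=epochs, steps_per_epoch=len(loader))
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            sched.step()  # OneCycleLR advances once per batch, not per epoch
            losses.append(loss.item())
    return losses


def save_checkpoint(model: nn.Module, path: str) -> None:
    torch.save(model.state_dict(), path)


def load_checkpoint(model: nn.Module, path: str) -> nn.Module:
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model
```

With CIFAR-10, `loader` would be a `DataLoader` over the training set; any loader that yields `(N, 3, 32, 32)` image batches with integer labels works with this sketch.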
03 · Fine-tune DistilBERT on IMDB
Hugging Face transformers + datasets + evaluate. Load IMDB,
tokenize with the DistilBERT tokenizer, fine-tune for 2 epochs with the Trainer API,
report accuracy + F1, run inference on custom strings, and save the model to disk.
Runtime: ~8–12 min on a T4/L4 GPU. Pairs with Attention & transformers.
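The accuracy + F1 reporting step can be sketched without downloading a model. The notebook uses the `evaluate` library; the version below swaps in plain NumPy so the arithmetic is visible. It assumes binary labels with 1 as the positive class, and the function name `compute_metrics` is only a convention.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy and binary F1 from a (logits, labels) pair.

    Assumes logits of shape (N, 2) and labels in {0, 1}, with 1 = positive.
    """
    logits, labels = eval_pred
    preds = np.argmax(np.asarray(logits), axis=-1)
    labels = np.asarray(labels)

    tp = int(np.sum((preds == 1) & (labels == 1)))  # true positives
    fp = int(np.sum((preds == 1) & (labels == 0)))  # false positives
    fn = int(np.sum((preds == 0) & (labels == 1)))  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": float(np.mean(preds == labels)), "f1": f1}


if __name__ == "__main__":
    # Tiny hand-made batch: two predictions right, two wrong
    logits = np.array([[2.0, 1.0], [0.0, 3.0], [1.0, 2.0], [4.0, 0.0]])
    labels = np.array([0, 1, 0, 1])
    print(compute_metrics((logits, labels)))
```

A function with this shape can be passed to the Trainer through its `compute_metrics` argument, which calls it with the evaluation logits and labels after each evaluation pass.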
04 · Bayesian A/B test
Beta-Binomial conjugacy derived in LaTeX, then simulated: stream daily traffic, watch the posteriors tighten, compute P(B>A | data) by Monte Carlo, apply a 95% decision rule, and compare to the frequentist two-proportion z-test. Ends with three drill problems and worked solutions.
Runtime: ~30 s on Colab CPU. Pairs with Math you need.
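The core of this notebook fits in a few lines of standard-library Python. The sketch below assumes a uniform Beta(1, 1) prior and uses made-up conversion counts (120/1000 for A, 150/1000 for B) purely for illustration; the function names are hypothetical, not the notebook's.

```python
import math
import random


def posterior(successes: int, trials: int, prior_a: float = 1.0, prior_b: float = 1.0):
    """Beta(a, b) prior + Binomial data -> Beta(a + s, b + n - s) posterior."""
    return prior_a + successes, prior_b + trials - successes


def prob_b_beats_a(post_a, post_b, draws: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(B > A | data) from the two Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws


def two_proportion_z(s_a: int, n_a: int, s_b: int, n_b: int):
    """Frequentist two-proportion z-test (pooled variance, two-sided p-value)."""
    p_a, p_b = s_a / n_a, s_b / n_b
    pooled = (s_a + s_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    s_a, n_a, s_b, n_b = 120, 1000, 150, 1000
    p = prob_b_beats_a(posterior(s_a, n_a), posterior(s_b, n_b))
    z, p_value = two_proportion_z(s_a, n_a, s_b, n_b)
    print(f"P(B > A | data) = {p:.3f}")
    print("ship B" if p > 0.95 else "keep collecting data")  # 95% decision rule
    print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

Conjugacy is what makes the update a one-liner: the posterior stays in the Beta family, so streaming another day of traffic only means adding that day's successes and failures to the two parameters.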
Why end-to-end?
- Round 1 and Round 2 problems are notebooks. USAAIO ships a Colab template with data already loaded. You're expected to read it, design a pipeline, and submit predictions. Short snippets don't rehearse that loop.
- The glue is most of the work. Loading, splitting, preprocessing, saving — the boring parts are where contestants lose hours and points. Run these notebooks once and the muscle memory transfers.
- Stratum pages stay focused. The Math/Python/ML/DL/Transformers pages explain why; these notebooks show the full how in one place.