End-to-end Colab notebooks

Each stratum page on this site walks through the theory with short, focused snippets. These four notebooks do the opposite: each one is a complete pipeline from raw data to a working submission, the way USAAIO problems actually arrive. Open one in Colab, hit Run All, and you have a working baseline to iterate on.

How to run. Click Open in Colab. For notebooks 2 and 3, switch the runtime to GPU (Runtime → Change runtime type → T4 GPU) before running. For notebooks 1 and 4 the free CPU runtime is fine.
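To confirm the runtime switch took effect before running notebooks 2 and 3, a quick check like the one below can go in the first cell. It assumes PyTorch, which Colab preinstalls; the fallback message is only illustrative.

```python
import torch

# Prefer the GPU when the runtime has one attached; otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

if device.type == "cuda":
    # Reports the attached card, e.g. a T4 on the free tier.
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU found: check Runtime → Change runtime type.")
```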

Download. The .ipynb files are committed to the repo, so you can also download them and run them locally in jupyter lab, with no Colab login required.

The four notebooks

Classical ML

01 · Tabular ML on Titanic

Full Kaggle-style pipeline: load → EDA (correlation heatmap, survival bar charts) → feature engineering (Title, FamilySize, FareBin) → ColumnTransformer preprocessing → logistic-regression baseline → 4-model comparison (LogReg, RandomForest, GradientBoosting, Stacking) → 5-fold cross-validation → feature importance → Kaggle submission.csv.

Runtime: ~2 min on Colab CPU. Pairs with Classical ML.

Open in Colab Download .ipynb
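The feature-engineering, ColumnTransformer, and cross-validation steps of notebook 01 can be sketched with pandas and scikit-learn as below. This is a minimal sketch, not the notebook's code: a synthetic DataFrame with the Kaggle column names (made-up values) stands in for train.csv, the Title regex and FareBin quartiles are assumptions about how those features are built, and only the logistic-regression baseline is shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for train.csv: Kaggle column names, made-up values.
rng = np.random.default_rng(0)
n = 60
titles = rng.choice(["Mr", "Mrs", "Miss", "Master"], size=n)
df = pd.DataFrame({
    "Name": [f"Doe, {t}. Person{i}" for i, t in enumerate(titles)],
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 70, size=n),
    "SibSp": rng.integers(0, 4, size=n),
    "Parch": rng.integers(0, 3, size=n),
    "Fare": rng.uniform(5, 250, size=n),
    "Embarked": rng.choice(["S", "C", "Q"], size=n).astype(object),
})
df.loc[[1, 2, 3], "Age"] = np.nan      # simulate missing ages
df.loc[[0, 5], "Embarked"] = np.nan    # simulate missing ports
y = np.tile([0, 0, 1], n // 3)         # synthetic Survived labels


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    # "Doe, Mr. John" -> "Mr"
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Quartile bin index 0..3 (fit on the whole frame here, for brevity).
    out["FareBin"] = pd.qcut(out["Fare"], 4, labels=False)
    return out


X = add_features(df)
numeric = ["Pclass", "Age", "FamilySize", "FareBin"]
categorical = ["Sex", "Embarked", "Title"]

# Impute + scale numerics, impute + one-hot categoricals; other columns are dropped.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

# 5-fold cross-validated accuracy (stratified folds, since this is a classifier).
scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```

Keeping preprocessing inside the Pipeline means each CV fold refits the imputers and scaler on its own training split, which is what makes the cross-validation score honest.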

Deep Learning

02 · CIFAR-10 with PyTorch

A small CNN (~200K params) trained end to end: data loaders, model definition, training loop with OneCycleLR, checkpoint save/load, then an augmentation upgrade (RandomCrop + Flip + RandomErasing / Cutout) and a confusion matrix on the test set.

Runtime: ~3–5 min on a T4/L4 GPU. Pairs with Deep Learning.

Open in Colab Download .ipynb
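The model definition, OneCycleLR loop, and checkpoint save/load in notebook 02 follow a common PyTorch pattern, sketched below under stated assumptions. The SmallCNN architecture is a hypothetical stand-in (about 141K parameters, not the notebook's exact ~200K model), random tensors replace the CIFAR-10 DataLoader, and the SGD hyperparameters are illustrative.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn(c_in: int, c_out: int) -> nn.Sequential:
    # 3x3 conv (no bias: BatchNorm supplies the shift) -> BN -> ReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class SmallCNN(nn.Module):  # hypothetical stand-in architecture
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn(3, 32), conv_bn(32, 32), nn.MaxPool2d(2),   # 32x32 -> 16x16
            conv_bn(32, 64), conv_bn(64, 64), nn.MaxPool2d(2),  # 16x16 -> 8x8
            conv_bn(64, 128), nn.MaxPool2d(2),                  # 8x8 -> 4x4
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)  # global average pool -> (N, 128)
        return self.head(x)


torch.manual_seed(0)
model = SmallCNN()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} trainable parameters")

# OneCycleLR is stepped once per batch, so it needs epochs and steps_per_epoch.
EPOCHS, STEPS_PER_EPOCH, BATCH = 1, 10, 8
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH)

lrs = []
model.train()
for epoch in range(EPOCHS):
    for step in range(STEPS_PER_EPOCH):
        # Random tensors stand in for a CIFAR-10 DataLoader batch.
        images = torch.randn(BATCH, 3, 32, 32)
        labels = torch.randint(0, 10, (BATCH,))
        loss = F.cross_entropy(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # after optimizer.step(), once per batch
        lrs.append(scheduler.get_last_lr()[0])

# Checkpoint model + optimizer state (a full resume would also store scheduler.state_dict()).
ckpt_path = os.path.join(tempfile.mkdtemp(), "cifar_cnn.pt")
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": EPOCHS}, ckpt_path)

restored = SmallCNN()
ckpt = torch.load(ckpt_path, map_location="cpu")
restored.load_state_dict(ckpt["model"])
model.eval()
restored.eval()
with torch.no_grad():
    probe = torch.randn(4, 3, 32, 32)
    same = torch.allclose(model(probe), restored(probe))
print("Restored model matches:", same)
```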

Transformers

03 · Fine-tune DistilBERT on IMDB

Hugging Face transformers + datasets + evaluate. Load IMDB, tokenize with the DistilBERT tokenizer, fine-tune for 2 epochs with the Trainer API, report accuracy + F1, run inference on custom strings, and save the model to disk.

Runtime: ~8–12 min on a T4/L4 GPU. Pairs with Attention & transformers.

Open in Colab Download .ipynb
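The accuracy + F1 report in notebook 03 is the kind of thing the Trainer's compute_metrics hook handles: it is called with the predictions (logits) and label ids and returns a dict of metrics. The notebook lists evaluate among its libraries; this sketch swaps it for plain NumPy so it runs offline, and it assumes IMDB's convention of label 1 = positive review. It would be wired up as Trainer(..., compute_metrics=compute_metrics).

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy and binary F1 from (logits, labels), as the Trainer hook receives them."""
    logits, labels = eval_pred            # logits: (N, 2), labels: (N,)
    preds = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)

    accuracy = float((preds == labels).mean())
    # Positive class = 1 (positive review).
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Toy check with hand-made logits (not real model output).
logits = np.array([[2.0, -1.0], [0.1, 0.9], [-0.5, 1.5], [1.2, 0.3]])
labels = np.array([0, 1, 0, 1])
print(compute_metrics((logits, labels)))  # {'accuracy': 0.5, 'f1': 0.5}
```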

Math · Probability

04 · Bayesian A/B test

Beta-Binomial conjugacy derived in LaTeX, then simulated: stream daily traffic, watch the posteriors tighten, compute P(B>A | data) by Monte Carlo, apply a 95% decision rule, and compare to the frequentist two-proportion z-test. Ends with three drill problems and worked solutions.

Runtime: ~30 s on Colab CPU. Pairs with Math you need.

Open in Colab Download .ipynb
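The core computations of notebook 04 fit in a few lines of NumPy. This sketch assumes a uniform Beta(1, 1) prior, reads the 95% decision rule as "ship B when P(B>A | data) ≥ 0.95", and uses illustrative counts rather than the notebook's simulated traffic. Conjugacy means the posterior for each arm is just Beta(α₀ + s, β₀ + n − s), so no MCMC is needed; the Monte Carlo step only estimates P(θ_B > θ_A).

```python
import math

import numpy as np


def posterior(successes: int, trials: int, a0: float = 1.0, b0: float = 1.0):
    # Beta(a0, b0) prior + Binomial likelihood -> Beta(a0 + s, b0 + n - s) posterior.
    return a0 + successes, b0 + trials - successes


def prob_b_beats_a(sa, na, sb, nb, draws=200_000, seed=0):
    rng = np.random.default_rng(seed)
    theta_a = rng.beta(*posterior(sa, na), size=draws)
    theta_b = rng.beta(*posterior(sb, nb), size=draws)
    return float((theta_b > theta_a).mean())  # Monte Carlo estimate of P(B > A | data)


def two_proportion_z(sa, na, sb, nb):
    # Pooled two-proportion z-test; two-sided p-value = 2 * (1 - Phi(|z|)).
    pa, pb = sa / na, sb / nb
    pooled = (sa + sb) / (na + nb)
    se = math.sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
    z = (pb - pa) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts (not from the notebook's simulation).
sa, na = 120, 1000  # A: conversions, visitors
sb, nb = 150, 1000  # B: conversions, visitors
p = prob_b_beats_a(sa, na, sb, nb)
z, p_value = two_proportion_z(sa, na, sb, nb)
decision = "ship B" if p >= 0.95 else "keep collecting data"
print(f"P(B>A | data) ≈ {p:.3f} -> {decision}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

With these counts both views point the same way, but they answer different questions: the Bayesian number is a probability that B is better, while the p-value is the probability of data this extreme if A and B were identical.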

Why end-to-end?

Round 1 and Round 2 problems are notebooks. USAAIO ships a Colab template with data already loaded. You're expected to read it, design a pipeline, and submit predictions. Short snippets don't rehearse that loop.
The glue is most of the work. Loading, splitting, preprocessing, saving: the boring parts are where contestants lose hours and points. Run these notebooks once and that muscle memory carries over to the contest.
Stratum pages stay focused. The Math/Python/ML/DL/Transformers pages explain why; these notebooks show the full how in one place.