Original mock contests
Two original USAAIO-style mocks: a theory mini-mock (10 short-answer questions, 45 min) and a coding mini-mock (one end-to-end ML task on a synthetic dataset, 90 min). Use these between major Kaggle attempts.
How to take a mock. Strict timer. Closed editorial. After the timer, score honestly, log every miss, then re-attempt failed problems with no time limit.
Mock 1 · Theory mini-mock (10 questions, 45 min)
- Linear algebra. For the matrix `A = [[2, 1], [1, 2]]`, find both eigenvalues.
- Probability. Two fair dice are rolled. What is the probability that the sum is at least 10?
- Calculus. For `f(x, y) = x²y + 3y`, compute ∂f/∂y at (2, 1).
- Statistics. A sample has values 4, 8, 10, 14, 14. Compute the median and the (population) standard deviation.
- NumPy / ML. Given a 2-D array `X` of shape (100, 5), write one line of NumPy that standardizes each column (zero mean, unit variance).
- Classical ML. A model has 95% training accuracy and 70% validation accuracy. Is this overfitting or underfitting? Name two interventions.
- Loss functions. Why use `BCEWithLogitsLoss` instead of `BCELoss` applied to a sigmoid output?
- Deep learning. A `Conv2d` with `in_channels=16`, `out_channels=32`, `kernel_size=5`, and no bias. How many parameters does it have?
- Transformers. A self-attention layer has model dimension `d_model = 512` and `n_heads = 8`. What is the dimension per head?
- Optimization. Name one situation where you'd prefer SGD with momentum over AdamW.
Answer key
Reveal answers
- Eigenvalues: λ = 1 and λ = 3. (Characteristic polynomial: (2 − λ)² − 1 = 0.)
- Outcomes with sum ≥ 10: (4,6), (5,5), (5,6), (6,4), (6,5), (6,6) → 6 out of 36 → 1/6.
- ∂f/∂y = x² + 3 = 4 + 3 = 7.
- Median = 10. Mean = 10. Squared deviations: 36, 4, 0, 16, 16 → sum 72 → variance 14.4 → SD ≈ 3.79.
- `(X - X.mean(axis=0)) / X.std(axis=0)`
- Overfitting. Interventions: more regularization (weight decay, dropout, smaller model); more training data; data augmentation; early stopping.
- Numerical stability: `BCEWithLogitsLoss` uses the log-sum-exp trick to handle very large or very small logits without intermediate overflow.
- 16 × 32 × 5 × 5 = 12 800 parameters.
- 512 / 8 = 64 dimensions per head.
- Large-scale vision training (e.g. ImageNet ResNet) with a well-tuned learning-rate schedule, where SGD+momentum often generalizes slightly better than Adam.
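Most of the numeric answers above can be double-checked in a few lines of NumPy; a quick sketch (the expected values are the ones from the answer key):

```python
import numpy as np

# Eigenvalues of the symmetric matrix A = [[2, 1], [1, 2]]
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eig = np.sort(np.linalg.eigvalsh(A))
print(eig)  # ≈ [1. 3.]

# Probability that two fair dice sum to at least 10
hits = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b >= 10)
print(hits, "/ 36 =", hits / 36)  # 6 / 36 ≈ 0.1667

# Median and population SD of 4, 8, 10, 14, 14 (np.std defaults to ddof=0)
x = np.array([4, 8, 10, 14, 14])
print(np.median(x), np.std(x))  # 10.0, ≈ 3.7947

# Conv2d parameter count without bias: in_channels * out_channels * k * k
print(16 * 32 * 5 * 5)  # 12800
```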
Mock 2 · Coding mini-mock (90 min)
Task: "Synthetic species classification"
You are given a small tabular dataset with 8 numerical features and a 3-class label. The training set has 2 000 rows; the held-out test set has 500 rows. Your task is to produce a notebook that trains a model and outputs predictions for the test set, maximizing macro-F1.
Synthetic data generator (use this to create your dataset locally)
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(
n_samples=2500, n_features=8, n_informative=5, n_redundant=2,
n_classes=3, class_sep=1.2, weights=[0.5, 0.3, 0.2], random_state=42,
)
feature_cols = [f"f{i}" for i in range(8)]
df = pd.DataFrame(X, columns=feature_cols)
df["label"] = y
train_df, test_df = train_test_split(df, test_size=500, random_state=0, stratify=df["label"])
train_df.to_csv("train.csv", index=False)
test_df.drop(columns=["label"]).to_csv("test.csv", index=False)
test_df[["label"]].to_csv("test_labels.csv", index=False) # for your own scoring
Deliverables
- A single notebook `solution.ipynb` that runs end-to-end.
- A `predictions.csv` with one prediction per row of the test set (header: `label`).
- A 3-sentence write-up of your approach.
Scoring rubric
| Macro-F1 achieved | Score (out of 100) |
|---|---|
| < 0.55 | 0 |
| 0.55 – 0.65 | 40 |
| 0.65 – 0.75 | 70 |
| 0.75 – 0.82 | 90 |
| ≥ 0.82 | 100 |
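To place yourself on the rubric, you need your own macro-F1 score. A minimal scoring helper, assuming the `predictions.csv` and `test_labels.csv` files written by the generator above (both with a single `label` column in `test.csv` row order):

```python
import pandas as pd
from sklearn.metrics import f1_score

def score_submission(pred_path="predictions.csv", truth_path="test_labels.csv"):
    """Macro-F1 of a submission against the generator's held-out labels."""
    y_true = pd.read_csv(truth_path)["label"]
    y_pred = pd.read_csv(pred_path)["label"]
    return f1_score(y_true, y_pred, average="macro")

# Usage after your notebook has written predictions.csv:
# print("macro-F1:", score_submission())
```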
Reference baseline (open after your attempt)
Reveal reference baseline
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
X = train.drop(columns=["label"])
y = train["label"]
# scale (HGB doesn't need it, but it doesn't hurt)
scaler = StandardScaler().fit(X)
X_s = scaler.transform(X)
test_s = scaler.transform(test)
model = HistGradientBoostingClassifier(
max_depth=6, learning_rate=0.05, max_iter=400, random_state=42,
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_s, y, cv=cv, scoring="f1_macro")
print("CV macro-F1:", scores.mean(), "±", scores.std())
model.fit(X_s, y)
preds = model.predict(test_s)
pd.DataFrame({"label": preds}).to_csv("predictions.csv", index=False)
Reference baseline typically scores around macro-F1 ≈ 0.78–0.82 on this generator with the default seed. Beating it needs careful CV-driven hyperparameter tuning or a small ensemble (HGB + logistic regression + MLP averaged).
Tips during the mock
- Build a baseline first. A 5-line logistic regression takes 60 seconds and grounds your CV scores.
- Trust cross-validation, not the single train/val split. Use 5-fold stratified.
- Track the seed. Reproducibility is part of the grade: pin `random_state` on every model and splitter.
- Last 10 minutes: stop tuning, write the write-up, save the notebook, verify it runs from a clean kernel.
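The "baseline first" tip can be made concrete; a minimal sketch of the quick logistic-regression baseline, cross-validated the way the tips recommend:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def baseline_cv(X, y, seed=42):
    """Grounding baseline: scaled logistic regression, 5-fold stratified CV."""
    model = make_pipeline(StandardScaler(),
                          LogisticRegression(max_iter=1000, random_state=seed))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()

# Usage (assumes the train.csv written by the generator above):
# train = pd.read_csv("train.csv")
# print("baseline macro-F1:",
#       baseline_cv(train.drop(columns=["label"]), train["label"]))
```

Whatever your final model scores, it should comfortably beat this number; if it doesn't, the leak is in your pipeline, not the model.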
After each mock
- Score honestly using the rubric.
- For each missed theory item: re-derive without looking, then read the explanation.
- For the coding mock: identify your single biggest score-leak (was it feature engineering? model choice? CV?), and drill that next week.
- Log everything into the same error log you use for problem sets.