Original mock contests

Two original USAAIO-style mocks: a theory mini-mock (10 short-answer questions, 45 min) and a coding mini-mock (one end-to-end ML task on a synthetic dataset, 90 min). Use these between major Kaggle attempts.

How to take a mock. Run a strict timer and keep the answer key closed. When the timer expires, score honestly, log every miss, then re-attempt the failed problems with no time limit.

Mock 1 · Theory mini-mock (10 questions, 45 min)

  1. Linear algebra. For matrix A = [[2, 1], [1, 2]], find both eigenvalues.
  2. Probability. Two fair dice are rolled. What is the probability that the sum is at least 10?
  3. Calculus. For f(x, y) = x² y + 3y, compute ∂f/∂y at (2, 1).
  4. Statistics. A sample has values 4, 8, 10, 14, 14. Compute the median and the (population) standard deviation.
  5. Numpy / ML. Given a 2-D array X of shape (100, 5), write one line of NumPy that standardizes each column (zero mean, unit variance).
  6. Classical ML. A model has 95% training accuracy and 70% validation accuracy. Is this overfitting or underfitting? Name two interventions.
  7. Loss functions. Why use BCEWithLogitsLoss instead of BCELoss applied to a sigmoid output?
  8. Deep learning. A Conv2d with in_channels=16, out_channels=32, kernel_size=5, no bias. How many parameters?
  9. Transformers. A self-attention layer has model dimension d_model = 512 and n_heads = 8. What is the dimension per head?
  10. Optimization. Name one situation where you'd prefer SGD with momentum over AdamW.

Answer key

  1. Eigenvalues: λ = 1 and λ = 3. (Characteristic polynomial: (2 − λ)² − 1 = 0.)
  2. Outcomes with sum ≥ 10: (4,6), (5,5), (5,6), (6,4), (6,5), (6,6) → 6 out of 36 → 1/6.
  3. ∂f/∂y = x² + 3 = 4 + 3 = 7.
  4. Median = 10. Mean = 10. Squared deviations: 36, 4, 0, 16, 16 → sum 72 → variance 14.4 → SD ≈ 3.79.
  5. (X - X.mean(axis=0)) / X.std(axis=0).
  6. Overfitting. Interventions: more regularization (weight decay, dropout, smaller model); more training data; data augmentation; early stopping.
  7. Numerical stability: BCEWithLogitsLoss uses the log-sum-exp trick to handle very large or very small logits without intermediate overflow.
  8. 16 × 32 × 5 × 5 = 12 800 parameters.
  9. 512 / 8 = 64 dimensions per head.
  10. Large-scale vision training (e.g. ImageNet ResNet) with a well-tuned learning-rate schedule — SGD+momentum often generalizes slightly better than Adam in this regime.
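
To double-check the numerical answers, a quick NumPy sanity check (for review only, not part of the timed mock; the constants mirror the questions above):

import numpy as np

# Q1: eigenvalues of the symmetric matrix [[2, 1], [1, 2]]
print(np.linalg.eigvalsh(np.array([[2.0, 1.0], [1.0, 2.0]])))  # [1. 3.]

# Q2: P(sum of two dice >= 10) by brute-force enumeration
print(sum(1 for a in range(1, 7) for b in range(1, 7) if a + b >= 10) / 36)  # 0.1666...

# Q4: median and population SD (np.std defaults to ddof=0, i.e. population)
sample = np.array([4, 8, 10, 14, 14])
print(np.median(sample), sample.std())  # 10.0 3.7947...

# Q8: Conv2d parameter count without bias: in_channels * out_channels * k * k
print(16 * 32 * 5 * 5)  # 12800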

Mock 2 · Coding mini-mock (90 min)

Task: "Synthetic species classification"

You are given a small tabular dataset with 8 numerical features and a 3-class label. The training set has 2 000 rows; the held-out test set has 500 rows. Your task is to produce a notebook that trains a model and outputs predictions for the test set, maximizing macro-F1.

Synthetic data generator (use this to create your dataset locally)

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 2 500 rows total: 2 000 for train.csv + 500 for test.csv, 3 imbalanced classes
X, y = make_classification(
    n_samples=2500, n_features=8, n_informative=5, n_redundant=2,
    n_classes=3, class_sep=1.2, weights=[0.5, 0.3, 0.2], random_state=42,
)

feature_cols = [f"f{i}" for i in range(8)]
df = pd.DataFrame(X, columns=feature_cols)
df["label"] = y

# stratified split preserves the class balance between train and test
train_df, test_df = train_test_split(df, test_size=500, random_state=0, stratify=df["label"])
train_df.to_csv("train.csv", index=False)
test_df.drop(columns=["label"]).to_csv("test.csv", index=False)
test_df[["label"]].to_csv("test_labels.csv", index=False)  # for your own scoring

Deliverables

  1. Your notebook or script, runnable end to end.
  2. predictions.csv with one predicted label per test row, in the same order as test.csv.
  3. Your self-scored macro-F1, computed against test_labels.csv.

Scoring rubric

Macro-F1 achieved    Score (out of 100)
< 0.55                 0
0.55 – 0.65           40
0.65 – 0.75           70
0.75 – 0.82           90
≥ 0.82               100

Reference baseline (open after your attempt)

import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score

train = pd.read_csv("train.csv")
test  = pd.read_csv("test.csv")

X = train.drop(columns=["label"])
y = train["label"]

# scale (HGB doesn't need it, but it doesn't hurt)
scaler = StandardScaler().fit(X)
X_s = scaler.transform(X)
test_s = scaler.transform(test)

model = HistGradientBoostingClassifier(
    max_depth=6, learning_rate=0.05, max_iter=400, random_state=42,
)

# 5-fold stratified CV to estimate generalization before touching the test set
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_s, y, cv=cv, scoring="f1_macro")
print("CV macro-F1:", scores.mean(), "±", scores.std())

# refit on all training data, then predict the held-out test rows
model.fit(X_s, y)
preds = model.predict(test_s)
pd.DataFrame({"label": preds}).to_csv("predictions.csv", index=False)

The reference baseline typically scores around macro-F1 ≈ 0.78–0.82 on this generator with the default seed. Beating it usually requires careful, CV-driven hyperparameter tuning or a small ensemble (e.g., averaging the probabilities of HGB, logistic regression, and an MLP).
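
As one possible shape for that ensemble, a minimal soft-voting sketch with scikit-learn's VotingClassifier; the hyperparameters are untuned placeholders, and X_s, y, test_s are the scaled arrays from the baseline above:

from sklearn.ensemble import HistGradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# soft voting averages predicted class probabilities across model families
ensemble = VotingClassifier(
    estimators=[
        ("hgb", HistGradientBoostingClassifier(random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)),
    ],
    voting="soft",
)
ensemble.fit(X_s, y)
preds = ensemble.predict(test_s)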

Tips during the mock

  1. Get a minimal end-to-end pipeline (load → fit → write predictions.csv) working in the first 20–30 minutes; improve from there.
  2. Set up stratified cross-validation before tuning anything, and trust CV scores over a single split.
  3. Timebox each experiment and jot down what you tried, so the post-mock review has material.

After each mock

  1. Score honestly using the rubric.
  2. For each missed theory item: re-derive without looking, then read the explanation.
  3. For the coding mock: identify your single biggest score-leak (was it feature engineering? model choice? CV?), and drill that next week.
  4. Log everything into the same error log you use for problem sets.