The math you actually need

Four areas: linear algebra, probability & statistics, multivariable calculus, convex optimization. You don't need a full undergraduate sequence; you need the parts that make gradient descent, PCA, and the bias/variance tradeoff legible.

How deep to go. Aim for fluency, not proofs. If you can re-derive backpropagation on a 2-layer MLP on paper and explain why PCA picks the top eigenvectors, you're past the bar.

Linear algebra

Key identity for ML: for a data matrix X whose rows are data points, the covariance matrix (after centering) is C = (1/n) Xᵀ X. Its top eigenvectors are the principal components, the directions of maximal variance.
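
A minimal NumPy sketch of that identity (toy data and variable names are illustrative): center the rows, form C, eigendecompose, and project onto the top eigenvectors.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))         # toy data: 200 points, 3 features

    Xc = X - X.mean(axis=0)               # center each feature
    C = Xc.T @ Xc / Xc.shape[0]           # covariance C = (1/n) Xᵀ X

    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]     # eigenvalues in descending order
    top2 = eigvecs[:, order[:2]]          # top-2 principal components (columns)
    projected = Xc @ top2                 # coordinates of the data in the PCA basis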

Probability & statistics

Multivariable calculus

Gradient descent: x ← x − η · ∇f(x). Reduce η if the loss oscillates; increase η if it crawls.
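
A tiny sketch of that update, assuming a hand-written gradient for f(x) = x²; the three step sizes below illustrate the crawl, converge, and oscillate/diverge regimes the rule of thumb refers to.

    def descend(eta, steps=20, x=5.0):
        # minimize f(x) = x², whose gradient is f'(x) = 2x
        for _ in range(steps):
            x = x - eta * 2 * x        # x ← x − η · ∇f(x)
        return x

    print(descend(0.01))   # ≈ 3.3   crawls: η too small, barely approaches the minimum at 0
    print(descend(0.1))    # ≈ 0.06  converges toward 0
    print(descend(1.1))    # ≈ 192   oscillates and diverges: η too large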

Convex optimization

Exercises

  1. Implement matrix multiplication, transpose, and dot product from scratch using plain Python loops (no NumPy built-ins). Verify against np.matmul; a loop-based sketch follows after this list.
  2. Compute PCA by hand on a 5-point 2-D dataset: center, covariance, eigendecomposition, project.
  3. Derive the gradient of mean-squared-error loss for linear regression. Verify with PyTorch autograd; a sketch of that check follows after this list.
  4. For a 2-layer MLP with one hidden ReLU and a sigmoid output, write out the four partial derivatives needed for backprop on a single sample.
  5. Run gradient descent on a 1-D convex function (e.g. f(x) = (x − 3)²) and plot the trajectory for three learning rates.
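
For exercise 1, a hedged sketch of what the check might look like: a loop-based product compared against np.matmul (shapes and names are arbitrary).

    import numpy as np

    def matmul(A, B):
        # plain-loop matrix product: out[i][j] = Σ_p A[i][p] * B[p][j]
        n, k, m = len(A), len(A[0]), len(B[0])
        return np.array([[sum(A[i][p] * B[p][j] for p in range(k))
                          for j in range(m)]
                         for i in range(n)])

    A, B = np.random.rand(3, 4), np.random.rand(4, 2)
    print(np.allclose(matmul(A, B), np.matmul(A, B)))  # True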
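
For exercise 3, one way the autograd check might look, assuming the hand-derived gradient ∇L = (2/n) Xᵀ(Xw − y) for L(w) = (1/n) ‖Xw − y‖² (variable names are illustrative).

    import torch

    torch.manual_seed(0)
    X = torch.randn(50, 4)                    # toy design matrix
    y = torch.randn(50)                       # toy targets
    w = torch.randn(4, requires_grad=True)    # linear-regression weights

    loss = ((X @ w - y) ** 2).mean()          # L(w) = (1/n) ‖Xw − y‖²
    loss.backward()                           # autograd gradient lands in w.grad

    manual = 2 / X.shape[0] * X.T @ (X @ w.detach() - y)   # hand-derived (2/n) Xᵀ(Xw − y)
    print(torch.allclose(w.grad, manual, atol=1e-5))        # True if the derivation matches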