2016 · Problem A — Swim, Bike, and Run

Scheduling Queueing Clustering Simulation

The prompt, restated

A municipal race director is putting on an Olympic-distance triathlon (1.5 km swim, 40 km bike, 10 km run) for approximately 2,000 athletes. The race uses one swim course, one bike loop, and one run loop, and the host city will only close roads for a maximum of 5.5 hours — meaning the last athlete must cross the finish line before that deadline. Mass-starting 2,000 swimmers at once is unsafe (collisions in the water) and produces brutal congestion at the bike mount and on the run course; starting them too slowly stretches the race past the road-closure window. The team is asked to design a wave-start schedule: how many waves, who goes in each wave, and at what start times.

The problem gives a dataset of finish times and splits from a recent triathlon (provided in the prompt) so teams can fit realistic distributions of swim, bike, and run paces. The team must (1) propose a rule for partitioning athletes into divisions — by age, sex, or estimated ability — and justify it from the data; (2) construct a wave-start schedule that minimizes course congestion while keeping total event time under 5.5 hours; (3) explore whether changing the leg distances (e.g., a sprint format) would let more athletes finish safely; and (4) write a recommendation to the race director, including a printable start list with wave numbers and start times.

Key modeling idea

Two coupled subproblems. First, athletes $\to$ divisions is a clustering / segmentation problem on the historical results: estimate each athlete's expected total time $T_i$ and group athletes so that within-wave pace variance is small (when swimmers of similar speed start together, slower athletes are not swum over by faster ones). Second, divisions $\to$ schedule is a flow / packing problem: each wave injects a pulse of athletes into the swim, which then propagates through the bike and run as a moving "density wave" that must never exceed the safe density on any course segment.
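
One way to state the second subproblem compactly, offered as a sketch rather than a required formulation. The symbols $s_k$ (start time of wave $k$), $w(i)$ (the wave athlete $i$ is assigned to), and $n_j(t)$ (number of athletes on course segment $j$ at time $t$) are notation introduced here for illustration; $T_i$ and $W$ are as used in the text.

$$
\min_{s_1 \le \dots \le s_W} \ \max_{j,\,t} \ n_j(t)
\quad \text{subject to} \quad
\max_i \bigl( s_{w(i)} + T_i \bigr) \le 5.5\ \text{h}.
$$

A schedule is then acceptable when the optimal peak $\max_{j,t} n_j(t)$ stays below whatever safe-density threshold the team adopts.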

Suggested approach

Step 1 — Fit the pace distributions. From the provided race data, fit per-leg pace distributions (lognormal works well for swim/run, slightly heavier-tailed for bike). Use age and sex as covariates (technique 3). A reasonable fit: swim mean pace ~2:00 min / 100 m with $\sigma \approx 0:20$; bike mean ~28 km/h; run mean ~5:30 min/km [illustrative].
Step 2 — Cluster into waves. Sort athletes by expected swim pace (the swim is the bottleneck for congestion), then bin into $W$ waves of ~80–120 athletes each. Target ~20 waves of 100 for a 2,000-athlete field (technique 6). Same-sex / same-age waves are easier to officiate; mixed-ability waves are unsafe in the swim.
Step 3 — Pick the inter-wave gap. The gap $\Delta$ must be long enough that wave $k+1$ does not catch wave $k$ at the slow tail of the swim, but short enough that the last wave can still finish in time. A good starting heuristic: $\Delta \approx 3$ min. With $W = 20$ waves the last wave starts at $(W-1) \cdot \Delta = 57$ min, so the start window is roughly one hour, leaving ~4.5 h of the 5.5-h closure for the slowest athlete in the last wave to complete the course (consistent with an Olympic-distance back-of-pack pace).
Step 4 — Simulate course density. Discrete-event sim of each athlete through swim, T1, bike, T2, run (technique 11). For each 100-m segment of bike and run, count athletes present in a 1-minute window; flag any segment that exceeds ~30 athletes / 100 m as a congestion hotspot. Iterate on $\Delta$ and wave assignment until no hotspot appears.
Step 5 — Sensitivity & alternate formats. Re-run with a sprint distance (750 m / 20 km / 5 km) and a half-Olympic to show the time/safety trade. Add a 10% no-show rate and a 2% DNF rate per leg.
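
The density check in Step 4 can be sketched as follows. This is a simplified illustration, not a full discrete-event simulation: it looks only at the bike leg, assumes each athlete rides at constant speed, takes snapshots every 60 s instead of integrating over a window, and uses synthetic paces with the illustrative Step 1 parameters. The function name `bike_density` and the fixed 120 s T1 are assumptions of this sketch.

```python
import numpy as np

def bike_density(bike_start, bike_speed, course_m=40_000, seg_m=100, dt=60):
    """Count athletes in each seg_m-long bike segment at dt-second snapshots.

    bike_start : time (s) at which each athlete mounts the bike
    bike_speed : constant speed (m/s) per athlete (simplifying assumption)
    Returns an int array of shape (n_snapshots, n_segments).
    """
    n_seg = course_m // seg_m
    t_end = (bike_start + course_m / bike_speed).max()
    times = np.arange(0, t_end + dt, dt)
    counts = np.zeros((len(times), n_seg), dtype=int)
    for k, t in enumerate(times):
        pos = (t - bike_start) * bike_speed            # metres ridden so far
        on_course = (pos >= 0) & (pos < course_m)      # mounted, not yet at T2
        seg = (pos[on_course] // seg_m).astype(int)
        counts[k] = np.bincount(seg, minlength=n_seg)
    return counts

# Synthetic field: 20 waves of 100, sorted by swim pace, Delta = 3 min
rng = np.random.default_rng(0)
N, W, Delta = 2000, 20, 180
swim_pace = rng.lognormal(np.log(120), 0.18, N)        # s / 100 m
bike_speed = rng.normal(28, 4, N).clip(15, 45) / 3.6   # km/h converted to m/s
wave = np.empty(N, dtype=int)
wave[np.argsort(swim_pace)] = np.arange(N) // (N // W)
bike_start = wave * Delta + swim_pace * 15 + 120       # wave start + 1500 m swim + T1

counts = bike_density(bike_start, bike_speed)
peak = counts.max()
print(f"peak bike density: {peak} athletes / 100 m (hotspot threshold ~30)")
```

Iterating on $\Delta$ or the wave assignment then amounts to recomputing `bike_start` and re-running the count; the run leg can be handled the same way with run speeds and T2 exit times.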

Data sources to consider

Source	What you get
USA Triathlon (usatriathlon.org) race results archive	Per-athlete splits for thousands of Olympic-distance events — fits pace distributions
Ironman 70.3 / Olympic-distance results pages	Wave-start times and finishers per wave (validates your $\Delta$)
ITU (World Triathlon) age-group results	Elite + age-group reference paces by 5-year age band
USA Triathlon Event Sanctioning Manual	Course-density safety guidelines (swimmers per lane, cyclists per km)
Municipal road-closure permit templates	Realistic closure-window negotiations and constraints
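
Whichever source supplies the splits, the fit itself is short. A minimal sketch, assuming a lognormal model with zero location (so the maximum-likelihood estimates are just the mean and standard deviation of the log paces); the helper name `fit_lognormal` is hypothetical and the "data" below is a synthetic stand-in for real swim splits.

```python
import numpy as np

def fit_lognormal(x):
    """Maximum-likelihood lognormal fit: mean and std (ddof=0) of log(x)."""
    logs = np.log(np.asarray(x, dtype=float))
    return logs.mean(), logs.std()

rng = np.random.default_rng(1)
# Stand-in for real swim splits in s / 100 m; replace with the provided results.
swim_splits = rng.lognormal(np.log(120), 0.18, size=5000)

mu, sigma = fit_lognormal(swim_splits)
median_pace = np.exp(mu)                     # median of the fitted lognormal
p95_pace = np.exp(mu + 1.645 * sigma)        # 95th percentile of the fitted model
print(f"median {median_pace:.0f} s/100 m, sigma {sigma:.3f}, "
      f"95th percentile {p95_pace:.0f} s/100 m")
```

Fitting one such pair per age/sex group, or regressing log pace on those covariates, extends the same idea to the covariate model described in Step 1.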

Common pitfalls

Treating all athletes as identical. A single average pace ignores the fat tail of slow finishers — they determine the 5.5-h constraint.
Ignoring the transitions. T1 and T2 each add 1–4 minutes per athlete and are a real bottleneck if rack space is limited. Budget them.
Waves that are too big. Above ~150 swimmers in a wave you get a "washing machine" effect and DNF rates spike; the literature backs ~80–120.
No congestion metric. Teams report "fewer collisions" without ever defining a number. Use athletes-per-100-m-per-minute, or athletes per lane in the swim.
Forgetting that the road-closure constraint is per-leg. Bike roads may reopen before the run roads close; model each segment's open window separately.
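
The first pitfall is easy to quantify. A rough sketch on synthetic data, assuming the illustrative pace parameters from Step 1, independent legs, and fixed transitions of 120 s and 90 s (all assumptions, not fitted values): compare the mean total time with the upper tail that actually binds the 5.5-h window.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
swim = rng.lognormal(np.log(120), 0.18, N) * 15          # s for 1500 m
bike = 40 / rng.normal(28, 4, N).clip(15, 45) * 3600     # s for 40 km
run = rng.lognormal(np.log(330), 0.15, N) * 10           # s for 10 km
total_h = (swim + 120 + bike + 90 + run) / 3600          # includes T1 and T2

mean_h = total_h.mean()
p99_h = np.percentile(total_h, 99)
slowest_h = total_h.max()
print(f"mean {mean_h:.2f} h, 99th percentile {p99_h:.2f} h, slowest {slowest_h:.2f} h")

# Latest start (minutes after the first gun) that still lets the slowest
# athlete finish inside the closure window.
print(f"slowest athlete must start within {(5.5 - slowest_h) * 60:.0f} min")
```

Planning around `mean_h` would suggest far more slack in the start window than the slowest finishers actually leave.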

Python sketch

Generate a synthetic field, cluster into waves by swim pace, and simulate finish times under a fixed inter-wave gap.

import numpy as np
rng = np.random.default_rng(2016)

# --- Step 1: synthesize a 2,000-athlete field (would normally come from the data) ---
N = 2000
swim_pace = rng.lognormal(mean=np.log(120), sigma=0.18, size=N)   # sec / 100 m
bike_kmh  = rng.normal(loc=28, scale=4.0, size=N).clip(15, 45)    # km / h
run_pace  = rng.lognormal(mean=np.log(330), sigma=0.15, size=N)   # sec / km

# expected total time (sec): swim 1500 m + T1 + bike 40 km + T2 + run 10 km
T = swim_pace * 15 + 120 + (40 / bike_kmh) * 3600 + 90 + run_pace * 10

# --- Step 2: sort by swim pace, bin into W waves ---
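W = 20
wave_size = N // W                           # athletes per wave (N divides evenly by W here)
order = np.argsort(swim_pace)                # athlete indices, fastest swimmer first
wave_id = np.empty(N, dtype=int)
wave_id[order] = np.arange(N) // wave_size   # rank in sorted order -> wave number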

# --- Step 3: assign start time per wave (Delta = 3 min) ---
Delta = 180  # sec
start = wave_id * Delta
finish = start + T

# --- Step 4: report ---
print(f"Last finisher at {finish.max()/3600:.2f} h (budget 5.5 h)")
print(f"Start window: {start.max()/60:.1f} min, mean wave size = {wave_size}")
for w in range(0, W, 5):
    members = finish[wave_id == w]
    print(f"  wave {w:2d}: median finish {np.median(members)/3600:.2f} h, "
          f"slowest {members.max()/3600:.2f} h")

Sensitivity & validation checklist

Sweep the inter-wave gap $\Delta$ from 90 s to 5 min; plot last-finisher time vs. max course density. There is a knee — pick $\Delta$ just past it.
Vary wave size from 50 to 200; check that the swim-collision rate (proxied by the mean within-wave variance of swim pace) stays bounded.
Re-fit pace distributions on a held-out year of results — your scheduler should still finish under 5.5 h.
Inject a 10% no-show rate and a 2% per-leg DNF rate; the schedule should be robust.
Compare against the actual wave plan from a real race (USAT publishes them) — your total event time should be within ~10 min.
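
The first checklist item can be sketched as a sweep over $\Delta$. The congestion proxy used here is deliberately crude and is an assumption of this sketch: the peak number of athletes leaving T1 in any one-minute bin, not a full per-segment density. The field is synthetic, with the same illustrative parameters as the Python sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
N, W = 2000, 20
swim_pace = rng.lognormal(np.log(120), 0.18, N)          # s / 100 m
bike_kmh = rng.normal(28, 4, N).clip(15, 45)             # km / h
run_pace = rng.lognormal(np.log(330), 0.15, N)           # s / km
swim_t = swim_pace * 15                                  # s for 1500 m
T = swim_t + 120 + 40 / bike_kmh * 3600 + 90 + run_pace * 10

# Waves of equal size, fastest swimmers first
wave = np.empty(N, dtype=int)
wave[np.argsort(swim_pace)] = np.arange(N) // (N // W)

def sweep(deltas):
    """For each gap (s): last-finisher time (h) and peak T1-exit flow (athletes/min)."""
    rows = []
    for d in deltas:
        start = wave * d
        last_h = (start + T).max() / 3600
        t1_exit = start + swim_t + 120                   # time each athlete mounts the bike
        per_min = np.bincount((t1_exit // 60).astype(int))
        rows.append((d, last_h, int(per_min.max())))
    return rows

results = sweep(range(90, 301, 30))
for d, last_h, peak in results:
    flag = "" if last_h <= 5.5 else "  (over budget)"
    print(f"Delta={d:3d} s  last finisher {last_h:.2f} h  peak mount flow {peak}/min{flag}")
```

Plotting the second column against the third gives the trade-off curve on which to look for the knee.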
