Worked sample paper · HiMCM 2024 Problem A
A complete, judge-style reference paper for To Play or Not to Play: Modeling Future Olympic Games. This is not an official COMAP solution — it is a learning artifact written so a student can see what every section of a HiMCM paper should actually contain. Read it after attempting the problem yourself.
Values marked [illustrative] are plausible placeholders, not authoritative findings.
Summary Sheet
Problem restated. The International Olympic Committee (IOC) must choose which Sports, Disciplines, and Events (SDEs) to add, retain, or remove for the Brisbane 2032 Summer Olympics. The IOC has published six criteria — popularity, gender equity, sustainability, inclusivity, relevance/innovation, and safety/fair play — but does not publish how those criteria are weighted in practice. Our task is to (i) build a quantitative model that scores any candidate SDE against those criteria, (ii) verify the model against known recent decisions, (iii) recommend three SDEs to add or reintroduce for Brisbane 2032, and (iv) deliver a non-technical letter to the IOC.
Approach. We model SDE selection as a multi-criteria decision problem. For each candidate SDE s, we compute a vector of six normalized indicator scores using public proxies (Google Trends share, athlete gender ratio, equipment carbon estimate, national-federation count, youth-participation growth, injury rate per 1000 athlete-hours). Subjective importance weights come from a 3-judge Analytic Hierarchy Process (AHP) panel within the team; objective weights come from the Entropy Weight Method (EWM) over the historical data. We average the two into a Combined Weight Vector and rank SDEs via TOPSIS. A logistic regression trained on 1988–2024 inclusion outcomes provides an independent cross-check, and a Monte Carlo sensitivity sweep over the weight vector confirms which recommendations are robust.
Key findings.
- The model correctly classifies 7 of 8 verification cases: athletics, swimming, gymnastics, and basketball (long-standing) all rank in the top quartile; surfing (added 2020) and breaking (added 2024) rank in the top half; karate (added 2020, removed 2024) is correctly identified as marginal. Flag football (added 2028) scores high on popularity but low on inclusivity, consistent with the IOC's host-country justification.
- For Brisbane 2032 our combined-weight TOPSIS ranks three additions: (1) Cricket T20 — high inertia from 2028 inclusion, near-universal South-Asian and Commonwealth following, large host-region fan base; (2) Squash — long-pending IOC application, strong gender equity, low venue carbon cost; (3) Lacrosse (Sixes) — growing youth participation, mixed-event format, modest equipment footprint.
- Sensitivity analysis: under roughly ±30% perturbation of the weight vector across 10,000 Monte Carlo trials, cricket and squash stay in the top 3 in more than 90% of trials and lacrosse in about 75% [illustrative]. Esports and kabaddi are the next two candidates, but each enters the top 3 only under some weight changes.
- The single highest-leverage parameter is the popularity weight. If popularity is weighted at less than 0.15, squash overtakes cricket; at about 0.30 or above, esports enters the top 3.
Recommendations to the IOC.
- Adopt Cricket T20, Squash, and Lacrosse Sixes as the three additions for Brisbane 2032; retain breaking and flag football as host-driven extensions.
- Publish the explicit weighting used in selection. A transparent weight vector reduces lobbying asymmetries and makes the process auditable.
- For 2036 and beyond, evaluate esports and parkour on a new "digital-physical hybrid" criterion rather than forcing them into the current six.
- Re-score every SDE every 4 years using a refreshed data pipeline — popularity drifts faster than the Olympic cycle.
1. Introduction and Background
The modern Summer Olympic programme has expanded from 43 events in 1896 to 329 events across 32 sports for Paris 2024 (IOC, 2024). The 2014 Olympic Agenda 2020 explicitly moved sport selection from a fixed list to an "event-based" model in which the host city, together with the IOC, can propose additions subject to the six criteria stated in Rule 45 of the Olympic Charter (IOC, 2020).
This shift has produced rapid recent change: skateboarding, sport climbing, surfing, and karate were added for Tokyo 2020; breaking and a refreshed sport-climbing programme were confirmed for Paris 2024; flag football, cricket, lacrosse, baseball/softball, and squash were announced for Los Angeles 2028. Brisbane 2032 has not yet finalised its additions, which is the live problem this paper addresses.
The decision matters beyond sport. Each additional SDE costs the host city roughly USD 35–80M in venue, broadcast, and athlete-housing capacity (Flyvbjerg et al., 2021), and each SDE crowds the 10,500-athlete cap mandated by the IOC. Choosing well means weighting six partly-conflicting criteria; choosing badly means either a less relevant Games or an over-budget one.
2. Assumptions and Justifications
Every assumption below is used somewhere in Section 4 — we cite the equation where it enters.
- The six IOC criteria are exhaustive. We assume no seventh hidden criterion. Why: The IOC's published charter and Agenda 2020 list exactly these six; absent insider information we cannot invent a seventh. (Used in Eq. 1.)
- Each criterion can be approximated by 1–2 public, measurable proxies. Why: A multi-criteria model is only useful if it can be computed from open data within a 14-day window. We choose proxies with at-most-one-step interpretation (e.g., Google Trends share for popularity, not "cultural significance"). (Used throughout Section 4.)
- Proxies are comparable across SDEs after min-max normalization. Why: Different proxies have different units (search volume vs. athletes vs. kg-CO₂). Normalization to [0, 1] is the standard pre-processing step for TOPSIS (Hwang & Yoon, 1981). (Used in Eq. 2.)
- Subjective weights from a 3-judge AHP panel approximate IOC member preferences. Why: We cannot survey IOC members. A small in-team AHP panel with explicit pairwise comparisons, consistency-ratio checked, is the standard workaround in the operations-research literature. We acknowledge this is the largest single source of model error. (Used in Eq. 3.)
- The entropy method captures objective information content. Why: Criteria where SDEs differ greatly (low entropy across the dataset) carry more discriminating information than criteria where almost every SDE scores the same (entropy near its maximum). EWM formalises this (Shannon, 1948; applied to MCDM by Zou et al., 2006). (Used in Eq. 5.)
- Inertia: an SDE included in the previous Games has a high baseline probability of staying. Why: Empirically, of the 28 core sports of Sydney 2000, 26 remained in Tokyo 2020, a base rate of about 93%. Our logistic regression encodes this with a "previously included" indicator. (Used in Eq. 8.)
- Host-country effect is small but real. Why: Tokyo added karate (Japanese martial-arts heritage); LA added flag football (American football proxy); Brisbane is likely to favour cricket and lacrosse (Commonwealth heritage, growing Australian leagues). We add a binary host-affinity feature. (Used in Eq. 8.)
- The 10,500-athlete cap is not binding for the addition of any single SDE. Why: The IOC has historically absorbed new SDEs by trimming events within existing sports rather than rejecting whole sports. We therefore do not model a hard cap. (Acknowledged limitation, Section 9.)
- Data quality is uniform across SDEs. Why: Google Trends and Wikipedia page-view APIs return comparable normalized series for every candidate. We acknowledge this is weaker for small federations. (Used in Eq. 2.)
- Criterion preferences are time-invariant over a single Olympic cycle (4 years). Why: AHP weights elicited today are assumed valid through 2032. We test sensitivity to this in Section 7. (Used in Eq. 3.)
3. Variables and Notation
| Symbol | Meaning | Units |
|---|---|---|
| $s \in S$ | Index over candidate SDEs ($\lvert S \rvert = 24$ in our dataset) | none |
| $c \in C$ | Index over IOC criteria, $\lvert C \rvert = 6$ | none |
| $x_{s,c}$ | Raw proxy value for SDE $s$ on criterion $c$ | varies |
| $z_{s,c}$ | Normalized value, $z_{s,c} \in [0, 1]$ | none |
| $w_c^{\mathrm{AHP}}$ | Subjective AHP weight for criterion $c$ | none |
| $w_c^{\mathrm{EWM}}$ | Objective entropy-derived weight for $c$ | none |
| $w_c$ | Combined weight (Eq. 6) | none |
| $v_{s,c}$ | Weighted-normalized matrix entry (Eq. 7) | none |
| $A^+, A^-$ | Ideal and anti-ideal points in $v$-space | none |
| $d_s^+, d_s^-$ | Euclidean distance from $s$ to ideal / anti-ideal | none |
| $T_s$ | TOPSIS closeness score, $T_s \in [0, 1]$ | none |
| $p_s$ | Logistic-regression inclusion probability | none |
| $I_s$ | Inertia indicator (1 if in previous Games) | none |
| $H_s$ | Host-affinity indicator | none |
| $R_s$ | Final composite recommendation score (Eq. 9) | none |
4. Model Formulation
4.1 Decision matrix and normalization
Stack raw proxy values into a decision matrix X of shape |S| × |C|. Each criterion is one of two types: benefit (higher is better — popularity, inclusivity, gender equity, relevance) or cost (lower is better — equipment carbon, injury rate). Normalize per Eq. (2):
Equation (1) — decision matrix:
$$X = [\,x_{s,c}\,], \qquad s = 1, \dots, |S|, \quad c = 1, \dots, |C|$$
Equation (2) — min-max normalization with criterion direction:
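$$z_{s,c} = \begin{cases} \dfrac{x_{s,c} - \min_{s'} x_{s',c}}{\max_{s'} x_{s',c} - \min_{s'} x_{s',c}} & \text{if } c \text{ is a benefit criterion} \\[2ex] \dfrac{\max_{s'} x_{s',c} - x_{s,c}}{\max_{s'} x_{s',c} - \min_{s'} x_{s',c}} & \text{if } c \text{ is a cost criterion} \end{cases}$$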
4.2 AHP subjective weights
Each of three team-member judges builds a 6×6 pairwise-comparison matrix P using Saaty's 1–9 scale (Saaty, 1980). The weight vector is the principal eigenvector of P, normalised to sum to 1. We average across judges. We verify Saaty's consistency ratio CR < 0.10 for each judge before averaging; if not, the judge re-does the matrix.
Equation (3) — AHP weights from pairwise matrix:
$$P\,w^{\mathrm{AHP}} = \lambda_{\max}\,w^{\mathrm{AHP}}, \qquad \mathrm{CR} = \frac{\lambda_{\max} - n}{(n - 1)\,\mathrm{RI}_n} < 0.10$$
For our 3-judge panel the averaged AHP vector was approximately $w^{\mathrm{AHP}} = [0.28, 0.18, 0.10, 0.20, 0.14, 0.10]$ over [popularity, gender equity, sustainability, inclusivity, relevance, safety] [illustrative].
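The per-judge procedure above (principal eigenvector, consistency check, then averaging) can be sketched as follows. This is a minimal illustration, not the team's actual panel data: it assumes three hypothetical judges and, to keep the matrices readable, only three criteria instead of six. The judgement values are made up.

```python
import numpy as np

# Saaty's random-index values RI_n for matrix sizes 3..7.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def ahp_weights_and_cr(P):
    """Return (weights, consistency ratio) for one pairwise-comparison matrix."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    eigvals, eigvecs = np.linalg.eig(P)
    k = int(np.argmax(eigvals.real))      # index of the principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                       # normalise weights to sum to 1
    cr = (eigvals[k].real - n) / ((n - 1) * RANDOM_INDEX[n])
    return w, cr


# Hypothetical judgements on Saaty's 1-9 scale (illustration only).
judges = [
    [[1, 2, 6], [1/2, 1, 3], [1/6, 1/3, 1]],   # perfectly consistent
    [[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]],
    [[1, 2, 4], [1/2, 1, 3], [1/4, 1/3, 1]],
]

results = [ahp_weights_and_cr(P) for P in judges]
for j, (w, cr) in enumerate(results, start=1):
    if cr >= 0.10:
        raise ValueError(f"judge {j} must redo the matrix, CR={cr:.3f}")

# Average the per-judge weight vectors, as described in the text.
w_avg = np.mean([w for w, _ in results], axis=0)
print("averaged weights:", w_avg.round(3))
```

Because each judge's vector already sums to 1, the plain arithmetic mean also sums to 1 and needs no renormalisation.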
4.3 Entropy weight method (EWM)
Treat each column c of the normalized matrix Z as a probability distribution:
Equation (4) — column probabilities:
$$p_{s,c} = \frac{z_{s,c}}{\sum_{s'} z_{s',c}}$$
Equation (5) — Shannon entropy and entropy weight:
$$e_c = -\frac{1}{\ln |S|} \sum_{s} p_{s,c} \ln p_{s,c}$$
$$w_c^{\mathrm{EWM}} = \frac{1 - e_c}{\sum_{c'} (1 - e_{c'})}$$
High entropy $e_c$ means the criterion does not discriminate between SDEs and so receives a low weight. In our dataset the most discriminating criterion is sustainability; the least discriminating is safety (almost every Olympic-vetted sport scores well on safety).
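A small numerical sketch of Eqs. (4) and (5) may make the entropy argument concrete. The two-column matrix below is hypothetical: the first column stands in for a criterion like safety, where most scores are bunched together, and the second for a criterion like sustainability, where one SDE dominates. The function name `entropy_weights` is our own label, not part of any library.

```python
import numpy as np


def entropy_weights(Z):
    """Entropy weights (Eqs. 4-5) for a normalized score matrix Z (rows = SDEs)."""
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum(axis=0)                         # Eq. 4: column-wise shares
    plogp = np.zeros_like(P)
    mask = P > 0                                  # treat 0 * ln 0 as 0
    plogp[mask] = P[mask] * np.log(P[mask])
    e = -plogp.sum(axis=0) / np.log(Z.shape[0])   # Eq. 5: entropy scaled to [0, 1]
    w = (1 - e) / (1 - e).sum()
    return e, w


# Hypothetical normalized scores for four SDEs (illustration only).
# Column 0: three of four scores are bunched together (safety-like).
# Column 1: one SDE dominates the column (sustainability-like).
Z = np.array([
    [1.00, 1.00],
    [0.95, 0.10],
    [0.90, 0.05],
    [0.00, 0.00],
])

e, w = entropy_weights(Z)
print("entropy:", e.round(3))
print("weights:", w.round(3))
```

On this toy input the bunched column has the higher entropy and receives roughly 0.24 of the weight, while the dominated column receives roughly 0.76.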
4.4 Combined weight
Equation (6) — average of subjective and objective weights:
$$w_c = \tfrac{1}{2}\,w_c^{\mathrm{AHP}} + \tfrac{1}{2}\,w_c^{\mathrm{EWM}}, \qquad \sum_c w_c = 1$$
The 50/50 mix is a common default; the weight perturbations in Section 7 show how the ranking responds when the combined vector shifts.
4.5 TOPSIS ranking
Form the weighted-normalized matrix and the ideal points:
Equation (7) — TOPSIS distances:
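$$v_{s,c} = w_c\,z_{s,c}$$
$$A_c^{+} = \max_s v_{s,c}, \qquad A_c^{-} = \min_s v_{s,c}$$
$$d_s^{+} = \sqrt{\sum_c \left(v_{s,c} - A_c^{+}\right)^2}$$
$$d_s^{-} = \sqrt{\sum_c \left(v_{s,c} - A_c^{-}\right)^2}$$
$$T_s = \frac{d_s^{-}}{d_s^{+} + d_s^{-}}$$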
Rank SDEs by $T_s$ in descending order.
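The distances in Eq. (7) are easy to check by hand on a tiny case. The sketch below uses a hypothetical two-criterion, three-alternative matrix and hypothetical weights; none of these numbers come from the paper's dataset.

```python
import math


def topsis(Z, w):
    """Closeness scores T_s (Eq. 7) for normalized rows Z and weights w."""
    V = [[wc * z for wc, z in zip(w, row)] for row in Z]
    n_crit = len(w)
    a_plus = [max(row[c] for row in V) for c in range(n_crit)]    # ideal point
    a_minus = [min(row[c] for row in V) for c in range(n_crit)]   # anti-ideal point
    scores = []
    for row in V:
        d_plus = math.sqrt(sum((v - a) ** 2 for v, a in zip(row, a_plus)))
        d_minus = math.sqrt(sum((v - a) ** 2 for v, a in zip(row, a_minus)))
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Hypothetical two-criterion example (illustration only).
Z = [
    [1.0, 0.0],   # best on criterion 1, worst on criterion 2
    [0.0, 1.0],   # the reverse
    [0.5, 0.5],   # middling on both
]
w = [0.7, 0.3]

T = topsis(Z, w)
print([round(t, 3) for t in T])   # -> [0.7, 0.3, 0.5]
```

The first alternative wins because it is strongest on the heavier criterion; the balanced alternative sits exactly halfway between the ideal and anti-ideal points.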
4.6 Logistic regression cross-check
Build a panel dataset: each row is (SDE, Games-year) for every Summer Olympics 1988–2024, with the six normalized criterion values plus inertia $I_s$ and host-affinity $H_s$. The label is 1 if the SDE was on the programme that year, 0 otherwise.
Equation (8) — logistic model:
$$p_s = \sigma\!\left(\beta_0 + \sum_c \beta_c\,z_{s,c} + \beta_I I_s + \beta_H H_s\right)$$
$$\sigma(u) = \frac{1}{1 + e^{-u}}$$
Fit by maximum likelihood with L2 regularisation (Pedregosa et al., 2011). Use 5-fold cross-validation; we observed AUC ≈ 0.91 on held-out years [illustrative].
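One way to make the folds hold out whole Games, rather than random rows, is grouped cross-validation. The sketch below assumes scikit-learn's `GroupKFold` with the Games year as the group label. The panel is synthetic, with a made-up data-generating process, so the AUC it prints says nothing about the real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the panel: 10 Games (1988-2024) x 24 SDEs.
years = np.repeat(np.arange(1988, 2028, 4), 24)
n = years.size
criteria = rng.random((n, 6))                 # six normalized criterion values
inertia = rng.integers(0, 2, size=n)          # I_s
host = rng.integers(0, 2, size=n)             # H_s
X = np.column_stack([criteria, inertia, host])

# Made-up data-generating process: popularity and inertia drive inclusion.
logit = 4.0 * (criteria[:, 0] - 0.5) + 3.0 * (inertia - 0.5)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Each fold holds out whole Games, so no year appears in both train and test.
cv = GroupKFold(n_splits=5)
model = LogisticRegression(max_iter=1000)     # L2 penalty is the default
auc = cross_val_score(model, X, y, groups=years, cv=cv, scoring="roc_auc")
print("AUC per fold:", auc.round(3))
print("mean AUC:", round(float(auc.mean()), 3))
```

With 10 Games and 5 folds, each fold tests on two complete Games, which matches the "held-out years" reading of the validation claim more closely than random row-level folds.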
4.7 Final composite
To get one number per SDE we combine TOPSIS closeness and logistic probability:
Equation (9) — composite recommendation score:
$$R_s = \alpha\,T_s + (1 - \alpha)\,p_s, \qquad \alpha = 0.6$$
α > 0.5 reflects the modelling team's prior that TOPSIS, which directly encodes the IOC's stated criteria, should dominate the data-driven inertia of past inclusions. We test sensitivity to α in Section 7.3.
5. Solution and Computational Approach
The full pipeline fits in a single Python module of about 250 lines. The sketch below contains the load and the two core scoring routines, the parts a HiMCM team can realistically write and debug inside a 14-day contest window. It runs end-to-end on the COMAP-supplied HiMCM_Olympic_Data.xlsx plus three small augmentation CSVs (Google Trends, federation counts, injury rates).
```python
"""himcm_2024a.py: Combined Weight + TOPSIS + Logistic for HiMCM 2024 Problem A."""
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegressionCV

BENEFIT = {"popularity", "gender_equity", "inclusivity", "relevance"}
COST = {"carbon", "injury"}
CRIT = ["popularity", "gender_equity", "carbon",
        "inclusivity", "relevance", "injury"]


def normalize(df):
    """Min-max normalize each criterion to [0, 1], flipping cost criteria (Eq. 2)."""
    Z = df[CRIT].astype(float).copy()
    for c in CRIT:
        lo, hi = Z[c].min(), Z[c].max()
        if hi == lo:
            # A constant column would divide by zero and carries no information.
            raise ValueError(f"criterion {c!r} is constant; cannot normalize")
        if c in BENEFIT:
            Z[c] = (Z[c] - lo) / (hi - lo)
        else:
            Z[c] = (hi - Z[c]) / (hi - lo)
    return Z


def ahp_weights(P):
    """Principal-eigenvector AHP weights with Saaty's consistency check (Eq. 3)."""
    eigvals, eigvecs = np.linalg.eig(P)
    idx = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, idx].real)
    w /= w.sum()
    n = P.shape[0]
    lam = eigvals[idx].real
    RI = {1: 0, 2: 0, 3: 0.58, 4: 0.9, 5: 1.12, 6: 1.24, 7: 1.32}[n]
    CR = (lam - n) / ((n - 1) * RI) if RI else 0
    assert CR < 0.10, f"AHP inconsistent, CR={CR:.3f}"
    return w


def ewm_weights(Z):
    """Entropy weights (Eqs. 4-5); zeros are nudged so that log() is defined."""
    P = Z / Z.sum(axis=0)
    P = P.replace(0, 1e-12)
    e = -(P * np.log(P)).sum(axis=0) / np.log(len(Z))
    w = (1 - e) / (1 - e).sum()
    return w.values


def topsis(Z, w):
    """TOPSIS closeness score T_s for every row of Z (Eq. 7)."""
    V = Z.values * w
    Aplus, Aminus = V.max(axis=0), V.min(axis=0)
    dp = np.linalg.norm(V - Aplus, axis=1)
    dm = np.linalg.norm(V - Aminus, axis=1)
    return dm / (dp + dm)


def composite(df, P_ahp, alpha=0.6):
    """Combine TOPSIS closeness and logistic probability into R_s (Eq. 9)."""
    Z = normalize(df)
    wA = ahp_weights(P_ahp)
    wE = ewm_weights(Z)
    w = 0.5 * wA + 0.5 * wE          # Eq. 6: combined weight
    T = topsis(Z, w)
    # Logistic on historical panel (loaded separately)
    panel = pd.read_csv("panel_1988_2024.csv")
    Xtr = panel[CRIT + ["inertia", "host"]].values
    ytr = panel["included"].values
    lr = LogisticRegressionCV(cv=5, max_iter=1000).fit(Xtr, ytr)
    Xpr = np.c_[Z.values, df["inertia"], df["host"]]
    p = lr.predict_proba(Xpr)[:, 1]
    R = alpha * T + (1 - alpha) * p
    out = df[["sde"]].copy()
    out["T"], out["p"], out["R"] = T, p, R
    return out.sort_values("R", ascending=False), w


if __name__ == "__main__":
    df = pd.read_excel("HiMCM_Olympic_Data.xlsx", sheet_name="candidates")
    P_ahp = np.loadtxt("ahp_panel_averaged.csv", delimiter=",")
    rank, w = composite(df, P_ahp)
    print(rank.head(10).to_string(index=False))
    print("\nCombined weights:", dict(zip(CRIT, w.round(3))))
```
The matching Monte Carlo loop for the Section 7 sensitivity analysis is another ~25 lines: resample $w^{\mathrm{AHP}}$ from a Dirichlet centered on the panel average, re-run composite, and record the rank of each top-3 SDE.
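A minimal sketch of that loop is shown below, under two simplifying assumptions that are ours rather than the paper's: the resampled weights are fed straight into TOPSIS instead of the full composite, and the score matrix is a random placeholder for five hypothetical candidates. The concentration value of 50 and the base vector are the illustrative figures quoted elsewhere in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)


def topsis(Z, w):
    """TOPSIS closeness (Eq. 7) for a normalized matrix Z and weight vector w."""
    V = Z * w
    d_plus = np.linalg.norm(V - V.max(axis=0), axis=1)
    d_minus = np.linalg.norm(V - V.min(axis=0), axis=1)
    return d_minus / (d_plus + d_minus)


# Illustrative panel-average AHP vector from Section 4.2.
w_ahp = np.array([0.28, 0.18, 0.10, 0.20, 0.14, 0.10])
concentration = 50.0          # Dirichlet concentration quoted in Section 7.1
n_trials = 2000               # the paper uses 10,000; fewer keeps the demo fast

# Hypothetical normalized scores for five candidates (random placeholders).
names = ["A", "B", "C", "D", "E"]
Z = rng.random((len(names), w_ahp.size))

# Each row of `samples` is one perturbed weight vector that sums to 1.
samples = rng.dirichlet(concentration * w_ahp, size=n_trials)

top3_counts = dict.fromkeys(names, 0)
for w in samples:
    order = np.argsort(-topsis(Z, w))      # indices sorted by descending T_s
    for idx in order[:3]:
        top3_counts[names[idx]] += 1

for name, count in top3_counts.items():
    print(f"{name}: in top 3 in {count} of {n_trials} trials")
```

Multiplying the base vector by the concentration keeps the Dirichlet mean at the panel average while the concentration controls how tightly the samples cluster around it.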
6. Results
6.1 Verification cases
| SDE | Recent decision | $R_s$ | Rank (of 24) | Model verdict |
|---|---|---|---|---|
| Athletics | Continuously in since 1988 | 0.86 | 1 | ✓ Top quartile |
| Swimming | Continuously in since 1988 | 0.83 | 2 | ✓ Top quartile |
| Gymnastics | Continuously in since 1988 | 0.77 | 4 | ✓ Top quartile |
| Basketball | Continuously in since 1988 | 0.74 | 6 | ✓ Top quartile |
| Surfing | Added Tokyo 2020 | 0.66 | 9 | ✓ Top half |
| Breaking | Added Paris 2024 | 0.62 | 11 | ✓ Top half |
| Karate | Added 2020, removed 2024 | 0.51 | 16 | ✓ Marginal — consistent |
| Flag football | Added LA 2028 | 0.58 | 13 | ~ High popularity, low inclusivity |
All values [illustrative]. The model agrees with 7 of 8 known decisions; flag football is the known limitation (driven by host-country effect more than by the six IOC criteria).
[Figure 1: Bar chart of $R_s$ for all 24 candidate SDEs, sorted descending, with the eight verification SDEs highlighted in colour. The three Brisbane-2032 recommendations sit at ranks 3, 5, and 7.]
6.2 Brisbane 2032 ranking
| Rank | SDE | $R_s$ | $T_s$ | $p_s$ | Driving criteria |
|---|---|---|---|---|---|
| 1 | Cricket T20 | 0.79 | 0.81 | 0.76 | Popularity + inertia + host affinity |
| 2 | Squash | 0.72 | 0.75 | 0.67 | Gender equity + inclusivity + low venue cost |
| 3 | Lacrosse Sixes | 0.68 | 0.70 | 0.65 | Relevance (youth growth) + inertia (LA 2028) |
| 4 | Esports | 0.63 | 0.71 | 0.51 | Popularity, but very low inertia / safety ambiguous |
| 5 | Kabaddi | 0.59 | 0.64 | 0.51 | Inclusivity, but low popularity outside South Asia |
| 6 | Parkour | 0.55 | 0.62 | 0.45 | Relevance, but high injury rate |
| 7 | Ultimate Frisbee | 0.52 | 0.58 | 0.43 | Gender equity, but small federation count |
All values [illustrative].
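As a quick arithmetic check, the composite column can be recomputed from the other two with Eq. (9) at α = 0.6. The sketch below simply copies the illustrative values from the table above; the dictionary layout and function name are our own.

```python
ALPHA = 0.6

# (T_s, p_s, reported R_s) copied from the illustrative table above.
table = {
    "Cricket T20":      (0.81, 0.76, 0.79),
    "Squash":           (0.75, 0.67, 0.72),
    "Lacrosse Sixes":   (0.70, 0.65, 0.68),
    "Esports":          (0.71, 0.51, 0.63),
    "Kabaddi":          (0.64, 0.51, 0.59),
    "Parkour":          (0.62, 0.45, 0.55),
    "Ultimate Frisbee": (0.58, 0.43, 0.52),
}


def composite(t, p, alpha=ALPHA):
    """Eq. 9: convex combination of TOPSIS closeness and logistic probability."""
    return alpha * t + (1 - alpha) * p


recomputed = {name: round(composite(t, p), 2) for name, (t, p, _) in table.items()}
for name, (t, p, reported) in table.items():
    print(f"{name:17s} reported={reported:.2f} recomputed={recomputed[name]:.2f}")
```

Every recomputed value matches the reported column to two decimals. Passing a different `alpha` to `composite` gives a quick way to explore the mix discussed in Section 7.3.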
[Figure 2: TOPSIS closeness $T_s$ vs. logistic probability $p_s$, with points labelled. The three recommendations cluster in the upper-right quadrant; esports sits high-$T_s$ but low-$p_s$, illustrating the inertia gap.]
7. Sensitivity Analysis
We vary three parameters and report how the top-3 set changes.
7.1 Weight perturbation (Monte Carlo)
Sample $w^{\mathrm{AHP}}$ from a Dirichlet distribution centred on the panel average with concentration parameter κ = 50. At our weights this gives a relative standard deviation of roughly 20–40% per component. Re-run the pipeline 10,000 times. Record whether each candidate stays in the top 3.
| SDE | Times in top 3 (of 10,000) | Rank-stability |
|---|---|---|
| Cricket T20 | 9,820 | Very stable |
| Squash | 9,170 | Stable |
| Lacrosse Sixes | 7,540 | Stable |
| Esports | 2,210 | Marginal |
| Kabaddi | 980 | Unstable |
All values [illustrative]. Cricket and squash are robust; lacrosse is sensitive to the relevance weight; esports replaces lacrosse in the top 3 about 22% of the time.
7.2 Popularity-weight sweep
| $w_{\text{popularity}}$ | Top 3 |
|---|---|
| 0.10 | Squash, Lacrosse Sixes, Cricket T20 |
| 0.20 | Cricket T20, Squash, Lacrosse Sixes |
| 0.30 | Cricket T20, Esports, Squash |
| 0.40 | Esports, Cricket T20, Flag football (host) |
The popularity weight is the single highest-leverage parameter. Our panel-average weight (0.28) sits in the stable regime where cricket, squash, and lacrosse remain the top 3.
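The sweep needs a rule for what happens to the other five weights when popularity is pinned. The paper does not state one, so the sketch below assumes the simplest choice: rescale the remaining weights proportionally so the vector still sums to 1. The function name and the use of the illustrative AHP vector as the base are likewise assumptions.

```python
def with_popularity_weight(w, target, index=0):
    """Set w[index] to `target` and rescale the other weights to keep sum 1."""
    rest = 1.0 - w[index]
    scale = (1.0 - target) / rest
    return [target if i == index else wi * scale for i, wi in enumerate(w)]


# Illustrative AHP vector from Section 4.2; popularity is the first entry.
base = [0.28, 0.18, 0.10, 0.20, 0.14, 0.10]

sweep = {}
for target in (0.10, 0.20, 0.30, 0.40):
    sweep[target] = with_popularity_weight(base, target)
    print(target, [round(x, 3) for x in sweep[target]])
```

Each resulting vector can then be passed to the ranking routine in place of the combined weights. Proportional rescaling preserves the relative importance of the other five criteria, which isolates the effect of popularity.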
7.3 α (TOPSIS vs. logistic mix)
At α = 0.4 (data-driven inertia dominates), breaking and surfing re-enter the top 5 because they have very high $p_s$. At α = 0.8 (criteria-driven), cricket's lead over squash narrows, but with the Section 6.2 values the order does not flip, because cricket leads squash on both $T_s$ and $p_s$; esports overtakes lacrosse only when α exceeds about 0.93. The middle range α ∈ [0.5, 0.7] is stable.
8. Strengths and Weaknesses
Strengths
- Faithful to the IOC's stated criteria. Every model input maps to one of the six published criteria; no smuggled-in factors.
- Combined subjective + objective weights. Avoids the two common failure modes — pure AHP is too dependent on the panel, pure EWM is data-dominated and ignores stated priorities.
- Two independent rankers. TOPSIS and logistic regression are methodologically distinct; their agreement on cricket, squash, and lacrosse is real evidence rather than tautology.
- Reproducible. The Python pipeline runs end-to-end on the COMAP-supplied data plus three open-data CSVs.
- Explicit sensitivity. We report the conditions under which our top 3 flips, not just whether it does.
Weaknesses
- AHP panel is the team, not the IOC. A real IOC member survey could shift the popularity weight substantially.
- Proxy choice is debatable. Google Trends over-weights internet-connected populations. Federation-count over-weights the bureaucratic dimension of inclusivity.
- Host effect is binary. A continuous host-affinity score (e.g., league size in host country) would be better.
- No interactions between criteria. A sport might be high-popularity because it is gender-balanced; we treat the two as independent.
- Esports and parkour stress the framework. The six IOC criteria were designed before hybrid digital-physical sport existed; our model inherits that gap.
9. Future Improvements
- Replace the team AHP panel with a small structured Delphi survey of 10–20 sports-management academics — feasible inside a 14-day window via a Google Form.
- Add a seventh "digital-physical hybrid" criterion to fairly evaluate esports, drone racing, and AR/VR-augmented sports.
- Replace logistic regression with a gradient-boosted classifier and feature-importance plots — easier to defend to non-specialist judges than logistic coefficients.
- Build a stakeholder-weight slider in the appendix (Streamlit or static HTML) so the IOC reader can re-rank in real time. This is now standard in Outstanding papers.
- Quantify the host effect with a hierarchical Bayesian model rather than a fixed binary indicator.
10. Letter to the IOC
To: Programme Commission, International Olympic Committee
From: HiMCM Team #XXXX
Subject: Three recommended sport additions for Brisbane 2032
Distinguished members,
The choice of which sports to add to an Olympic Games is at heart a decision about how to balance six values that sometimes pull against one another: popularity, gender equity, sustainability, inclusivity, relevance, and safety. We built a scoring model that treats those six values as the only criteria — no others — and used public data to score 24 candidate sports against them.
For Brisbane 2032 our model recommends three additions: Cricket T20, Squash, and Lacrosse Sixes. Cricket has the largest popularity score of any non-Olympic sport, a natural host-country fit, and the inertia of inclusion in LA 2028. Squash carries the cleanest profile on sustainability and gender equity; it has been on the IOC waiting list for two decades and finally has the evidence base to be admitted. Lacrosse Sixes is the youngest of the three by participation profile, which helps Brisbane reach a generation that has not historically been Olympic-engaged.
Two further candidates — esports and kabaddi — are close to our top three but are sensitive to small changes in how the six criteria are weighted. We recommend that the IOC publish its weighting explicitly. Doing so would let federations target the criteria that actually matter and would make decisions auditable rather than reputational.
We acknowledge the limits of this work. We could not survey IOC members directly, so we used a small in-team weighting panel. Our proxies — search volume, federation counts, injury rates — are imperfect stand-ins for the values they are meant to measure. We have tried to be honest about where the model is robust (cricket and squash) and where it is not (esports). We hope the framework is useful as a starting point for the Programme Commission's own deliberations.
Respectfully,
HiMCM Team #XXXX
11. References
- International Olympic Committee (2020). Olympic Agenda 2020+5: 15 recommendations. Lausanne: IOC.
- International Olympic Committee (2024). Paris 2024 sports programme. olympics.com/en/paris-2024/sports.
- COMAP (2024). HiMCM 2024 Problem A: To Play or Not to Play. contest.comap.com.
- Saaty, T. L. (1980). The Analytic Hierarchy Process. McGraw-Hill.
- Hwang, C.-L., & Yoon, K. (1981). Multiple Attribute Decision Making: Methods and Applications. Springer.
- Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
- Zou, Z., Yun, Y., & Sun, J. (2006). Entropy method for determination of weight of evaluating indicators in fuzzy synthetic evaluation. Journal of Environmental Sciences, 18(5), 1020–1023.
- Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. JMLR, 12, 2825–2830.
- Flyvbjerg, B., Budzier, A., & Lunn, D. (2021). Regression to the tail: Why the Olympics blow up. Environment and Planning A, 53(2), 233–260.
- Olympedia (2024). Historical Olympic results database. olympedia.org.
- World Athletics (2024). Statistics and records. worldathletics.org.
- Wikipedia. Olympic sports — current and former. en.wikipedia.org/wiki/Olympic_sports.
12. Report on Use of AI (Appendix, does not count toward 25 pages)
Per COMAP rules in effect for the 2024 contest cycle, all generative-AI use must be disclosed.
| # | Tool | Where used | Prompt summary | How the team verified output |
|---|---|---|---|---|
| 1 | ChatGPT (GPT-4o, web) | Section 4, AHP refresher | "Explain Saaty's consistency-ratio formula with worked 4×4 example." | Cross-checked against Saaty (1980, ch. 3). Re-derived RI table by hand. |
| 2 | ChatGPT (GPT-4o, web) | Section 5, code skeleton | "Skeleton for TOPSIS in NumPy with min-max normalization." | Read line-by-line; replaced library defaults with our own normalization to match Eq. (2). |
| 3 | GitHub Copilot | Section 5, plotting | Autocomplete on matplotlib bar/scatter calls. | Run, inspect figure visually, compare to manual sanity ranking. |
| 4 | None | Sections 1–3, 8–10 | — | Written manually by team members; AI not consulted. |
The full prompt/response logs are included in appendix_AI_logs.pdf (separate file submitted alongside this paper, also outside the 25-page limit).
Replace every value marked [illustrative] with your own computation before submitting anything.