Modeling techniques catalogue
A working set of the modeling tools that show up again and again in HiMCM. Each one has a "when to use", a sketch of the math, a worked example, and a note on pitfalls. Combine 2–3 of these in most problems.
How to choose a method
Read the problem twice and ask three questions:
- What's the output? A number, a ranking, a plan, a forecast, a probability, a yes/no?
- What's the structure of the input? One scalar, several criteria, a time series, a network, a population?
- What's the relationship? Linear, exponential, S-shaped, periodic, stochastic, equilibrium-driven?
| If you need to… | Look at… |
|---|---|
| Rank options against multiple criteria | AHP, TOPSIS, Entropy Weight Method, weighted sum |
| Forecast a quantity over time | Regression, ARIMA, exponential smoothing, logistic growth |
| Model a population or compartments | ODEs (SIR, logistic, predator–prey), difference equations |
| Predict a yes/no outcome | Logistic regression, classification trees |
| Allocate scarce resources optimally | Linear programming, integer programming, greedy algorithms |
| Simulate behavior with randomness | Monte Carlo simulation, agent-based, Markov chain |
| Find a route or schedule | Graph algorithms (Dijkstra, MST, TSP heuristics), networkx |
| Quantify uncertainty | Bootstrap, Monte Carlo, confidence intervals |
1 · Analytic Hierarchy Process (AHP)
When to use. You have multiple criteria, each with subjective weights, and you need to rank alternatives. Classic for "pick the best host city" or "rank these candidate sports".
The mechanics.
- Build a pairwise comparison matrix $A$ of your criteria: $a_{ij}$ is how much more important criterion $i$ is than criterion $j$, on a 1–9 scale.
- Compute the principal eigenvector of $A$. Its entries (normalized) are your criterion weights $w$.
- Score each alternative on each criterion, multiply by $w$, sum.
- Check consistency: $CR = (\lambda_{\max} - n)/((n-1) \cdot RI)$. CR should be ≤ 0.10.
$\text{score}(k) = \sum_i w_i \cdot r_{ki}$, where $r_{ki}$ is alternative $k$'s rating on criterion $i$.
Pitfalls. The pairwise weights are subjective — be explicit about who set them and why. Always run a sensitivity check by perturbing the weights and seeing if the ranking changes. Many top papers combine AHP with the Entropy Weight Method to get a hybrid weight that has both subjective and data-driven components.
Sketch: AHP for choosing a Super Bowl host city
Criteria: renewable energy mix, transit access, hotel capacity, climate severity, existing venues. Build a 5×5 pairwise matrix (use scale 1=equal, 3=moderately more important, 5=strongly, 7=very strongly, 9=extremely). Compute weights. Score each candidate city 1–10 on each criterion (use real data). Multiply, sum, rank.
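The eigenvector and consistency computations are only a few lines of NumPy. A minimal sketch with a hypothetical 3×3 judgment matrix (the judgments are made up; the random-index values are Saaty's standard table):

```python
import numpy as np

# Hypothetical pairwise judgments: criterion 1 moderately more important
# than 2 (3), strongly more important than 3 (5); reciprocals below diagonal.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)            # principal eigenvalue
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()                        # normalized criterion weights

n = A.shape[0]
lam_max = eigvals[k].real
RI = {3: 0.58, 4: 0.90, 5: 1.12}[n]    # Saaty's random index
CR = (lam_max - n) / ((n - 1) * RI)    # consistency ratio, want <= 0.10

print("weights:", w.round(3))
print("CR:", round(CR, 3))
```

Score alternatives by the weighted sum from the formula above once the weights pass the consistency check.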
2 · TOPSIS
When to use. Same problem class as AHP — multi-criteria ranking — but you already have numeric scores for each alternative on each criterion and want a principled way to combine them.
The mechanics.
- Build the decision matrix: rows = alternatives, columns = criteria, values = scores.
- Normalize each column (vector normalization is standard: $\tilde r_{ki} = r_{ki} / \sqrt{\sum_k r_{ki}^2}$).
- Multiply each column by its weight $w_i$.
- Identify the "ideal best" $A^+$ (max in each column for "more is better" criteria, min for "less is better") and "ideal worst" $A^-$.
- For each alternative compute Euclidean distance to $A^+$ and $A^-$.
- Closeness coefficient $C_k = d_k^- / (d_k^+ + d_k^-)$. Rank by $C_k$, higher is better.
Pitfalls. TOPSIS depends entirely on the weights. Use AHP or Entropy Weight to derive them. Don't pretend uniform weights are "neutral" — they're a choice. Numbers must be comparable post-normalization; mind units.
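A compact NumPy sketch of the full TOPSIS pipeline, on a made-up 4×3 decision matrix with assumed weights (the third criterion is treated as "less is better"):

```python
import numpy as np

# Hypothetical decision matrix: 4 alternatives (rows) x 3 criteria (columns).
R = np.array([
    [250.0, 16.0, 12.0],
    [200.0, 16.0,  8.0],
    [300.0, 32.0, 16.0],
    [275.0, 32.0,  8.0],
])
w = np.array([0.5, 0.3, 0.2])            # weights (e.g., from AHP or EWM)
benefit = np.array([True, True, False])  # third criterion: less is better

V = R / np.sqrt((R**2).sum(axis=0)) * w  # vector-normalize, then weight

A_pos = np.where(benefit, V.max(axis=0), V.min(axis=0))  # ideal best
A_neg = np.where(benefit, V.min(axis=0), V.max(axis=0))  # ideal worst

d_pos = np.sqrt(((V - A_pos)**2).sum(axis=1))
d_neg = np.sqrt(((V - A_neg)**2).sum(axis=1))
C = d_neg / (d_pos + d_neg)              # closeness, higher is better

ranking = np.argsort(-C)
print("closeness:", C.round(3), "best alternative:", ranking[0])
```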
3 · Entropy Weight Method (EWM)
When to use. You want data-driven weights for your criteria, with no subjective input. Often paired with TOPSIS or AHP.
The mechanics. Criteria with more variation across alternatives are more informative, so they get more weight.
- Normalize the decision matrix into proportions $p_{ki}$ (each column sums to 1).
- For each criterion $i$, compute its entropy: $e_i = -\frac{1}{\ln K} \sum_k p_{ki} \ln p_{ki}$ (where $K$ is the number of alternatives).
- The criterion's weight is proportional to $1 - e_i$, then normalized.
Pitfalls. EWM weights are sensitive to the choice of alternatives. Adding or removing an alternative shifts all weights — be honest about this. A common best practice is the Combined Weight Model: average AHP weights and EWM weights to get something with both expert judgment and data sensitivity.
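The entropy-weight computation, sketched on a hypothetical decision matrix:

```python
import numpy as np

# Hypothetical decision matrix: rows = alternatives, columns = criteria.
R = np.array([
    [250.0, 16.0, 12.0],
    [200.0, 16.0,  8.0],
    [300.0, 32.0, 16.0],
    [275.0, 32.0,  8.0],
])

K = R.shape[0]                              # number of alternatives
P = R / R.sum(axis=0)                       # column-wise proportions p_ki
# entropy per criterion; p * ln(p) is taken as 0 in the limit p -> 0
e = -(np.where(P > 0, P * np.log(P), 0.0)).sum(axis=0) / np.log(K)
w = (1 - e) / (1 - e).sum()                 # weight ~ (1 - entropy), normalized

print("entropies:", e.round(4))
print("weights:", w.round(4))
```

Criteria whose columns vary more across alternatives get lower entropy and hence more weight.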
4 · Regression (linear, polynomial, multivariate)
When to use. You have a numeric target and one or more numeric inputs, and you want to fit a relationship.
- Linear: $y = a + bx$. Good baseline. Use when the scatter plot looks like a straight line.
- Polynomial: $y = a_0 + a_1 x + a_2 x^2 + \dots$. Beware overfitting — degree 2 or 3 is usually enough.
- Multivariate: $y = a_0 + \sum_i a_i x_i$. For multiple predictors.
- Power / exponential / log: Linearize first (take log of both sides) and fit linearly.
What to report. The fit equation, $R^2$, residual plot, and a comment on whether residuals look random or have structure.
Worked sketch: 2022-B CO₂ trend
Given annual CO₂ from 1959–2021, fit:
- Linear: $C(t) = a + b(t-1959)$. Likely underestimates recent years.
- Quadratic: $C(t) = a + b(t-1959) + c(t-1959)^2$. Captures the acceleration.
- Exponential: $\ln(C - 280) = a + b(t-1959)$. Models "excess over pre-industrial".
Report all three with $R^2$ and predictions for 2050 and 2100. Pick the one with both good $R^2$ and physically reasonable extrapolation. The quadratic typically wins here.
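The fit-and-compare workflow can be sketched with numpy.polyfit. The series below is synthetic (a quadratic trend plus noise standing in for the real CO₂ data), so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 63)                      # years since 1959 (1959-2021)
# synthetic stand-in for the CO2 series: quadratic trend + noise
C = 315 + 0.55 * t + 0.012 * t**2 + rng.normal(0, 0.5, t.size)

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat)**2)
    ss_tot = np.sum((y - y.mean())**2)
    return 1 - ss_res / ss_tot

lin = np.polyfit(t, C, 1)                 # linear fit
quad = np.polyfit(t, C, 2)                # quadratic fit
r2_lin = r_squared(C, np.polyval(lin, t))
r2_quad = r_squared(C, np.polyval(quad, t))

print(f"linear R^2 = {r2_lin:.4f}, quadratic R^2 = {r2_quad:.4f}")
print("quadratic prediction for 2050:", round(np.polyval(quad, 2050 - 1959), 1))
```

Always pair the $R^2$ comparison with a residual plot; a linear fit to accelerating data leaves a telltale U-shaped residual pattern.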
5 · Logistic growth
When to use. Quantity grows fast at first, then saturates at a carrying capacity. Populations, market penetration, adoption curves.
The model: $dP/dt = r P (1 - P/K)$, with closed-form solution $P(t) = K / (1 + \frac{K - P_0}{P_0} e^{-rt})$. Parameters to estimate: $r$ (intrinsic growth rate), $K$ (carrying capacity), $P_0$ (initial population). Fit using least squares to historical data, or estimate from physical reasoning.
Pitfalls. Don't apply logistic to a quantity that's still in pure exponential phase — you need data near saturation to estimate $K$.
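A least-squares fit of the logistic curve with scipy.optimize.curve_fit, on synthetic data generated from assumed parameters ($K = 1000$, $r = 0.5$, $P_0 = 10$):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, P0):
    """Logistic solution P(t) = K / (1 + ((K - P0)/P0) * exp(-r t))."""
    return K / (1 + (K - P0) / P0 * np.exp(-r * t))

# Synthetic adoption data (assumed ground truth, plus noise)
rng = np.random.default_rng(1)
t = np.arange(0, 25)
P = logistic(t, 1000, 0.5, 10) + rng.normal(0, 10, t.size)

# Rough initial guesses; bounds keep all three parameters positive
params, _ = curve_fit(logistic, t, P, p0=[800, 0.3, 5], bounds=(0, np.inf))
K_hat, r_hat, P0_hat = params
print(f"K = {K_hat:.0f}, r = {r_hat:.2f}, P0 = {P0_hat:.1f}")
```

Note the data here run well into saturation; with only early-phase data the fit for $K$ would be unstable, which is exactly the pitfall above.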
6 · Compartmental ODE models
When to use. A system has distinct states and flow between them (susceptible/infected/recovered, larvae/pupae/adults, queue/server). The dynamics are continuous and aggregate.
Template (3-compartment SIR example): $dS/dt = -\beta S I / N$, $dI/dt = \beta S I / N - \gamma I$, $dR/dt = \gamma I$.
Solve numerically with SciPy's solve_ivp. Plot trajectories. Identify equilibria. Compute the basic reproduction number $R_0 = \beta / \gamma$.
Pitfalls. Choose units carefully. Verify mass conservation ($S + I + R = N$ should hold). Stiff systems need stiff solvers (LSODA, BDF).
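A minimal SIR solve with solve_ivp; the parameter values ($\beta = 0.3$, $\gamma = 0.1$ per day, so $R_0 = 3$) are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp

beta, gamma, N = 0.3, 0.1, 10_000      # assumed rates and population size

def sir(t, y):
    S, I, R = y
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    return [dS, dI, dR]

sol = solve_ivp(sir, t_span=(0, 160), y0=[N - 10, 10, 0],
                t_eval=np.linspace(0, 160, 161))
S, I, R = sol.y
print("peak infections:", int(I.max()), "on day", int(sol.t[I.argmax()]))
print("mass conservation holds:", np.allclose(S + I + R, N))
```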
Sketch: honeybee colony (2022-A)
Compartments: eggs $E$, larvae $L$, workers $W$, queens $Q$. Daily flows: eggs are laid at rate $\beta Q$; eggs mature into larvae at rate $\mu_E$; larvae become workers at rate $\mu_L$; workers die at rate $\mu_W$. Add seasonal multipliers on $\beta$ and $\mu_W$ to capture winter slowdown. Solve as a system of ODEs.
7 · Logistic regression (classification)
When to use. You're predicting a binary outcome (will this SDE be included? will the team be evacuated in time? will the species become invasive?). Inputs can be a mix of numeric and categorical.
Fit by maximum likelihood (use statsmodels.Logit or sklearn.linear_model.LogisticRegression). Threshold at 0.5 to convert to a binary prediction, or interpret the probability directly.
Pitfalls. Always check for class imbalance. Report AUC or precision/recall, not just accuracy. Coefficients are interpretable as log-odds changes, which makes for good paper writing.
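To make the method explainable rather than a black box, here is the maximum-likelihood fit written out as Newton-Raphson in plain NumPy, on synthetic data with an assumed "true" coefficient vector; in a real paper you would typically call statsmodels or sklearn and report their output:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])        # intercept + one predictor
true_beta = np.array([-0.5, 2.0])           # assumed ground truth
p_true = 1 / (1 + np.exp(-X @ true_beta))
y = rng.random(n) < p_true                  # simulated binary outcomes

beta = np.zeros(2)
for _ in range(25):                         # Newton-Raphson iterations
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)                    # gradient of log-likelihood
    W = p * (1 - p)                         # IRLS observation weights
    H = X.T @ (X * W[:, None])              # negative Hessian
    beta = beta + np.linalg.solve(H, grad)

print("fitted coefficients (log-odds scale):", beta.round(2))
```

The fitted slope is directly interpretable: each unit increase in $x$ multiplies the odds by $e^{\beta_1}$.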
8 · ARIMA / time-series forecasting
When to use. You have a time series with trend and/or seasonality, and need to forecast forward. The judges flag ARIMA as appropriate, but warn that you must explain what it is, not just call auto_arima.
ARIMA(p, d, q) means: $p$ autoregressive lags, $d$ differencing steps to make the series stationary, $q$ moving-average terms.
Workflow.
- Plot the series. Look for trend, seasonality, level shifts.
- Difference until stationary (test with ADF). $d$ is usually 0, 1, or 2.
- Inspect ACF/PACF plots to pick $p, q$.
- Fit, check residuals (should be white noise), report AIC.
- Forecast with prediction intervals.
Alternative for HiMCM: exponential smoothing (Holt-Winters) is simpler to explain and often just as good for short horizons.
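A sketch of Holt's linear-trend method in plain NumPy, with fixed smoothing constants chosen by assumption (statsmodels' ExponentialSmoothing can fit them for you):

```python
import numpy as np

def holt_forecast(y, alpha=0.5, beta=0.3, horizon=5):
    """Holt's linear-trend smoothing with fixed, assumed constants
    (in practice, fit alpha and beta by minimizing in-sample SSE)."""
    level, trend = y[0], y[1] - y[0]
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast extrapolates the last level and trend
    return level + trend * np.arange(1, horizon + 1)

# Synthetic series with a clear upward trend plus noise
rng = np.random.default_rng(3)
y = 100 + 2.0 * np.arange(40) + rng.normal(0, 1.5, 40)

fc = holt_forecast(y, horizon=5)
print("next 5 forecasts:", fc.round(1))
```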
9 · Markov chains
When to use. Discrete states, where the transition between them depends only on the current state (the Markov property). Great for modeling staged progression and long-run steady-state behavior.
The mechanics. Build transition matrix $P$ where $P_{ij}$ is the probability of going from state $i$ to state $j$ (each row sums to 1). Starting distribution $\pi_0$. After $n$ steps: $\pi_n = \pi_0 P^n$. Steady-state distribution: the left eigenvector of $P$ with eigenvalue 1 (i.e., solve $\pi P = \pi$), normalized to sum to 1.
Use case: HPC energy mix evolution (2024-B)
States: coal-dominant, gas-dominant, nuclear-dominant, renewable-dominant. Each year there's a small probability of transitioning between states based on policy / investment. Iterate the matrix to see how the mix evolves over a decade. Steady-state tells you the long-run equilibrium under given transition probabilities.
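A sketch of that iteration with a hypothetical transition matrix (the probabilities are illustrative, not estimates):

```python
import numpy as np

# Hypothetical annual transition probabilities between energy-mix states
# (coal, gas, nuclear, renewable); each row sums to 1.
P = np.array([
    [0.90, 0.07, 0.01, 0.02],   # coal-dominant
    [0.02, 0.90, 0.02, 0.06],   # gas-dominant
    [0.01, 0.02, 0.94, 0.03],   # nuclear-dominant
    [0.00, 0.01, 0.01, 0.98],   # renewable-dominant
])
pi = np.array([0.6, 0.3, 0.05, 0.05])   # assumed starting distribution

for year in range(10):                   # iterate a decade: pi <- pi P
    pi = pi @ P

# steady state: left eigenvector of P with eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
ss = np.abs(vecs[:, np.argmax(vals.real)].real)
ss = ss / ss.sum()

print("distribution after 10 years:", pi.round(3))
print("steady state:", ss.round(3))
```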
10 · Monte Carlo simulation
When to use. Your model has multiple uncertain inputs, and you want a distribution of outputs (not just a point estimate). Also good for sensitivity analysis and for problems with complex randomness (queues, evacuations, disease spread).
Recipe.
- Pick distributions for each uncertain input (e.g., walking speed in m/s ∼ Normal(1.4, 0.2)).
- Draw $N$ samples (e.g., $N = 10\,000$).
- Run the model on each sample.
- Report the distribution: mean, std, 5th/95th percentile, histogram.
Convergence check. Plot output mean vs. $N$. It should stabilize. If not, increase $N$.
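The recipe in NumPy, for a toy evacuation-time model whose input distributions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000

# Assumed input distributions (illustrative, not calibrated):
speed = rng.normal(1.4, 0.2, N)          # walking speed, m/s
speed = np.clip(speed, 0.5, None)        # guard against nonphysical draws
distance = rng.uniform(80, 120, N)       # distance to exit, m
delay = rng.exponential(20, N)           # reaction delay, s

time = distance / speed + delay          # toy model: seconds to evacuate

print(f"mean = {time.mean():.0f} s, std = {time.std():.0f} s")
print("5th/95th percentile:", np.percentile(time, [5, 95]).round(0))
```

Note the clip on speed: whenever a sampled distribution can produce nonphysical values, truncate it and say so in the assumptions table.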
11 · Optimization (LP, IP, heuristics)
When to use. "Find the best…", "Minimize total…", "Maximize coverage…".
- Linear programming (LP): linear objective, linear constraints, continuous variables. Use scipy.optimize.linprog.
- Integer / mixed-integer (IP/MIP): same but with integer constraints. Use pulp or cvxpy with a free solver (CBC).
- Nonlinear: use scipy.optimize.minimize with methods like SLSQP or COBYLA.
- Heuristic search: for hard combinatorial problems (TSP, scheduling), greedy + local search or simulated annealing.
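A worked LP example with scipy.optimize.linprog, on a toy problem with made-up coefficients (note that linprog minimizes, so a maximization objective is negated):

```python
from scipy.optimize import linprog

# Toy allocation LP: maximize 3x + 5y
# subject to x + 2y <= 14, 3x - y >= 0, x - y <= 2, x >= 0, y >= 0.
res = linprog(
    c=[-3, -5],            # negated objective (linprog minimizes)
    A_ub=[[1, 2],          # x + 2y <= 14
          [-3, 1],         # 3x - y >= 0  rewritten as  -3x + y <= 0
          [1, -1]],        # x - y <= 2
    b_ub=[14, 0, 2],
    bounds=[(0, None), (0, None)],
)
print("optimal (x, y):", res.x.round(2), "objective:", round(-res.fun, 2))
```

The optimum sits at a vertex of the feasible polygon, here the intersection of $x + 2y = 14$ and $x - y = 2$, giving $(x, y) = (6, 4)$ with objective 38.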
12 · Graphs & networks
When to use. Routing (evacuation, delivery), connectivity (power grid, transit), influence (social networks).
Useful algorithms (all in networkx):
- Shortest path: Dijkstra (positive weights), Bellman-Ford (negative weights).
- Connectivity / components: connected_components.
- Centrality: degree, betweenness, eigenvector — for finding key nodes.
- Minimum spanning tree: Kruskal, Prim — for building cheapest connected network.
- Max flow / min cut: for capacity problems.
- TSP approximations: nearest-neighbor heuristic, 2-opt.
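networkx's shortest-path routines do this in one call, but a from-scratch Dijkstra is short enough for an appendix. A sketch on a toy building graph with assumed traversal times:

```python
import heapq

def dijkstra(graph, start):
    """Shortest-path distances from start.
    graph maps node -> {neighbor: nonnegative edge weight}."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy building graph (assumed traversal times in seconds)
graph = {
    "lobby":  {"hall": 5, "stairs": 12},
    "hall":   {"room1": 3, "room2": 4, "stairs": 6},
    "stairs": {"room2": 2},
    "room1":  {},
    "room2":  {},
}
print(dijkstra(graph, "lobby"))
```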
Use case: evacuation sweep routing (2025-A)
Model the building as a graph: rooms and hallway junctions are nodes, doors are edges, edge weight = traversal time. Each responder starts at a known node. Goal: visit all "room" nodes in minimum total time across all responders. This is a vehicle-routing / multi-agent shortest-path variant — use a greedy assignment plus local search, or formulate as a MIP if the graph is small.
13 · Agent-based simulation
When to use. Individuals (people, bees, buses) interact according to local rules, and you want emergent behavior. Especially good when an ODE doesn't capture spatial or behavioral heterogeneity.
Sketch. Define agent state (position, status). Define a step function that updates all agents per time tick. Run the simulation. Track aggregate quantities over time. Visualize.
Mesa (Python) is a clean ABM framework. For HiMCM-scale problems, writing a custom loop in NumPy is often simpler and faster.
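A minimal custom-loop ABM in NumPy, under assumed rules (agents random-walk on a grid and "infect" anyone sharing their cell), to show how little scaffolding is needed:

```python
import numpy as np

rng = np.random.default_rng(5)
n_agents, size, steps = 200, 25, 100
pos = rng.integers(0, size, (n_agents, 2))     # agent positions on the grid
infected = np.zeros(n_agents, dtype=bool)
infected[0] = True                              # one initial case

history = []
for _ in range(steps):
    # each agent takes a random step of -1, 0, or +1 per axis (wrapping grid)
    pos = (pos + rng.integers(-1, 2, (n_agents, 2))) % size
    # contact rule: sharing a cell with an infected agent spreads infection
    cells = pos[:, 0] * size + pos[:, 1]        # flatten (x, y) to a cell id
    hot = np.isin(cells, cells[infected])
    infected = infected | hot
    history.append(int(infected.sum()))

print("infected count, first -> last step:", history[0], "->", history[-1])
```

Tracking an aggregate like `history` per tick is what lets you compare the emergent curve against an ODE model of the same process.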
How top teams typically combine these
From the 2024 judges' commentary, here's the recipe Outstanding teams used on Problem A (Olympic SDE selection):
- Identify factors (criteria) — qualitative reasoning + literature.
- Get data for each factor across alternatives (some from datasets, some from scraping Google Trends / social media).
- Use AHP and Entropy Weight Method to compute two sets of weights; combine them.
- Score alternatives with TOPSIS using the combined weights.
- Apply Monte Carlo on the weights to get a distribution of rankings (MC-TOPSIS) — this is the sensitivity analysis.
- Pick best alternatives, write recommendations.
For Problem B (HPC environmental impact):
- Estimate current energy use using engineering reasoning + published reports.
- Build a base carbon model: emissions = energy × emission factor, weighted by energy mix.
- Extrapolate with logistic growth (recognizing limits) instead of pure exponential.
- Use Markov chain over energy-mix states to model long-term transition.
- Add a second model for a chosen secondary impact (water / e-waste).
- Sensitivity analysis on the assumed growth rate and mix transitions.