Python toolkit
Snippets that cover ~90% of what you'll need to code during HiMCM. Copy, adapt, cite the libraries in your References section. Code shown here uses Python 3.10+.
Recommended setup
Use a virtual environment so the project is reproducible, and pin package versions in requirements.txt so every teammate runs the same code.
# one-time setup
python3 -m venv .venv
source .venv/bin/activate    # macOS / Linux
# .venv\Scripts\activate     # Windows
pip install --upgrade pip
pip install numpy scipy pandas matplotlib seaborn \
    statsmodels scikit-learn sympy \
    networkx pulp openpyxl jupyterlab
pip freeze > requirements.txt   # pin the exact versions for teammates
Then launch Jupyter for interactive work:
jupyter lab
Why Jupyter. You can iterate quickly, keep plots inline, and copy clean figures into the paper. Once a notebook works, refactor the key parts into .py modules for the appendix.
NumPy — array math
import numpy as np
# Vectors and matrices
x = np.linspace(0, 10, 200) # 200 points from 0 to 10
y = np.sin(x) + 0.1 * np.random.randn(200)
A = np.array([[2, 1], [1, 3]])
b = np.array([5, 8])
sol = np.linalg.solve(A, b) # solve Ax = b
# Stats
np.mean(y), np.std(y), np.percentile(y, [5, 50, 95])
# Random sampling
samples = np.random.normal(loc=0, scale=1, size=10_000)
pandas — data loading and tabular analysis
import pandas as pd
# Load Excel/CSV
df = pd.read_excel("2022_HiMCM_Data.xlsx", sheet_name="CO2")
df.head() # preview
df.describe() # summary stats
# Filter, group, aggregate
df_recent = df[df["Year"] >= 2000]
by_decade = df.groupby(df["Year"] // 10 * 10)["PPM"].mean()
# Rolling stats (for noisy time series)
df["PPM_smooth"] = df["PPM"].rolling(window=5, center=True).mean()
# Save your output
df.to_csv("processed.csv", index=False)
matplotlib — paper-quality plots
The point of plots in your paper is to convey a finding, not to look fancy. Stick to clean defaults.
import matplotlib.pyplot as plt
plt.figure(figsize=(7, 4))
plt.plot(df["Year"], df["PPM"], marker="o", linewidth=1, label="Observed CO₂")
plt.plot(df["Year"], df["PPM_smooth"], linewidth=2, label="5-yr moving avg")
plt.xlabel("Year")
plt.ylabel("Atmospheric CO₂ (ppm)")
plt.title("Atmospheric CO₂ at Mauna Loa, 1959–2021")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("fig_co2_trend.png", dpi=200)
plt.show()
Tip. Set dpi=200 when saving so figures stay sharp when embedded in the PDF. Use .png for plots, or .pdf if your paper toolchain (LaTeX) supports vector figures.
Regression with SciPy / statsmodels
Simple linear regression with scipy:
from scipy import stats
slope, intercept, r, p, se = stats.linregress(df["Year"], df["PPM"])
print(f"slope = {slope:.3f} ppm/yr, R² = {r**2:.3f}, p = {p:.2e}")
Polynomial fit with numpy:
coeffs = np.polyfit(df["Year"], df["PPM"], deg=2)
poly = np.poly1d(coeffs)
pred_2050 = poly(2050)
print(poly) # prints the equation
Multivariate or transformed (log/exp) models with statsmodels, which reports p-values and confidence intervals:
import statsmodels.api as sm
# "Temp" and "CO2_lag1" are illustrative column names — adapt to your data
X = sm.add_constant(df[["Year", "CO2_lag1"]])
model = sm.OLS(df["Temp"], X).fit()
print(model.summary())  # full regression table for the paper
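The log/exp models mentioned above can often be fit by linearizing: regress log y on t with ordinary least squares. A minimal numpy-only sketch on synthetic data (the numbers here are purely illustrative):

```python
import numpy as np

# synthetic exponential data: y = a * exp(b * (t - 1960))
t = np.arange(1960, 2021, dtype=float)
y = 300.0 * np.exp(0.004 * (t - 1960))

# linearize: log y = log a + b * (t - 1960), then least squares
b, log_a = np.polyfit(t - 1960, np.log(y), deg=1)
a = np.exp(log_a)
print(f"a = {a:.1f}, b = {b:.4f}")  # recovers a ≈ 300, b ≈ 0.004
```

Shifting the year axis (t - 1960) keeps the intercept numerically well-behaved; fitting against raw years near 2000 can make exp(log_a) astronomically small.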
Nonlinear curve fitting
from scipy.optimize import curve_fit
def logistic(t, K, r, t0):
    return K / (1 + np.exp(-r * (t - t0)))
t = df["Year"].values
y = df["Pop"].values
popt, pcov = curve_fit(logistic, t, y, p0=[1e7, 0.1, 2000])
K, r, t0 = popt
perr = np.sqrt(np.diag(pcov)) # standard errors of K, r, t0
ODE solving — solve_ivp
from scipy.integrate import solve_ivp
def sir(t, y, beta, gamma):
    S, I, R = y
    dS = -beta * S * I
    dI = beta * S * I - gamma * I
    dR = gamma * I
    return [dS, dI, dR]
y0 = [0.99, 0.01, 0.0]   # initial S, I, R fractions
sol = solve_ivp(sir, t_span=(0, 200), y0=y0,
                args=(0.3, 0.1), dense_output=True,
                method="LSODA", max_step=1.0)
t = np.linspace(0, 200, 400)
S, I, R = sol.sol(t)
plt.plot(t, S, label="Susceptible")
plt.plot(t, I, label="Infected")
plt.plot(t, R, label="Recovered")
plt.legend(); plt.xlabel("Day"); plt.ylabel("Fraction")
plt.show()
Linear programming & MIP
# Linear programming with scipy
from scipy.optimize import linprog
# minimize c · x subject to A_ub x <= b_ub, x >= 0
c = [-3, -2] # maximize 3x + 2y → negate
A = [[1, 1], [2, 1]]
b = [4, 5]
res = linprog(c, A_ub=A, b_ub=b, method="highs")
print(res.x, -res.fun)
# Integer programming with PuLP
import pulp
value = [[4, 2, 8], [4, 3, 7], [3, 1, 6]]   # example payoff matrix
prob = pulp.LpProblem("assignment", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", [(i, j) for i in range(3) for j in range(3)],
                          cat="Binary")
# objective
prob += pulp.lpSum(value[i][j] * x[(i, j)] for i in range(3) for j in range(3))
# each row assigned exactly once
for i in range(3):
    prob += pulp.lpSum(x[(i, j)] for j in range(3)) == 1
# each column used at most once (needed for a true assignment)
for j in range(3):
    prob += pulp.lpSum(x[(i, j)] for i in range(3)) <= 1
prob.solve(pulp.PULP_CBC_CMD(msg=0))
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
A clean TOPSIS implementation
Reusable — drop into any decision-ranking problem.
def topsis(decision_matrix, weights, benefit_mask):
    """
    decision_matrix : (n_alts, n_criteria) array of raw scores
    weights         : (n_criteria,) array, sums to 1
    benefit_mask    : (n_criteria,) bool: True if "higher is better"
    Returns (closeness, ranking).
    """
    X = np.asarray(decision_matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    # 1. Vector-normalize columns
    norm = np.sqrt((X**2).sum(axis=0))
    R = X / norm
    # 2. Weight
    V = R * w
    # 3. Ideal best / worst
    best  = np.where(benefit_mask, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit_mask, V.min(axis=0), V.max(axis=0))
    # 4. Distance to each
    d_best  = np.sqrt(((V - best)**2).sum(axis=1))
    d_worst = np.sqrt(((V - worst)**2).sum(axis=1))
    # 5. Relative closeness
    closeness = d_worst / (d_best + d_worst)
    ranking = np.argsort(-closeness)   # indices, best first
    return closeness, ranking
# Example
scores = np.array([[8, 7, 9], [6, 9, 7], [9, 6, 8]])
w = np.array([0.4, 0.35, 0.25])
mask = np.array([True, True, True])
c, r = topsis(scores, w, mask)
print("Closeness:", c)
print("Ranking:", r)
Entropy Weight Method
def entropy_weights(decision_matrix):
    X = np.asarray(decision_matrix, dtype=float)
    # min-max normalize so all values are positive in [0, 1]
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    # proportions per criterion
    P = X / (X.sum(axis=0) + 1e-12)
    K = X.shape[0]
    # avoid log(0)
    P_safe = np.where(P > 0, P, 1)
    e = -1.0 / np.log(K) * (P_safe * np.log(P_safe)).sum(axis=0)
    d = 1 - e   # degree of divergence
    return d / d.sum()
w_entropy = entropy_weights(scores)
print("EWM weights:", w_entropy)
Monte Carlo simulation
rng = np.random.default_rng(seed=42)
N = 10_000
# Sample uncertain parameters
walk_speed = rng.normal(1.4, 0.15, N).clip(0.8, 2.0)
check_time = rng.gamma(shape=8.0, scale=1.0, size=N) # mean ≈ 8s
delay = rng.exponential(scale=0.5, size=N)
# Run the model on each sample (compute_clearance_time is your own model function)
total_time = compute_clearance_time(walk_speed, check_time, delay)
# Report
print(f"Mean: {total_time.mean():.1f} s")
print(f"Std: {total_time.std():.1f} s")
print(f"5%/95%: {np.percentile(total_time, [5, 95])}")
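In the snippet above, compute_clearance_time stands in for your own model. A self-contained toy version, with a purely illustrative clearance-time formula, shows the full pattern:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
N = 10_000

# Sample uncertain parameters
walk_speed = rng.normal(1.4, 0.15, N).clip(0.8, 2.0)   # m/s
check_time = rng.gamma(shape=8.0, scale=1.0, size=N)   # seconds, mean ≈ 8
delay      = rng.exponential(scale=0.5, size=N)        # seconds

def compute_clearance_time(speed, check, delay, distance=50.0):
    # toy model: walk a fixed distance, then pass a check, plus a random delay
    return distance / speed + check + delay

total_time = compute_clearance_time(walk_speed, check_time, delay)
print(f"Mean: {total_time.mean():.1f} s")
print(f"5%/95%: {np.percentile(total_time, [5, 95]).round(1)}")
```

Because everything is vectorized, the 10,000 scenarios run in one pass with no Python loop; swapping in your real model function keeps the same structure.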
networkx — graphs and routes
import networkx as nx
G = nx.Graph()
G.add_edge("entrance", "hallway", weight=4)
G.add_edge("hallway", "room1", weight=3)
G.add_edge("hallway", "room2", weight=3)
G.add_edge("hallway", "exit", weight=5)
# Shortest path
path = nx.shortest_path(G, "entrance", "exit", weight="weight")
cost = nx.shortest_path_length(G, "entrance", "exit", weight="weight")
# All pairs
dist = dict(nx.all_pairs_dijkstra_path_length(G))
# Visualize
import matplotlib.pyplot as plt
pos = nx.spring_layout(G, seed=1)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=900)
labels = nx.get_edge_attributes(G, "weight")
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()
ARIMA with statsmodels
from statsmodels.tsa.arima.model import ARIMA
# Fit ARIMA(1,1,1)
model = ARIMA(df["PPM"], order=(1, 1, 1))
res = model.fit()
print(res.summary())
# Forecast 10 years ahead with 95% CI
fc = res.get_forecast(steps=10)
mean = fc.predicted_mean
ci = fc.conf_int(alpha=0.05)
print(mean.head())
scikit-learn — classification & regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score
# X: (n_samples, n_features) feature matrix, y: 0/1 labels — from your own data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]
print(classification_report(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_prob))
# feature_names: your column names, e.g. list(df.columns)
print("Coefficients:", dict(zip(feature_names, clf.coef_[0])))
Suggested project layout
himcm_paper/
├── data/ # raw datasets (.xlsx, .csv)
├── notebooks/ # exploratory Jupyter notebooks
│ ├── 01-data-cleaning.ipynb
│ ├── 02-base-model.ipynb
│ └── 03-sensitivity.ipynb
├── src/ # cleaned reusable code
│ ├── topsis.py
│ ├── sir_model.py
│ └── plots.py
├── figures/ # exported .png plots for the paper
├── paper/ # LaTeX or Word source
│ └── main.tex
├── requirements.txt
└── README.md
Reproducibility check. Before submission, clone your repo to a clean folder, install from requirements.txt, and re-run every notebook from top to bottom. If anything breaks, fix it now — judges occasionally ask for code, and you'll need it next year for refinements.
Common mistakes that cost time
- Different teammates using different package versions. Pin them.
- Hard-coding paths. Use pathlib.Path(__file__).parent or a config file.
- Forgetting random seeds. Use np.random.default_rng(seed) for reproducibility.
- Not saving intermediate results. Use df.to_parquet() or pickle so you don't recompute on every notebook restart.
- Over-styling plots. matplotlib defaults are fine; spend the time on the model, not the rcParams.