Python toolkit

Snippets that cover ~90% of what you'll need to code during HiMCM. Copy, adapt, cite the libraries in your References section. Code shown here uses Python 3.10+.

Recommended setup

Use a virtual environment so the project is reproducible. Pin to requirements.txt so teammates have the same versions.

# one-time setup
python3 -m venv .venv
source .venv/bin/activate     # macOS / Linux
# .venv\Scripts\activate      # Windows

pip install --upgrade pip
pip install numpy scipy pandas matplotlib seaborn \
            statsmodels scikit-learn sympy \
            networkx pulp openpyxl jupyterlab
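
Once everything installs, one way to pin the versions (so the requirements.txt mentioned above stays in sync) is to freeze them:

pip freeze > requirements.txt        # record exact versions
# teammates then run: pip install -r requirements.txt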

Then launch Jupyter for interactive work:

jupyter lab

Why Jupyter. You can iterate quickly, keep plots inline, and copy clean figures into the paper. Once a notebook works, refactor the key parts into .py modules for the appendix.
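
One quick way to reuse those modules from a notebook, assuming the notebooks/ and src/ folders from the project layout at the end of this page and a kernel started inside notebooks/:

import sys; sys.path.append("..")     # let notebooks/ see the project root
from src.topsis import topsis         # reuse the cleaned-up module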

NumPy — array math

import numpy as np

# Vectors and matrices
x = np.linspace(0, 10, 200)          # 200 points from 0 to 10
y = np.sin(x) + 0.1 * np.random.randn(200)

A = np.array([[2, 1], [1, 3]])
b = np.array([5, 8])
sol = np.linalg.solve(A, b)          # solve Ax = b

# Stats
np.mean(y), np.std(y), np.percentile(y, [5, 50, 95])

# Random sampling
samples = np.random.normal(loc=0, scale=1, size=10_000)

pandas — data loading and tabular analysis

import pandas as pd

# Load Excel/CSV
df = pd.read_excel("2022_HiMCM_Data.xlsx", sheet_name="CO2")
df.head()                            # preview
df.describe()                        # summary stats

# Filter, group, aggregate
df_recent = df[df["Year"] >= 2000]
by_decade = df.groupby(df["Year"] // 10 * 10)["PPM"].mean()

# Rolling stats (for noisy time series)
df["PPM_smooth"] = df["PPM"].rolling(window=5, center=True).mean()

# Save your output
df.to_csv("processed.csv", index=False)

matplotlib — paper-quality plots

The point of plots in your paper is to convey a finding, not to look fancy. Stick to clean defaults.

import matplotlib.pyplot as plt

plt.figure(figsize=(7, 4))
plt.plot(df["Year"], df["PPM"], marker="o", linewidth=1, label="Observed CO₂")
plt.plot(df["Year"], df["PPM_smooth"], linewidth=2, label="5-yr moving avg")
plt.xlabel("Year")
plt.ylabel("Atmospheric CO₂ (ppm)")
plt.title("Atmospheric CO₂ at Mauna Loa, 1959–2021")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("fig_co2_trend.png", dpi=200)
plt.show()

Tip. Set dpi=200 when saving so figures stay sharp when embedded in the PDF. Use .png for plots, .pdf if your paper toolchain (LaTeX) supports it.
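
If you do want a vector version for LaTeX, a second save call (placed next to the .png save, before plt.show()) is enough:

plt.savefig("fig_co2_trend.pdf")      # vector output, scales cleanly in LaTeX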

Regression with SciPy / statsmodels

Simple linear regression with scipy:

from scipy import stats

slope, intercept, r, p, se = stats.linregress(df["Year"], df["PPM"])
print(f"slope = {slope:.3f} ppm/yr, R² = {r**2:.3f}, p = {p:.2e}")

Polynomial fit with numpy:

coeffs = np.polyfit(df["Year"], df["PPM"], deg=2)
poly   = np.poly1d(coeffs)
pred_2050 = poly(2050)
print(poly)                           # prints the equation

Multivariate or log/exponential models with statsmodels (which gives p-values and confidence intervals):

import statsmodels.api as sm

X = sm.add_constant(df[["Year", "CO2_lag1"]])
model = sm.OLS(df["Temp"], X).fit()
print(model.summary())                # full table for the paper
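
For the log/exp case, one option is to regress the log of the response and transform back; a sketch reusing the CO₂ column (only valid when every value is positive):

log_fit = sm.OLS(np.log(df["PPM"]), sm.add_constant(df["Year"])).fit()
a, b = log_fit.params                 # implied model: PPM ≈ exp(a) * exp(b * Year)
print(log_fit.conf_int())             # 95% CIs for the log-scale coefficients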

Nonlinear curve fitting

from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    return K / (1 + np.exp(-r * (t - t0)))

t = df["Year"].values
y = df["Pop"].values
popt, pcov = curve_fit(logistic, t, y, p0=[1e7, 0.1, 2000])
K, r, t0 = popt
perr = np.sqrt(np.diag(pcov))         # standard errors of K, r, t0
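
The fitted parameters can then be plugged back into the model function for plotting or a projection (the 2050 horizon is illustrative):

t_grid = np.linspace(t.min(), 2050, 300)
y_fit  = logistic(t_grid, *popt)      # fitted curve, ready to plot against the data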

ODE solving — solve_ivp

from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    S, I, R = y
    dS = -beta * S * I
    dI =  beta * S * I - gamma * I
    dR =  gamma * I
    return [dS, dI, dR]

y0 = [0.99, 0.01, 0.0]                # initial S, I, R
sol = solve_ivp(sir, t_span=(0, 200), y0=y0,
                args=(0.3, 0.1), dense_output=True,
                method="LSODA", max_step=1.0)

t = np.linspace(0, 200, 400)
S, I, R = sol.sol(t)

plt.plot(t, S, label="Susceptible")
plt.plot(t, I, label="Infected")
plt.plot(t, R, label="Recovered")
plt.legend(); plt.xlabel("Day"); plt.ylabel("Fraction")
plt.show()
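
Because each solver call is cheap, a quick sensitivity sweep is just a loop over perturbed parameters; a sketch varying β while holding γ = 0.1:

for beta in (0.25, 0.30, 0.35):
    s = solve_ivp(sir, (0, 200), y0, args=(beta, 0.1), dense_output=True)
    plt.plot(t, s.sol(t)[1], label=f"Infected, beta={beta}")
plt.legend(); plt.xlabel("Day"); plt.ylabel("Fraction infected")
plt.show()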

Linear programming & MIP

# Linear programming with scipy
from scipy.optimize import linprog

# minimize c · x  subject to A_ub x <= b_ub, x >= 0
c = [-3, -2]                          # maximize 3x + 2y  → negate
A = [[1, 1], [2, 1]]
b = [4, 5]
res = linprog(c, A_ub=A, b_ub=b, method="highs")
print(res.x, -res.fun)

# Integer programming with PuLP
import pulp

value = [[4, 1, 3],                   # example payoff matrix (illustrative numbers)
         [2, 0, 5],
         [3, 2, 2]]

prob = pulp.LpProblem("assignment", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", [(i, j) for i in range(3) for j in range(3)],
                          cat="Binary")
# objective: total value of the chosen assignment
prob += pulp.lpSum(value[i][j] * x[(i, j)] for i in range(3) for j in range(3))
# each row and each column assigned exactly once
for i in range(3):
    prob += pulp.lpSum(x[(i, j)] for j in range(3)) == 1
for j in range(3):
    prob += pulp.lpSum(x[(i, j)] for i in range(3)) == 1
prob.solve(pulp.PULP_CBC_CMD(msg=0))
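
After solving, read the chosen pairs and the objective value back out (a minimal sketch):

chosen = [(i, j) for i in range(3) for j in range(3) if x[(i, j)].value() > 0.5]
print("Assignment:", chosen, "| total value:", pulp.value(prob.objective))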

A clean TOPSIS implementation

Reusable — drop into any decision-ranking problem.

def topsis(decision_matrix, weights, benefit_mask):
    """
    decision_matrix : (n_alts, n_criteria) array of raw scores
    weights         : (n_criteria,) array, sums to 1
    benefit_mask    : (n_criteria,) bool: True if "higher is better"

    Returns (closeness, ranking).
    """
    X = np.asarray(decision_matrix, dtype=float)
    w = np.asarray(weights, dtype=float)

    # 1. Vector-normalize columns
    norm = np.sqrt((X**2).sum(axis=0))
    R = X / norm

    # 2. Weight
    V = R * w

    # 3. Ideal best / worst
    best  = np.where(benefit_mask, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit_mask, V.min(axis=0), V.max(axis=0))

    # 4. Distance
    d_best  = np.sqrt(((V - best )**2).sum(axis=1))
    d_worst = np.sqrt(((V - worst)**2).sum(axis=1))

    # 5. Closeness
    closeness = d_worst / (d_best + d_worst)
    ranking   = np.argsort(-closeness)        # descending
    return closeness, ranking

# Example
scores = np.array([[8, 7, 9], [6, 9, 7], [9, 6, 8]])
w      = np.array([0.4, 0.35, 0.25])
mask   = np.array([True, True, True])
c, r   = topsis(scores, w, mask)
print("Closeness:", c)
print("Ranking:",  r)

Entropy Weight Method

def entropy_weights(decision_matrix):
    X = np.asarray(decision_matrix, dtype=float)
    # min-max normalize so every value lies in [0, 1]; exact zeros are handled below
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    # proportions
    P = X / (X.sum(axis=0) + 1e-12)
    K = X.shape[0]
    # avoid log(0)
    P_safe = np.where(P > 0, P, 1)
    e = -1.0 / np.log(K) * (P_safe * np.log(P_safe)).sum(axis=0)
    d = 1 - e
    return d / d.sum()

w_entropy = entropy_weights(scores)
print("EWM weights:", w_entropy)

Monte Carlo simulation

rng = np.random.default_rng(seed=42)
N = 10_000

# Sample uncertain parameters
walk_speed = rng.normal(1.4, 0.15, N).clip(0.8, 2.0)
check_time = rng.gamma(shape=8.0, scale=1.0, size=N)   # mean ≈ 8s
delay      = rng.exponential(scale=0.5, size=N)

# Run the model on each sample (toy model for illustration; replace with your own)
def compute_clearance_time(speed, check, delay, distance=50.0):
    return distance / speed + check + delay   # hypothetical walk + check + delay

total_time = compute_clearance_time(walk_speed, check_time, delay)

# Report
print(f"Mean:   {total_time.mean():.1f} s")
print(f"Std:    {total_time.std():.1f} s")
print(f"5%/95%: {np.percentile(total_time, [5, 95])}")

networkx — graphs and routes

import networkx as nx

G = nx.Graph()
G.add_edge("entrance", "hallway", weight=4)
G.add_edge("hallway", "room1", weight=3)
G.add_edge("hallway", "room2", weight=3)
G.add_edge("hallway", "exit", weight=5)

# Shortest path
path = nx.shortest_path(G, "entrance", "exit", weight="weight")
cost = nx.shortest_path_length(G, "entrance", "exit", weight="weight")

# All pairs
dist = dict(nx.all_pairs_dijkstra_path_length(G))

# Visualize
import matplotlib.pyplot as plt
pos = nx.spring_layout(G, seed=1)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=900)
labels = nx.get_edge_attributes(G, "weight")
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()

ARIMA with statsmodels

from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA(1,1,1)
model = ARIMA(df["PPM"], order=(1, 1, 1))
res = model.fit()
print(res.summary())

# Forecast 10 years ahead with 95% CI
fc = res.get_forecast(steps=10)
mean = fc.predicted_mean
ci   = fc.conf_int(alpha=0.05)
print(mean.head())
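
For the paper, plot the forecast mean with its confidence band; a sketch (conf_int returns the lower and upper columns in that order):

plt.plot(df["PPM"], label="Observed")
plt.plot(mean, label="ARIMA forecast")
plt.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], alpha=0.2, label="95% CI")
plt.legend(); plt.ylabel("CO₂ (ppm)")
plt.show()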

scikit-learn — classification & regression

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

# X: feature matrix, y: binary labels, feature_names: your feature column names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_prob))
print("Coefficients:", dict(zip(feature_names, clf.coef_[0])))

Suggested project layout

himcm_paper/
├── data/                # raw datasets (.xlsx, .csv)
├── notebooks/           # exploratory Jupyter notebooks
│   ├── 01-data-cleaning.ipynb
│   ├── 02-base-model.ipynb
│   └── 03-sensitivity.ipynb
├── src/                 # cleaned reusable code
│   ├── topsis.py
│   ├── sir_model.py
│   └── plots.py
├── figures/             # exported .png plots for the paper
├── paper/               # LaTeX or Word source
│   └── main.tex
├── requirements.txt
└── README.md

Reproducibility check. Before submission, clone your repo to a clean folder, install from requirements.txt, and re-run every notebook from top to bottom. If anything breaks, fix it now — judges occasionally ask for code, and you'll need it next year for refinements.

Common mistakes that cost time

  • Different teammates using different package versions. Pin them.
  • Hard-coding paths. Use pathlib.Path(__file__).parent or a config file (see the sketch after this list).
  • Forgetting random seeds. Create a seeded generator with np.random.default_rng(seed) so results are reproducible.
  • Not saving intermediate results. Use df.to_parquet() or pickle so you don't recompute everything after every notebook restart.
  • Over-styling plots. matplotlib defaults are fine; spend the time on the model, not the rcParams.
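
A minimal sketch of the path pattern from the hard-coding-paths bullet, assuming a module inside src/ that reads from data/ as in the layout above:

from pathlib import Path
import pandas as pd

ROOT = Path(__file__).resolve().parent.parent   # src/ -> project root
df = pd.read_csv(ROOT / "data" / "processed.csv")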