2015 · Problem B — City Crime and Safety

Composite index · Geospatial analysis · Clustering · Decision communication

The prompt, restated

You are given two weeks of police-report data for a fictional major city of about 2.8 million residents. Each record includes a crime type (assault, robbery, burglary, theft, vandalism, etc.), a location (district or coordinate), a timestamp, and whether an arrest was made. The prompt asks the team to develop a safety rating for the city — a single number, or a small set of numbers, that captures how safe the city is and that can be compared to ratings for other cities. Teams are then asked to identify which neighborhoods or districts drive the rating, recommend where police resources should be redirected, and discuss what additional data would sharpen the model.

Two deliverables are explicitly required: a formal technical analysis documenting the modeling decisions, and a non-technical report to the mayor that translates the rating into a recommendation a city council can act on without seeing the math. The catch: two weeks is a short window, the data is noisy, and "safety" is a multi-dimensional concept that no single statistic captures cleanly. The judges reward teams that acknowledge this honestly rather than papering over it with a slick-looking number.

Key modeling idea

Build a composite safety index with three pillars: incidence (crimes per 1,000 residents), severity (a weighted sum where violent crime counts more than property crime), and clearance (fraction of incidents leading to arrest, as a proxy for police effectiveness). Aggregate the three with explicit weights, chosen either by expert judgment (Analytic Hierarchy Process, AHP) or from the data (Entropy Weight Method, EWM). Report at two granularities, citywide and per-district, and benchmark against published city safety rankings so the headline number is on a defensible scale.

Suggested approach

Step 1 — Clean and bucket. Normalize crime-type strings, drop or impute missing locations, and bucket crimes into FBI Uniform Crime Reporting Part I offenses (violent: homicide, rape, robbery, aggravated assault; property: burglary, larceny, motor vehicle theft (MVT), arson) vs. Part II offenses. The Part I/II split is a standard the audience is likely to know already.
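Step 2 — Build the three pillars. Per district $d$: incidence $I_d = \text{crimes}_d / \text{pop}_d$; severity $S_d = \left(\sum_c w_c \cdot \text{count}_{c,d}\right) / \text{pop}_d$ with $w_c$ from sentencing-severity tables; clearance $C_d = \text{arrests}_d / \text{crimes}_d$. Severity is divided by population for the same reason incidence is: otherwise populous districts score worse simply for being large. Min-max normalize each pillar across districts, then combine the normalized values: $\text{Safety}_d = 100 \cdot (1 - \alpha I_d - \beta S_d + \gamma C_d)$. With $\alpha + \beta + \gamma = 1$ this score runs from $100\gamma$ to $100(1 + \gamma)$ rather than from 0 to 100, so rescale it linearly if the headline number must sit on a 0–100 scale.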
Step 3 — Choose weights. Run both AHP (weights derived from analyst-chosen pairwise comparisons of the pillars) and the Entropy Weight Method (weights derived from how unevenly each pillar's values are spread across districts, measured by information entropy). Compare the two. If they give similar district rankings (Spearman $\rho > 0.85$ [illustrative]), the result is robust; if not, explain why and pick one with justification.
Step 4 — Find hotspots. Run DBSCAN on the incident coordinates (lat, lon), using each incident's severity weight as its sample weight, to identify clusters too small to register at district level. These clusters are the basis for the patrol-redirection recommendation.
Step 5 — Write the mayor's letter. One page, no jargon. Lead with the citywide rating, the three pillars, the top three hotspot neighborhoods, and one concrete resource recommendation. Acknowledge the two-week window as a caveat.
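
The weight comparison in Step 3 can be sketched as follows. This is a minimal illustration, not a definitive implementation: the five districts, their pillar values, and the pairwise-comparison matrix are all made-up assumptions, and the pillars are assumed to be already min-max normalized and oriented so that larger means safer. Spearman correlation is computed as the Pearson correlation of ranks.

```python
import numpy as np
import pandas as pd

# Hypothetical normalized pillars for five districts (made-up values).
# Columns are oriented so larger = safer: 1 - incidence, 1 - severity, clearance.
X = pd.DataFrame(
    {"low_incidence": [0.90, 0.60, 0.20, 0.75, 0.40],
     "low_severity":  [0.80, 0.70, 0.10, 0.65, 0.50],
     "clearance":     [0.30, 0.35, 0.25, 0.40, 0.30]},
    index=["D1", "D2", "D3", "D4", "D5"])

def entropy_weights(X):
    """Entropy Weight Method: pillars whose values are spread more unevenly
    across districts (lower entropy) receive larger weights."""
    P = X / X.sum(axis=0)                      # each column as shares summing to 1
    logP = np.log(P.where(P > 0, 1.0))         # treat 0 * log(0) as 0
    e = -(P * logP).sum(axis=0) / np.log(len(X))   # entropy scaled to [0, 1]
    d = 1.0 - e                                # degree of divergence
    return (d / d.sum()).to_numpy()

def ahp_weights(A):
    """AHP: weights are the normalized principal eigenvector of the
    pairwise-comparison matrix A. Also returns the consistency ratio."""
    vals, vecs = np.linalg.eig(A)
    k = int(np.argmax(vals.real))
    w = np.abs(vecs[:, k].real)
    w = w / w.sum()
    n = A.shape[0]
    ci = (vals[k].real - n) / (n - 1)          # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]        # Saaty's random index
    return w, ci / ri

# Assumed analyst judgment: incidence and severity equally important,
# each twice as important as clearance.
A = np.array([[1.0, 1.0, 2.0],
              [1.0, 1.0, 2.0],
              [0.5, 0.5, 1.0]])

w_ewm = entropy_weights(X)
w_ahp, cr = ahp_weights(A)

score_ewm = pd.Series(X.to_numpy() @ w_ewm, index=X.index)
score_ahp = pd.Series(X.to_numpy() @ w_ahp, index=X.index)

# Spearman rho = Pearson correlation of the two rank vectors.
rho = score_ewm.rank().corr(score_ahp.rank())

print("EWM weights:", np.round(w_ewm, 3))
print("AHP weights:", np.round(w_ahp, 3), "consistency ratio:", round(float(cr), 4))
print("Spearman rho between rankings:", round(float(rho), 3))
```

A consistency ratio below about 0.1 is the usual rule of thumb for accepting an AHP comparison matrix; the matrix above is perfectly consistent by construction.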

Data sources to consider

Source	What you get
FBI Uniform Crime Reporting (UCR) / NIBRS	Standard crime taxonomy, Part I/II split, national clearance benchmarks
Chicago / NYC / LA open-data crime portals	Real labeled incident data with lat/lon; great for calibrating your model on a known city
U.S. Census ACS	Per-tract population for the incidence denominator; demographics for context
BJS National Crime Victimization Survey (NCVS)	Reporting rates; important because police data captures only the reported fraction of true incidence
SafeWise / U.S. News city safety rankings	External benchmarks to anchor your index on a familiar 0–100 scale

Common pitfalls

Treating all crimes as equal. A 100-incident shoplifting week is not a 100-incident assault week. Without a severity weight, your index mostly reflects high-volume, low-severity offenses such as retail theft.
Ignoring the denominator. Raw counts favor low-population districts. Always normalize per resident (or per 1000 residents) before ranking.
Two weeks treated as a year. Small samples mean wide confidence intervals on per-district rates. Report intervals, not just point estimates, and avoid ranking districts whose intervals overlap.
Reporting bias forgotten. Clearance is a function of both police effort and reporting culture. A high-clearance district may simply have fewer reported incidents in the denominator, not more effective policing.
One number for the mayor only. The judges want the technical paper and the non-technical letter. Both must be present, and the letter must stand alone.
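
The interval advice in the two-week pitfall can be illustrated with a short sketch. It assumes that incident counts in a district are approximately Poisson, which is a modeling assumption rather than something the prompt states; the counts and populations are made up, and `poisson_rate_ci` is a hypothetical helper name. It uses the exact (Garwood) interval based on chi-square quantiles.

```python
from scipy.stats import chi2

def poisson_rate_ci(count, population, per=1000, level=0.95):
    """Exact (Garwood) confidence interval for a Poisson count,
    scaled to a rate per `per` residents."""
    a = 1.0 - level
    lo = 0.0 if count == 0 else chi2.ppf(a / 2, 2 * count) / 2
    hi = chi2.ppf(1 - a / 2, 2 * count + 2) / 2
    return lo / population * per, hi / population * per

# Hypothetical two-week violent-crime counts in two districts of 50,000 residents.
ci_a = poisson_rate_ci(12, 50_000)
ci_b = poisson_rate_ci(20, 50_000)

# Two intervals overlap when each one starts before the other ends.
overlap = ci_a[1] >= ci_b[0] and ci_b[1] >= ci_a[0]

print("District A rate per 1,000:", tuple(round(v, 3) for v in ci_a))
print("District B rate per 1,000:", tuple(round(v, 3) for v in ci_b))
print("Intervals overlap:", overlap)
```

Even though district B recorded two-thirds more incidents than district A in this made-up example, the intervals overlap, so two weeks of data would not justify ranking one above the other.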

Python sketch

Build the three-pillar index and a DBSCAN hotspot pass.

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

df = pd.read_csv("city_crimes.csv")         # type, district, lat, lon, arrest_made
pop = pd.read_csv("district_pop.csv").set_index("district")["pop"]

severity_w = {"homicide":10, "rape":9, "robbery":7, "agg_assault":7,
              "burglary":4, "larceny":2, "mvt":3, "vandalism":1, "other":1}
df["w"] = df["type"].map(severity_w).fillna(1)

g = df.groupby("district").agg(
        crimes  = ("type", "size"),
        sev     = ("w", "sum"),
        arrests = ("arrest_made", "sum"))
g["incidence"] = g["crimes"] / pop * 1000
g["severity"]  = g["sev"]    / pop * 1000
g["clearance"] = g["arrests"]/ g["crimes"]

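def mm(x):
    """Min-max normalize a Series to [0, 1]; the epsilon avoids division by zero."""
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

alpha, beta, gamma = 0.4, 0.4, 0.2          # placeholder weights; set from AHP / EWM
# with weights summing to 1, the score ranges from 100*gamma to 100*(1 + gamma)
g["safety"] = 100 * (1 - alpha*mm(g["incidence"])
                       - beta *mm(g["severity"])
                       + gamma*mm(g["clearance"]))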

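# hotspots: cluster incidents in space, weighting each one by severity
# eps is in degrees: 0.003 deg is roughly 330 m of latitude (~111 km per deg lat)
geo = df.dropna(subset=["lat", "lon"])      # DBSCAN rejects NaN coordinates
pts = geo[["lat","lon"]].to_numpy()
w   = geo["w"].to_numpy()
# with sample_weight, min_samples is compared to summed weight, not point count
labels = DBSCAN(eps=0.003, min_samples=20).fit_predict(pts, sample_weight=w)
# rank clusters by total severity weight rather than average weight
hotspots = (geo.assign(cluster=labels)
              .query("cluster >= 0")
              .groupby("cluster")
              .agg(lat=("lat", "mean"), lon=("lon", "mean"),
                   total_w=("w", "sum"), incidents=("w", "size"))
              .sort_values("total_w", ascending=False).head(5))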
print(g.sort_values("safety"))
print(hotspots)
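
One limitation of the hotspot pass above is that `eps` is expressed in degrees, and a degree of longitude covers less ground than a degree of latitude away from the equator. A possible variant, sketched here on synthetic points, uses scikit-learn's haversine metric so the neighborhood radius can be stated in meters. The helper name `hotspot_labels`, the 300 m radius, and the coordinates are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius in meters

def hotspot_labels(lat, lon, weights, radius_m=300.0, min_weight=20):
    """DBSCAN with great-circle distance so the radius is in meters.
    The haversine metric expects [lat, lon] in radians."""
    coords = np.radians(np.column_stack([lat, lon]))
    model = DBSCAN(eps=radius_m / EARTH_RADIUS_M, min_samples=min_weight,
                   metric="haversine", algorithm="ball_tree")
    return model.fit_predict(coords, sample_weight=weights)

# Synthetic data: two tight clusters of 30 incidents plus 10 isolated incidents.
rng = np.random.default_rng(0)
centers = np.array([[41.88, -87.63], [41.75, -87.60]])
clustered = np.vstack([rng.normal(c, 0.0003, size=(30, 2)) for c in centers])
isolated = np.column_stack([41.0 + 0.05 * np.arange(10), np.full(10, -87.0)])
pts = np.vstack([clustered, isolated])
weights = rng.choice([1, 2, 4, 7], size=len(pts))   # made-up severity weights

labels = hotspot_labels(pts[:, 0], pts[:, 1], weights)
print("cluster labels found:", sorted(set(labels.tolist())))
print("incidents marked as noise:", int((labels == -1).sum()))
```

Because `sample_weight` is summed when testing for core points, a single incident whose weight reaches `min_weight` would form a cluster on its own, so the threshold should stay above the largest individual severity weight.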

Sensitivity & validation checklist

Vary the pillar weights $(\alpha, \beta, \gamma)$ on a simplex grid and report how often the bottom-3 districts change. Robust if the same 3 appear in $>80\%$ of weight configurations.
Bootstrap the two-week dataset (resample with replacement) 1000 times and report a 95% CI for each district's safety score.
Swap severity weights for an alternative scheme (sentencing-table weights vs. victim-impact-survey weights). If the top-ranked districts stay the same, that stability is your validation.
Backtest: apply the same index to a real city (Chicago open data) and check whether the output matches published rankings within reasonable bounds.
Stress with under-reporting: if 20% of larcenies go unreported, does the citywide score shift more than 5 points? Quantify the reporting-rate sensitivity.
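
The first two checks in this list can be sketched together. The example below is a hedged illustration on a synthetic incident table (all districts, populations, and counts are made up); it reuses the same scoring formula as the Python sketch, sweeps the pillar weights over a simplex grid with step 0.1, and bootstraps the incidents. It uses 200 resamples to stay fast, where the checklist suggests 1000.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the incident table (all values are made up).
districts = ["D1", "D2", "D3", "D4", "D5", "D6"]
pop = pd.Series([40_000, 55_000, 30_000, 70_000, 45_000, 60_000], index=districts)
n = 1200
df = pd.DataFrame({
    "district": rng.choice(districts, size=n, p=[0.30, 0.10, 0.25, 0.10, 0.15, 0.10]),
    "w": rng.choice([1, 2, 4, 7, 10], size=n, p=[0.35, 0.35, 0.15, 0.12, 0.03]),
    "arrest_made": rng.random(n) < 0.25,
})

def mm(x):
    """Min-max normalize a Series to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

def safety(data, weights):
    """Per-district safety score using the three-pillar formula."""
    alpha, beta, gamma = weights
    g = data.groupby("district").agg(crimes=("w", "size"), sev=("w", "sum"),
                                     arrests=("arrest_made", "sum"))
    g = g.reindex(pop.index, fill_value=0)     # keep districts with no incidents
    inc = g["crimes"] / pop * 1000
    sev = g["sev"] / pop * 1000
    clr = (g["arrests"] / g["crimes"]).fillna(0.0)
    return 100 * (1 - alpha * mm(inc) - beta * mm(sev) + gamma * mm(clr))

# 1) Weight sweep: all (alpha, beta, gamma) on a 0.1-step simplex grid.
grid = [(a / 10, b / 10, (10 - a - b) / 10)
        for a in range(11) for b in range(11) if a + b <= 10]
base_weights = (0.4, 0.4, 0.2)
base_bottom3 = set(safety(df, base_weights).nsmallest(3).index)
share = float(np.mean([set(safety(df, w).nsmallest(3).index) == base_bottom3
                       for w in grid]))

# 2) Bootstrap: resample incidents with replacement and rescore.
B = 200
boot = pd.DataFrame([
    safety(df.sample(frac=1.0, replace=True, random_state=b), base_weights)
    for b in range(B)])
ci = boot.quantile([0.025, 0.975]).T
ci.columns = ["lo", "hi"]

print(f"bottom-3 set unchanged in {share:.0%} of {len(grid)} weight settings")
print(ci.round(1))
```

Because min-max normalization is recomputed inside each resample, the bootstrap intervals reflect both sampling noise and shifts in the normalization endpoints, which is what a reader comparing districts actually faces.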
