2016 · Problem B — Shop and Ship
Facility location Set cover Geospatial OptimizationThe prompt, restated
An online recreation-equipment retailer is expanding across the continental United States. The company wants every customer in the lower 48 to be reachable by one-day ground shipping from one of its warehouses. The team is asked to determine (1) the minimum number of warehouses required to meet that promise, and where to put them; (2) how that warehouse footprint interacts with state sales-tax nexus — historically, a business only had to collect sales tax in states where it had a physical presence, so adding a warehouse in a state with no sales tax (Oregon, Montana, New Hampshire, Delaware, Alaska) or a low rate is worth more than just real estate; and (3) how the design should change if the company expands its catalog from bulky outdoor gear to clothing and apparel, which has a different demand profile, smaller package size, and in some states (e.g., Pennsylvania, Minnesota) is exempt from sales tax. The team must write a recommendation to the company's CFO with a map, a warehouse count, and a projected annual tax-savings estimate.
This is the classic set-cover / p-median facility-location problem with two extra twists (the tax overlay and the apparel-vs-gear shift in demand), which lifts it above a textbook exercise.
Key modeling idea
The "one-day ground" promise translates to a hard geographic radius: UPS and FedEx ground-service maps show that one-day delivery covers approximately a 250–300 mile drive-time radius from a warehouse [illustrative — verify against the UPS ground-service map]. So the core problem is set cover: find the smallest set of warehouse locations $S$ such that every U.S. ZIP-code centroid (weighted by population) lies within that radius of at least one $s \in S$. Layer two objectives on top: (a) minimize total expected sales-tax paid by customers, treating warehouse states as nexus states; (b) minimize average shipping cost (mileage-weighted demand).
Suggested approach
- Step 1 — Build the demand layer. Pull U.S. Census 2010/2015 county or ZCTA population (≈3,100 counties or ≈33,000 ZCTAs). Each county is a demand point with weight $w_i$ = population × per-capita spend on recreation gear (BLS Consumer Expenditure Survey gives ~\$120 / person / year on "sporting goods" [illustrative]).
- Step 2 — Generate candidate sites. Use the ~300 largest U.S. metros as candidate warehouse locations (covers >80% of land area within their radii). Compute the pairwise drive-distance matrix using Haversine $\times$ 1.2 (road-detour factor) or, for a better answer, OSRM on OpenStreetMap.
- Step 3 — Solve the set cover. Greedy set cover gives a $\ln n$ approximation and is fast on this scale (technique 5). Then refine with an ILP (PuLP / CBC) for the final pick. A well-known result for the U.S. is that 5–7 warehouses cover the lower 48 within a 1-day radius [illustrative — Amazon historically used 6–8 fulfillment centers in this era].
- Step 4 — Overlay the tax map. Maintain a table of state combined sales-tax rates (Tax Foundation publishes annual). Total customer tax liability is $\sum_{i} w_i \cdot r_{\text{state}(i)} \cdot \mathbf{1}[\text{state}(i) \in \text{nexus}]$. Re-solve allowing the optimizer to pick the cheapest-tax site that still satisfies the coverage constraint. Sites in OR, NH, DE, MT typically dominate.
- Step 5 — Apparel re-design. Apparel has higher demand per capita ($\approx 2\times$ recreation gear by transaction count [illustrative]) but lighter packages, so the radius constraint loosens (truck stops are not the binding constraint), and several states exempt clothing from sales tax. Re-run with new weights and a sparser network — the answer often shrinks to 4–5 warehouses, with one in a clothing-exempt state.
Data sources to consider
| Source | What you get |
|---|---|
| U.S. Census Bureau (TIGER/Line, ZCTA, county pop) | Demand layer: ~3,100 counties or ~33,000 ZCTAs with centroids and population |
| UPS / FedEx ground-service maps | Empirical one-day radius from major hubs (250–300 mi) |
| Tax Foundation — State and Local Sales Tax Rates | Annual table of combined state+local sales-tax rates by state |
| BLS Consumer Expenditure Survey | Per-capita spend on sporting goods and on apparel |
| OpenStreetMap / OSRM | Real road-network drive times between candidate sites and demand points |
| CBRE / JLL industrial real-estate reports | Warehouse lease cost per sq-ft by metro (Inland Empire, Lehigh Valley, DFW, etc.) |
Common pitfalls
- Using straight-line distance. A 300-mile Haversine radius from Denver reaches into mountains where no road goes. Always add a road-detour factor or use OSRM.
- Treating ZIP codes as points. Wyoming has huge ZIPs — pick centroids weighted by population, not by area.
- Ignoring the Wayfair shift. South Dakota v. Wayfair (2018) changed the nexus rules; in 2016 physical presence still ruled, but a winning paper should flag the assumption explicitly.
- Counting tax savings as company revenue. The tax is paid by the customer; it lowers the all-in price the customer sees, which lifts demand elasticity-style. Model that lift instead of booking the tax as savings.
- Single-objective tunnel vision. Minimum count, minimum cost, and minimum tax give three different answers. Use a Pareto plot, not one number.
Python sketch
Greedy set cover on county centroids: pick warehouses until every county is within $R$ miles of at least one warehouse.
import numpy as np, pandas as pd
# --- Step 1: load demand layer (county centroid, population, state) ---
# columns: county, lat, lon, pop, state
df = pd.read_csv("us_counties.csv")
candidates = pd.read_csv("top300_metros.csv") # candidate warehouse sites
def haversine(lat1, lon1, lat2, lon2):
R = 3959.0 # miles
p1, p2 = np.radians(lat1), np.radians(lat2)
dp, dl = p2 - p1, np.radians(lon2 - lon1)
a = np.sin(dp/2)**2 + np.cos(p1)*np.cos(p2)*np.sin(dl/2)**2
return 2 * R * np.arcsin(np.sqrt(a))
# --- Step 2: coverage matrix (candidate x county) ---
RADIUS = 300 * 1.2 # straight-line proxy for 300-mi road radius
D = haversine(candidates.lat.values[:, None], candidates.lon.values[:, None],
df.lat.values[None, :], df.lon.values[None, :])
covers = D <= RADIUS # boolean matrix
# --- Step 3: greedy set cover, weighted by population ---
uncovered = df.pop.values.astype(float).copy()
picked, total_pop = [], uncovered.sum()
while uncovered.sum() > 0.001 * total_pop: # stop at 99.9% covered
gain = covers @ (uncovered > 0)
best = int(np.argmax(gain))
picked.append(best)
uncovered[covers[best]] = 0
# --- Step 4: tax overlay ---
tax = {"OR":0.0, "NH":0.0, "DE":0.0, "MT":0.0, "CA":0.0875, "TX":0.0825} # illustrative
chosen = candidates.iloc[picked]
print("Warehouses chosen:", chosen[['metro','state']].to_string(index=False))
print(f"Mean state tax of nexus footprint = "
f"{np.mean([tax.get(s, 0.06) for s in chosen.state]):.3%}")
Sensitivity & validation checklist
- Sweep $R$ from 200 to 400 miles; warehouse count should drop roughly as $1/R^2$. Validate against Amazon / REI public fulfillment-center counts for the era.
- Drop the candidate-site list to top-100 metros; the solution should be stable — if it changes drastically, your candidate set was too sparse.
- Replace Haversine with OSRM drive times for the final five sites; recompute coverage. Mountains and water gaps (Great Lakes) often force a swap.
- Stress test: what if the company adds Alaska + Hawaii? Both need air-freight or a separate hub — be explicit about scope.
- Pareto sweep across (warehouse count, customer tax burden, mean ship distance). Mark the recommended point and explain why.