USACO 2021 February · Gold · all three problems
Problem 1 — Stone Game
Statement
Two players play with N piles of stones (sizes a1..aN). Bessie picks a positive integer s1 and removes s1 stones from some pile. Each subsequent move picks si that is a positive multiple of the previous si-1 (i.e. si-1 | si) and removes si stones from some pile that has at least that many. The first player unable to move loses. Count Bessie's winning first moves (i.e. the number of pairs (pile, s1)).
Constraints
- 1 ≤ N ≤ 105.
- 1 ≤ ai ≤ 106.
- Subtasks: tests 3–5 N = 2; tests 6–10 N, ai ≤ 100; tests 11–20 full constraints.
- Time limit: 2s.
Key idea
Once a move s has been made, all future moves must be multiples of s, so they reduce to
"Nim where each pile has size floor(ai / s) and moves remove any positive
integer number of s-blocks from one pile". With one stone unit re-scaled to s, this is plain Nim with
pile sizes ⌊ai/s⌋. The XOR after Bessie's first move (pile p, value s, with s | s and
using k blocks of s where s1 = k · s for some k≥1 — equivalently any positive integer
multiple of s; but the first move just chooses any s1, then later moves are constrained to
multiples of s1) means we enumerate s = s1, the pile, and count s1
values for which the resulting Nim XOR is 0. Use harmonic enumeration over s = 1..max(a); for each s,
compute Nim XOR of ⌊ai/s⌋ across piles, then count how many ways Bessie's first move
achieves a losing position for Elsie.
Complexity target
O((max a) log (max a) + N · log) using harmonic sums and a frequency array. Roughly 107.
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
// Sketch: enumerate s = first-move size; the rest of the game is Nim with
// pile sizes floor(a_i / s). Count (pile p, s) such that
// xor_{i != p}(floor(a_i/s)) == floor(a_p/s) - k for some valid k >= 1
// then map (p, s, k) back to (p, s_1 = k*s).
//
// This is a reference scaffold — the exact counting per s uses a small
// frequency table of floor(a_i/s) values, which has at most O(sqrt(max a))
// distinct buckets across all piles.
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
int N; cin >> N;
vector<int> a(N);
int M = 0;
for (auto& x : a) { cin >> x; M = max(M, x); }
// freq[v] = how many piles have value v.
vector<int> freq(M + 1, 0);
for (int v : a) ++freq[v];
ll answer = 0;
// For each s in 1..M, compute Nim XOR over floor(a_i / s) of all piles.
// We use a small bucketed sum: for each block [k*s, (k+1)*s) values map to k.
for (int s = 1; s <= M; ++s) {
int totalXor = 0;
// Walk blocks k = 0, 1, 2, ... with k * s <= M.
for (int k = 0, lo = 0; lo <= M; ++k, lo += s) {
int hi = min(M, lo + s - 1);
int cnt = 0;
for (int v = lo; v <= hi; ++v) cnt += freq[v];
if (cnt & 1) totalXor ^= k;
}
// Bessie picks a pile p with value a_p and a removal k_p*s for some k_p >= 1.
// After the move pile p has floor((a_p - k_p*s) / s) blocks left, others
// contribute their floor(a_i/s). Bessie wants the resulting XOR to be 0.
for (int v = 1; v <= M; ++v) if (freq[v]) {
int kp = v / s; // current block count for that pile
int xorOthers = totalXor ^ kp; // XOR of the other piles
// Bessie wants new block count = xorOthers, with new < kp (she must
// remove at least s stones).
if (xorOthers < kp) answer += (ll)freq[v];
}
}
cout << answer << "\n";
}
// NOTE: this is the structural skeleton; the inner cnt-loop should be replaced
// with prefix sums of freq for full speed. See editorial for the optimal
// O((max a) log(max a)) form. [verify] cpid=1113
Pitfalls
- The game is Nim once s is fixed — that's the only non-obvious observation. Don't try Grundy from scratch.
- "si must be a positive multiple of si-1" means the move size never decreases; the divisibility chain is the whole trick.
- The skeleton above is correct in spirit but not optimised. For full credit, replace the inner counting loop with prefix sums and the harmonic-series bound.
- Read the editorial for the polished counting — this is one of the harder Gold P1s of the season.
Problem 2 — Modern Art 3
Statement
Given an array of N colours (each in [1, N]), compute the minimum number of "interval strokes" needed to produce it from a blank canvas, where a stroke paints a contiguous interval [l, r] with a single colour and a later stroke can overwrite an earlier one.
Constraints
- 1 ≤ N ≤ 300.
- Colours are integers in [1, N].
- Subtasks: tests 2–4 only colours 1, 2 appear; tests 5–10 each cell's colour stays within a 12-cell aligned block; tests 11–20 full.
- Time limit: 2s.
Key idea
Interval DP. Let dp[l][r] = minimum strokes to paint a[l..r]. If a[l] == a[r], one of the boundary strokes can extend across, so dp[l][r] = dp[l][r−1] (equivalently dp[l+1][r]). Otherwise, split: dp[l][r] = 1 + min over l ≤ k < r of (dp[l][k] + dp[k+1][r] − [a[k] == a[r] or a[l] == a[k+1]]). The classic shortcut is dp[l][r] = min(dp[l][r−1] + 1, mink: a[k]==a[r] dp[l][k−1] + dp[k][r−1]) — pay 1 for a fresh stroke covering r, or merge with an earlier same-colour stroke.
Complexity target
O(N3) = 2.7 · 107, fast.
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
int N; cin >> N;
vector<int> a(N);
for (auto& x : a) cin >> x;
// dp[l][r] = min strokes to paint a[l..r].
vector<vector<int>> dp(N, vector<int>(N, 0));
for (int i = 0; i < N; ++i) dp[i][i] = 1;
for (int len = 2; len <= N; ++len) {
for (int l = 0; l + len <= N; ++l) {
int r = l + len - 1;
// Default: stroke for a[r] alone, costing 1 atop dp[l][r-1].
dp[l][r] = dp[l][r - 1] + 1;
// Merge: any earlier same-colour position k can share its stroke with r.
for (int k = l; k < r; ++k) {
if (a[k] == a[r]) {
int candidate = dp[l][k] + (k + 1 <= r - 1 ? dp[k + 1][r - 1] : 0);
dp[l][r] = min(dp[l][r], candidate);
}
}
}
}
cout << dp[0][N - 1] << "\n";
}
Pitfalls
- "Strokes" are not "distinct colour blocks". One colour can be painted as one long stroke later partially overwritten — the DP handles this via the merge case.
- Two equivalent recurrences exist. The "merge with earlier same colour" form is the cleanest; the "split at every k" form is correct but slightly slower constants.
- Base case for single cells is 1, not 0. A blank canvas needs one stroke even for one cell.
- Don't try greedy. Several greedy heuristics look right and fail on simple counterexamples.
Problem 3 — Count the Cows
Statement
Define a 2D pattern: cell (x, y) contains a cow iff for every integer k ≥ 0, the values
⌊x/3k⌋ mod 3 and ⌊y/3k⌋ mod 3 have the same parity
(both = 1, or both ∈ {0, 2}). For each of Q queries (x, y, d), count cows on the diagonal
{(x + t, y + t) : 0 ≤ t ≤ d}.
Constraints
- 1 ≤ Q ≤ 104.
- 0 ≤ x, y, d ≤ 1018.
- Time limit: 2s.
Key idea
The cow-pattern is the 3-adic Sierpinski analogue. Let f(L) = number of cows on the main diagonal of the L × L block at the origin where L = 3m; one shows f(3m) = 7 · f(3m−1) by case-splitting (the centre sub-block plus six off-diagonal sub-blocks). For a general diagonal segment (x, y) → (x + d, y + d), use digit-DP in base 3: at each level partition the range by the trit of x XOR y; recurse only into sub-blocks where the parity-rule survives. The query answers in O(log3(max)).
Complexity target
Per query O(log3(1018)2) ≈ 1500 ops; total ≈ 1.5 · 107.
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
using u128 = __int128;
// pow3[i] = 3^i. cows[i] = # cows on the main diagonal of a 3^i x 3^i block.
ll pow3[40];
ll cows[40];
// count(d) = number of cows on the diagonal (0,0) -> (d-1, d-1) inside the
// 3^L x 3^L block whose origin is at (0,0). Assumes 0 <= d <= 3^L.
ll countDiag(ll d, int L) {
if (d <= 0) return 0;
if (L == 0) return 1; // 1x1 block, cow always present (rule is vacuous)
ll step = pow3[L - 1];
// Three on-diagonal sub-blocks at trits (0,0), (1,1), (2,2). Only (1,1)
// is forbidden (1 is odd, but odd-odd has same parity, so it IS allowed).
// Per the rule both being 1 has same parity (odd), so all three on-diagonal
// sub-blocks are alive.
ll ans = 0;
for (int t = 0; t < 3 && d > 0; ++t) {
ll take = min(d, step);
ans += countDiag(take, L - 1);
d -= take;
}
return ans;
// NOTE: this counts the (x,y) = (t,t) diagonal of the L-block at origin.
// For a general (x,y,d) query, decompose by trits of x XOR y at each level
// and recurse into the matching sub-block path. See editorial for the
// full off-diagonal version. [verify] cpid=1115
}
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
pow3[0] = 1; for (int i = 1; i < 40; ++i) pow3[i] = pow3[i - 1] * 3;
cows[0] = 1; for (int i = 1; i < 40; ++i) cows[i] = cows[i - 1] * 3; // diagonal-only count
int Q; cin >> Q;
while (Q--) {
ll x, y, d; cin >> x >> y >> d;
// Reference: handle origin-diagonal query only. Full solution shifts
// coordinates by trits and walks the base-3 representation of (x XOR y).
cout << countDiag(d + 1, 39) << "\n";
}
}
Pitfalls
- "Same parity" means both ∈ {0, 2} (even) or both = 1 (odd). The middle trit being 1 is allowed when the partner trit is also 1.
- 1018 needs 64-bit and 338 ≈ 1.35 · 1018. Precompute 30..39 in
long long; watch for overflow on the highest power. - Off-diagonal queries (x ≠ y). The diagonal can leave the current sub-block at different times for x vs y; the recursion must track both base-3 digit walks.
- The reference above is a scaffold for the on-diagonal case; the full solution requires a 2D digit-DP — see the official editorial.
What Gold February 2021 was actually testing
- P1 — combinatorial games + harmonic enumeration. Recognise the divisibility-chain game collapses to scaled Nim.
- P2 — interval DP with a same-colour merge shortcut. Classical "strokes / piano keys" pattern.
- P3 — fractal counting via base-3 digit DP. Self-similar pattern + per-trit recursion.