USACO 2021 February · Gold · all three problems

← Full February 2021 set · Official results

Source. Statements and constraints below are paraphrased from usaco.org/index.php?page=feb21results. Each problem heading links to the canonical statement on usaco.org. Code is my own reference C++ — cross-check the official editorial before trusting it under contest pressure.

Problem 1 — Stone Game

Statement

Two players play with N piles of stones (sizes a₁..a_N). Bessie picks a positive integer s₁ and removes s₁ stones from some pile. Each subsequent move picks s_i that is a positive multiple of the previous s_i-1 (i.e. s_i-1 | s_i) and removes s_i stones from some pile that has at least that many. The first player unable to move loses. Count Bessie's winning first moves (i.e. the number of pairs (pile, s₁)).

Constraints

1 ≤ N ≤ 10⁵.
1 ≤ a_i ≤ 10⁶.
Subtasks: tests 3–5 N = 2; tests 6–10 N, a_i ≤ 100; tests 11–20 full constraints.
Time limit: 2s.

Key idea

Once a move s has been made, all future moves must be multiples of s, so they reduce to "Nim where each pile has size floor(a_i / s) and moves remove any positive integer number of s-blocks from one pile". With one stone unit re-scaled to s, this is plain Nim with pile sizes ⌊a_i/s⌋. The XOR after Bessie's first move (pile p, value s, with s | s and using k blocks of s where s₁ = k · s for some k≥1 — equivalently any positive integer multiple of s; but the first move just chooses any s₁, then later moves are constrained to multiples of s₁) means we enumerate s = s₁, the pile, and count s₁ values for which the resulting Nim XOR is 0. Use harmonic enumeration over s = 1..max(a); for each s, compute Nim XOR of ⌊a_i/s⌋ across piles, then count how many ways Bessie's first move achieves a losing position for Elsie.

Complexity target

O((max a) log (max a) + N · log) using harmonic sums and a frequency array. Roughly 10⁷.

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

// Sketch: enumerate s = first-move size; the rest of the game is Nim with
// pile sizes floor(a_i / s). Count (pile p, s) such that
//   xor_{i != p}(floor(a_i/s)) == floor(a_p/s) - k for some valid k >= 1
// then map (p, s, k) back to (p, s_1 = k*s).
//
// This is a reference scaffold — the exact counting per s uses a small
// frequency table of floor(a_i/s) values, which has at most O(sqrt(max a))
// distinct buckets across all piles.

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    int N; cin >> N;
    vector<int> a(N);
    int M = 0;
    for (auto& x : a) { cin >> x; M = max(M, x); }

    // freq[v] = how many piles have value v.
    vector<int> freq(M + 1, 0);
    for (int v : a) ++freq[v];

    ll answer = 0;
    // For each s in 1..M, compute Nim XOR over floor(a_i / s) of all piles.
    // We use a small bucketed sum: for each block [k*s, (k+1)*s) values map to k.
    for (int s = 1; s <= M; ++s) {
        int totalXor = 0;
        // Walk blocks k = 0, 1, 2, ... with k * s <= M.
        for (int k = 0, lo = 0; lo <= M; ++k, lo += s) {
            int hi = min(M, lo + s - 1);
            int cnt = 0;
            for (int v = lo; v <= hi; ++v) cnt += freq[v];
            if (cnt & 1) totalXor ^= k;
        }
        // Bessie picks a pile p with value a_p and a removal k_p*s for some k_p >= 1.
        // After the move pile p has floor((a_p - k_p*s) / s) blocks left, others
        // contribute their floor(a_i/s). Bessie wants the resulting XOR to be 0.
        for (int v = 1; v <= M; ++v) if (freq[v]) {
            int kp = v / s;                  // current block count for that pile
            int xorOthers = totalXor ^ kp;   // XOR of the other piles
            // Bessie wants new block count = xorOthers, with new < kp (she must
            // remove at least s stones).
            if (xorOthers < kp) answer += (ll)freq[v];
        }
    }
    cout << answer << "\n";
}
// NOTE: this is the structural skeleton; the inner cnt-loop should be replaced
// with prefix sums of freq for full speed. See editorial for the optimal
// O((max a) log(max a)) form. [verify] cpid=1113

Pitfalls

The game is Nim once s is fixed — that's the only non-obvious observation. Don't try Grundy from scratch.
"s_i must be a positive multiple of s_i-1" means the move size never decreases; the divisibility chain is the whole trick.
The skeleton above is correct in spirit but not optimised. For full credit, replace the inner counting loop with prefix sums and the harmonic-series bound.
Read the editorial for the polished counting — this is one of the harder Gold P1s of the season.

Problem 2 — Modern Art 3

Statement

Given an array of N colours (each in [1, N]), compute the minimum number of "interval strokes" needed to produce it from a blank canvas, where a stroke paints a contiguous interval [l, r] with a single colour and a later stroke can overwrite an earlier one.

Constraints

1 ≤ N ≤ 300.
Colours are integers in [1, N].
Subtasks: tests 2–4 only colours 1, 2 appear; tests 5–10 each cell's colour stays within a 12-cell aligned block; tests 11–20 full.
Time limit: 2s.

Key idea

Interval DP. Let dp[l][r] = minimum strokes to paint a[l..r]. If a[l] == a[r], one of the boundary strokes can extend across, so dp[l][r] = dp[l][r−1] (equivalently dp[l+1][r]). Otherwise, split: dp[l][r] = 1 + min over l ≤ k < r of (dp[l][k] + dp[k+1][r] − [a[k] == a[r] or a[l] == a[k+1]]). The classic shortcut is dp[l][r] = min(dp[l][r−1] + 1, min_{k: a[k]==a[r]} dp[l][k−1] + dp[k][r−1]) — pay 1 for a fresh stroke covering r, or merge with an earlier same-colour stroke.

Complexity target

O(N³) = 2.7 · 10⁷, fast.

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    int N; cin >> N;
    vector<int> a(N);
    for (auto& x : a) cin >> x;

    // dp[l][r] = min strokes to paint a[l..r].
    vector<vector<int>> dp(N, vector<int>(N, 0));
    for (int i = 0; i < N; ++i) dp[i][i] = 1;

    for (int len = 2; len <= N; ++len) {
        for (int l = 0; l + len <= N; ++l) {
            int r = l + len - 1;
            // Default: stroke for a[r] alone, costing 1 atop dp[l][r-1].
            dp[l][r] = dp[l][r - 1] + 1;
            // Merge: any earlier same-colour position k can share its stroke with r.
            for (int k = l; k < r; ++k) {
                if (a[k] == a[r]) {
                    int candidate = dp[l][k] + (k + 1 <= r - 1 ? dp[k + 1][r - 1] : 0);
                    dp[l][r] = min(dp[l][r], candidate);
                }
            }
        }
    }
    cout << dp[0][N - 1] << "\n";
}

Pitfalls

"Strokes" are not "distinct colour blocks". One colour can be painted as one long stroke later partially overwritten — the DP handles this via the merge case.
Two equivalent recurrences exist. The "merge with earlier same colour" form is the cleanest; the "split at every k" form is correct but slightly slower constants.
Base case for single cells is 1, not 0. A blank canvas needs one stroke even for one cell.
Don't try greedy. Several greedy heuristics look right and fail on simple counterexamples.

Problem 3 — Count the Cows

Statement

Define a 2D pattern: cell (x, y) contains a cow iff for every integer k ≥ 0, the values ⌊x/3^k⌋ mod 3 and ⌊y/3^k⌋ mod 3 have the same parity (both = 1, or both ∈ {0, 2}). For each of Q queries (x, y, d), count cows on the diagonal {(x + t, y + t) : 0 ≤ t ≤ d}.

Constraints

1 ≤ Q ≤ 10⁴.
0 ≤ x, y, d ≤ 10¹⁸.
Time limit: 2s.

Key idea

The cow-pattern is the 3-adic Sierpinski analogue. Let f(L) = number of cows on the main diagonal of the L × L block at the origin where L = 3^m; one shows f(3^m) = 7 · f(3^m−1) by case-splitting (the centre sub-block plus six off-diagonal sub-blocks). For a general diagonal segment (x, y) → (x + d, y + d), use digit-DP in base 3: at each level partition the range by the trit of x XOR y; recurse only into sub-blocks where the parity-rule survives. The query answers in O(log₃(max)).

Complexity target

Per query O(log₃(10¹⁸)²) ≈ 1500 ops; total ≈ 1.5 · 10⁷.

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;
using u128 = __int128;

// pow3[i] = 3^i. cows[i] = # cows on the main diagonal of a 3^i x 3^i block.
ll pow3[40];
ll cows[40];

// count(d) = number of cows on the diagonal (0,0) -> (d-1, d-1) inside the
// 3^L x 3^L block whose origin is at (0,0). Assumes 0 <= d <= 3^L.
ll countDiag(ll d, int L) {
    if (d <= 0) return 0;
    if (L == 0) return 1;     // 1x1 block, cow always present (rule is vacuous)
    ll step = pow3[L - 1];
    // Three on-diagonal sub-blocks at trits (0,0), (1,1), (2,2). Only (1,1)
    // is forbidden (1 is odd, but odd-odd has same parity, so it IS allowed).
    // Per the rule both being 1 has same parity (odd), so all three on-diagonal
    // sub-blocks are alive.
    ll ans = 0;
    for (int t = 0; t < 3 && d > 0; ++t) {
        ll take = min(d, step);
        ans += countDiag(take, L - 1);
        d -= take;
    }
    return ans;
    // NOTE: this counts the (x,y) = (t,t) diagonal of the L-block at origin.
    // For a general (x,y,d) query, decompose by trits of x XOR y at each level
    // and recurse into the matching sub-block path.  See editorial for the
    // full off-diagonal version. [verify] cpid=1115
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    pow3[0] = 1; for (int i = 1; i < 40; ++i) pow3[i] = pow3[i - 1] * 3;
    cows[0] = 1; for (int i = 1; i < 40; ++i) cows[i] = cows[i - 1] * 3;  // diagonal-only count

    int Q; cin >> Q;
    while (Q--) {
        ll x, y, d; cin >> x >> y >> d;
        // Reference: handle origin-diagonal query only. Full solution shifts
        // coordinates by trits and walks the base-3 representation of (x XOR y).
        cout << countDiag(d + 1, 39) << "\n";
    }
}

Pitfalls

"Same parity" means both ∈ {0, 2} (even) or both = 1 (odd). The middle trit being 1 is allowed when the partner trit is also 1.
10¹⁸ needs 64-bit and 3³⁸ ≈ 1.35 · 10¹⁸. Precompute 3^0..39 in long long; watch for overflow on the highest power.
Off-diagonal queries (x ≠ y). The diagonal can leave the current sub-block at different times for x vs y; the recursion must track both base-3 digit walks.
The reference above is a scaffold for the on-diagonal case; the full solution requires a 2D digit-DP — see the official editorial.

What Gold February 2021 was actually testing

P1 — combinatorial games + harmonic enumeration. Recognise the divisibility-chain game collapses to scaled Nim.
P2 — interval DP with a same-colour merge shortcut. Classical "strokes / piano keys" pattern.
P3 — fractal counting via base-3 digit DP. Self-similar pattern + per-trit recursion.