7.3 Scorecards

Figure: A scoring matrix with colored cells and one highlighted top-ranked row.

At a Glance

~2–4 hours: AI proposes candidate criteria, suggested weights, and a scoring rubric, and drafts the option research that feeds the scores. What remains is a short structured-judgment session: the team agrees on what matters and applies the scores. Plan on an afternoon, not days.
$0: A scorecard has no out-of-pocket cost. It lives in a free spreadsheet, and AI drafts the criteria and the supporting research that informs each score. The only input is your team’s judgment in the scoring session: no tooling, media, or participant compensation.

Other names: Weighted Scorecard · Prioritization Scorecard

In Brief

A scorecard is a weighted ranking tool, usually a spreadsheet, that lists your options as rows and your decision criteria as columns, scores each option against each criterion, then weights and sums the scores to surface one clear top priority. The output is a ranked list that replaces circular debate with a transparent, repeatable comparison, so the team can commit to one option and move.

Common Use Case

Your team has brainstormed eight possible features to build next and everyone has a different favorite. You need a quick, structured way to compare them against three or four criteria so you can pick one and move forward instead of debating endlessly.

Helps Answer

  • Which option should we prioritize among several choices?
  • What criteria actually matter for this decision?
  • How do our options compare when we score them the same way?

Description

A scorecard lists your options as rows and the criteria that matter as columns. You weight each criterion, score every option against each criterion, and sum (or multiply) the weighted values into one number per option that ranks the whole list.
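
As a rough illustration of the weighted-sum version, here is a minimal Python sketch. The option names, criteria, weights, and scores are all hypothetical, and every criterion is phrased so that a higher score is better (which is why it uses "ease" rather than "effort").

```python
# Hypothetical criteria weights and 1-5 scores, for illustration only.
weights = {"impact": 3, "confidence": 1, "ease": 2}

scores = {
    "Saved searches": {"impact": 4, "confidence": 3, "ease": 2},
    "Bulk export": {"impact": 3, "confidence": 4, "ease": 5},
    "Dark mode": {"impact": 2, "confidence": 5, "ease": 3},
}


def weighted_total(option_scores, weights):
    """Multiply each criterion score by its weight and sum the results."""
    return sum(weights[c] * option_scores[c] for c in weights)


def rank(scores, weights):
    """Return (option, total) pairs sorted from highest total to lowest."""
    totals = {name: weighted_total(s, weights) for name, s in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank(scores, weights)
for name, total in ranking:
    print(f"{name}: {total}")
# Bulk export: 23
# Saved searches: 19
# Dark mode: 17
```

The same arithmetic is a single SUMPRODUCT-style formula per row in a spreadsheet; the code only makes each step explicit.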

Unlike the research and experiment methods in this book, a scorecard generates no new evidence about your market or product. It is a decision-making tool, not a research method: it organizes judgment you already hold and makes visible the option that judgment favors. Its inputs are usually the generative output of other methods — the segments surfaced by Discovery Interviews, the features ranked in a prioritization game, the risks named in an assumptions map. The scorecard’s job is to choose among what those methods surfaced, not to discover anything new. Treat its output as structured judgment made visible, never as data.

The named variants are the same shape with different columns: RICE (Reach, Impact, Confidence, Effort) for roadmap prioritization; ICE (Impact, Confidence, Ease) for ranking growth experiments; and WSJF (Cost of Delay ÷ Job Size) for sequencing enterprise backlogs. The same template absorbs at least four common decisions: ranking segments (size, willingness to pay, accessibility), features or backlog items (impact, effort, confidence), risks (impact and likelihood), and channels (reach, fit, cost-to-test). That cross-cutting reach is why a scorecard is a general decision tool rather than a method tied to one stage of validation.
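
The three named variants can be sketched as one-line formulas. RICE is shown in its commonly published form (Reach × Impact × Confidence, divided by Effort) and WSJF as the ratio given above. For ICE the text does not say how the three scores are combined, so the average used here is an assumption; some teams multiply instead. All input values are hypothetical.

```python
def rice(reach, impact, confidence, effort):
    """RICE in its commonly published form: reach * impact * confidence / effort."""
    return reach * impact * confidence / effort


def ice(impact, confidence, ease):
    """ICE combined as a plain average (an assumption; some teams multiply)."""
    return (impact + confidence + ease) / 3


def wsjf(cost_of_delay, job_size):
    """WSJF: Cost of Delay divided by Job Size."""
    return cost_of_delay / job_size


# Hypothetical inputs. Confidence is a fraction here (0.5 means 50%).
print(rice(reach=500, impact=2, confidence=0.5, effort=4))  # 125.0
print(ice(impact=8, confidence=6, ease=7))                  # 7.0
print(wsjf(cost_of_delay=20, job_size=5))                   # 4.0
```

Note that effort and job size sit in the denominator, so a larger value lowers the total, while every other input raises it.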

Because the inputs are subjective estimates, the output looks quantitative but is not evidence. Its primary value is preventing analysis paralysis: it produces one clear top priority so the team can act, then learn, rather than debate further. If you act quickly you can change your mind later; if you never act, you generate no result to learn from.

How to

Prep

Most of the work happens before any cell gets a number. What’s on the list, what counts, how much each thing counts, what scale you score on, and who scores — these decisions determine whether the result is useful or theatre.

  1. List the options as rows. Keep the set tight — five to ten candidates is the sweet spot. If you have thirty, make a fast cut first (a 2×2 impact/effort pass, or a “least important first” sweep) before reaching for a scorecard.
  2. Choose three to five criteria as columns. Borrow from the published frameworks as priors — RICE uses Reach, Impact, Confidence, and Effort; ICE uses Impact, Confidence, and Ease; WSJF weighs Cost of Delay against Job Size — or name your own concrete, scoreable criteria. AI can propose a candidate criteria set from a description of your decision; treat it as a first draft to edit, not the answer.
  3. Decide the scoring rubric and use it for every option: 1–5 (quick to apply, but expect more ties), 1–10 (more headroom, more noise), or a fixed scale with named anchors (massive / high / medium / low). If confidence is one of your criteria, express it as a percentage to keep the team honest about uncertainty.
  4. Weight the criteria. Either count every column equally or assign integer multipliers (Impact ×3, Effort ×2). The pitfall is changing weights after you see the scores — that reverse-engineers the answer you wanted, so lock the weights now.
  5. Pick the scorers. Score with at least two people if the decision matters; one scorer alone amplifies the halo effect. Decide upfront whether you’ll average, take the median, or discuss outliers — disagreement on a row is signal, not noise.
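
If you keep the scorecard in code rather than a spreadsheet, one way to make the Prep decisions explicit and hard to change later is a read-only setup object. The sketch below is only an illustration: `ScorecardSetup` and its fields are hypothetical names, not a standard API, and the thresholds simply restate the guidance in the steps above.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class ScorecardSetup:
    """Hypothetical container for the Prep decisions; not a standard API."""
    options: tuple
    weights: MappingProxyType  # criterion -> multiplier, read-only once built
    scale: tuple               # (lowest score, highest score)
    scorers: tuple

    def warnings(self):
        """Return notes where the setup departs from the guidance above."""
        notes = []
        if not 5 <= len(self.options) <= 10:
            notes.append("aim for five to ten options")
        if not 3 <= len(self.weights) <= 5:
            notes.append("aim for three to five criteria")
        if len(self.scorers) < 2:
            notes.append("add a second scorer if the decision matters")
        return notes


setup = ScorecardSetup(
    options=("A", "B", "C", "D", "E", "F"),
    weights=MappingProxyType({"impact": 3, "effort": 2, "confidence": 1}),
    scale=(1, 5),
    scorers=("Ana",),
)
print(setup.warnings())  # ['add a second scorer if the decision matters']
```

The frozen dataclass blocks reassigning a field, and `MappingProxyType` blocks editing an individual weight, so changing the weights after seeing the scores requires deliberately building a new setup.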

Execution

With the rubric, weights, and scorers set, the scoring itself is fast.

  1. Score every option against every criterion using the rubric. Move quickly — your instinct is most of what the scorecard captures. If you stall on a cell, mark it and come back; don’t let one cell anchor the rest of the row.
  2. If you’re scoring as a team, score independently first, then compare. Rows where scorers agree stand. Rows where two scorers diverge by three or more points get a 60-second conversation before you reconcile them — that gap is where the real judgment lives.
  3. Apply your formula (a weighted sum, or a multiplicative formula such as RICE) to produce one total per option.
  4. Sort the totals from highest to lowest. The top row is your candidate priority.
  5. If the top option feels obviously wrong, don’t override it yet — take it to Analysis and stress-test the weights first.
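
The team-scoring step above can be sketched as follows. The scorers and scores are hypothetical, and the sketch applies the three-point gap to each individual cell rather than to a whole row, which is one reasonable reading of the step, not the only one. It reconciles with the median; averaging would work the same way.

```python
from statistics import median

DIVERGENCE_THRESHOLD = 3  # a gap of three or more points triggers a conversation

# Hypothetical independent scores: scorer -> option -> criterion -> score (1-5).
by_scorer = {
    "Ana": {
        "Bulk export": {"impact": 4, "ease": 5},
        "Dark mode": {"impact": 2, "ease": 4},
    },
    "Ben": {
        "Bulk export": {"impact": 3, "ease": 2},
        "Dark mode": {"impact": 2, "ease": 3},
    },
}


def collect(by_scorer):
    """Regroup scores as (option, criterion) -> list of every scorer's value."""
    cells = {}
    for option_scores in by_scorer.values():
        for option, criteria in option_scores.items():
            for criterion, value in criteria.items():
                cells.setdefault((option, criterion), []).append(value)
    return cells


def divergent_cells(cells, threshold=DIVERGENCE_THRESHOLD):
    """Cells whose highest and lowest scores differ by at least the threshold."""
    return [cell for cell, values in cells.items()
            if max(values) - min(values) >= threshold]


def reconcile(cells):
    """Take the median of each cell's scores."""
    return {cell: median(values) for cell, values in cells.items()}


cells = collect(by_scorer)
to_discuss = divergent_cells(cells)
print(to_discuss)  # [('Bulk export', 'ease')]
agreed = reconcile(cells)
```

In practice you would talk through the flagged cells and update the scores before reconciling; the sketch computes the median straight away only to stay short.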

Analysis

A scorecard earns its keep when the result is counterintuitive. If the top row is what you’d have picked anyway, it has surfaced and confirmed your judgment — act on it. If the top row surprises you, that’s the most useful possible outcome: either the scorecard is wrong (test that now) or your intuition was. Run two stress-tests before you commit:

  • Sensitivity to weights. Halve the weight on your highest-weighted criterion. If the top row still wins, the ranking is robust. If a different row jumps to the top, the scorecard is really a one-criterion decision dressed up as a multi-criterion one — name that, and decide whether you trust that single criterion. AI can run this sweep across every criterion in seconds.
  • What’s missing. List what the rubric does not measure — strategic fit, optionality, irreversibility, second-order effects. These get dropped because they’re hard to score, not because they don’t matter. If the top row is weak on something the rubric ignores, that’s a flag, not a tiebreaker.
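
The weight-sensitivity test above can be sketched as a small sweep: halve each criterion's weight in turn and see whether the top row changes. The options, weights, and scores are hypothetical, and the totals use a plain weighted sum.

```python
# Hypothetical weights and 1-5 scores, for illustration only.
weights = {"impact": 3, "confidence": 1, "ease": 2}

scores = {
    "Saved searches": {"impact": 5, "confidence": 3, "ease": 2},
    "Bulk export": {"impact": 3, "confidence": 4, "ease": 4},
    "Dark mode": {"impact": 2, "confidence": 3, "ease": 3},
}


def top_option(scores, weights):
    """Return the option with the highest weighted-sum total."""
    totals = {name: sum(weights[c] * s[c] for c in weights)
              for name, s in scores.items()}
    return max(totals, key=totals.get)


def halving_sweep(scores, weights):
    """Halve each weight in turn; report criteria whose halving changes the winner."""
    baseline = top_option(scores, weights)
    flips = {}
    for criterion in weights:
        trial = dict(weights)
        trial[criterion] = weights[criterion] / 2
        winner = top_option(scores, trial)
        if winner != baseline:
            flips[criterion] = winner
    return baseline, flips


baseline, flips = halving_sweep(scores, weights)
print(baseline)  # Saved searches
print(flips)     # {'impact': 'Bulk export'}
```

Here the ranking survives halving confidence or ease but flips when impact is halved, so the result rests largely on the impact scores and on how much you trust them.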

Biases & Tips

  • False precision: A scorecard looks quantitative, but the numbers are subjective estimates. Don’t mistake a total for a data-backed result; it is judgment made legible, not evidence.
  • Anchoring: The first option you score sets a reference point that distorts the rest. Score every option against the same rubric, then revisit your top and bottom rows to sanity-check.
  • Halo effect: A favored option tends to score well on every criterion regardless of fit. Score each criterion independently, ideally with a second scorer, before you compute any totals.
  • Delegating judgment to AI: AI can brainstorm criteria, suggest weights, and run sensitivity sweeps, but the value judgments about what matters most must stay yours. Hand the weights to an AI and you have outsourced the decision while dressing it as objective. Use AI to challenge your assumptions; own the weights and the go/no-go call.
  • Inside-the-building bias: A scorecard ranks options using the knowledge already in the room, which lets you decide without ever testing reality. Pair the ranking with at least one external data point or validating conversation before committing real resources.
  • Diluted-priority bias: There is only one top priority at a time. Treating the top two or three rows as equally urgent guarantees none of them gets the focus a single priority would; pick the top row and sequence the rest behind it.

Next Steps

  • Treat the top-ranked option as a hypothesis, not a verdict — run the experiment that tests it before you commit fully (a Landing Page Test for a feature’s demand, a Concierge Test for a service).
  • If the scores leaned on guesses, replace them with evidence: run a Closed-Ended Survey to put real numbers behind the criteria you estimated.
  • If you’re unsure the criteria reflect what customers value, validate them with Customer Discovery Interviews before you re-score.

Learn more

Case Studies

Intercom — RICE to end roadmap debates

Facing loudest-voice-wins roadmap arguments, Intercom’s product team scored every candidate feature on Reach, Impact, Confidence, and Effort so ideas were compared on one consistent formula instead of by intuition; they published the RICE scorecard in 2018.

Sean Ellis — ICE to sequence growth experiments

To decide which growth bets to run first at high tempo, Sean Ellis scored each idea 0–10 on Impact, Confidence, and Ease and ran them in ranked order; the payoff comes from applying the scorecard consistently, wave after wave.

Scaled Agile (WSJF) — sequencing enterprise backlogs

Teams using SAFe decide build order by scoring each job’s Cost of Delay against its size (WSJF), so the highest-value, lowest-effort work ships first rather than whatever was loudest in planning.
