7.3 Scorecards

A scoring matrix with colored cells and one highlighted top-ranked row

At a Glance

Quantitative In-person Remote Attitudinal

15–30 mins

Free or cheap spreadsheets

In Brief

A scorecard is a weighted ranking tool, usually built in a spreadsheet. It lists your options as rows and your decision criteria as columns, and assigns a numeric score to each combination. You combine each row into a single total, typically by multiplying each score by its criterion's weight and adding the results. You then sort by that total to identify one clear top priority. The output is a ranked list that replaces subjective debate with a transparent, repeatable comparison, so the team can pick one option and move forward.
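The core arithmetic can be sketched in a few lines of Python. This is a minimal sketch assuming a weighted-sum formula. The option names, criteria, weights, and scores are hypothetical. Every criterion is phrased so that a higher score is better (ease rather than effort).

```python
# Minimal weighted-sum scorecard. All names, weights, and scores are hypothetical.


def weighted_total(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Multiply each criterion score by its weight and add the products."""
    return sum(scores[criterion] * weight for criterion, weight in weights.items())


def rank_options(options: dict[str, dict[str, float]],
                 weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (option, total) pairs sorted from highest to lowest total."""
    totals = [(name, weighted_total(scores, weights)) for name, scores in options.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


weights = {"impact": 3, "ease": 2, "confidence": 1}
options = {
    "Feature A": {"impact": 4, "ease": 2, "confidence": 3},
    "Feature B": {"impact": 3, "ease": 5, "confidence": 4},
    "Feature C": {"impact": 5, "ease": 1, "confidence": 3},
}

for name, total in rank_options(options, weights):
    print(f"{name}: {total}")
# Feature B: 23
# Feature C: 20
# Feature A: 19
```

A spreadsheet does the same thing with one SUMPRODUCT-style column and a sort. The code only makes the steps explicit.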

Common Use Case

Your team has brainstormed eight possible features to build next and everyone has a different favorite. You need a quick, structured way to compare them against three or four criteria so you can pick one and move forward instead of debating endlessly.

Helps Answer

Which option should we prioritize among several choices?
What criteria matter most for this decision?
How do our options compare when we score them against the same criteria?

Description

As a quick-and-dirty tool for picking one thing to focus on, scorecards are difficult to beat. You list the options as rows, the criteria that matter as columns, weight each criterion, score every option against each criterion, and multiply or sum the values to produce one number per option that ranks the whole list. The named variants — RICE (Reach, Impact, Confidence, Effort), developed at Intercom for product roadmap prioritization; ICE (Impact, Confidence, Ease), popularized by Sean Ellis for ranking growth experiments; and WSJF (Cost of Delay ÷ Job Size), used in SAFe to sequence enterprise backlogs — are all the same shape with different column choices.
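To make "same shape, different columns" concrete, here is a hedged Python sketch of two of the named formulas as they are commonly described. RICE divides Reach × Impact × Confidence by Effort. ICE is shown here as the product of its three scores, though some teams average them instead. The function names and example inputs are hypothetical.

```python
# Sketches of the RICE and ICE formulas; example inputs are hypothetical.


def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE = (Reach x Impact x Confidence) / Effort.

    reach: people or events per time period (e.g. users per quarter)
    impact: 3, 2, 1, 0.5, or 0.25 (massive to minimal)
    confidence: a fraction, e.g. 0.8 for 80%
    effort: person-months; must be positive
    """
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


def ice_score(impact: float, confidence: float, ease: float) -> float:
    """ICE as a product of three scores (averaging is a common alternative)."""
    return impact * confidence * ease


print(rice_score(reach=500, impact=2, confidence=0.8, effort=4))  # 200.0
print(ice_score(impact=7, confidence=5, ease=8))  # 280
```

Each formula takes one row of inputs and produces one number per option. Ranking then works exactly as in any other scorecard.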

Where it shows up. The same template absorbs at least four common decisions: segment prioritization (Justin Wilcox’s SPA analysis ranks customer segments on market size, willingness to pay, and accessibility); feature or backlog ranking (RICE-style scoring against revenue potential, build effort, and confidence); risk prioritization (rank a Diana Kander–style riskiest-assumption list on impact and likelihood); and channel prioritization (rank candidate marketing channels on reach, fit, and cost-to-test before committing budget).

What the score is and isn’t. A scorecard looks quantitative, but the inputs are subjective estimates. The output is not evidence — it’s a structured way to surface judgment that already exists in the room. The primary goal is to prevent “analysis paralysis” by producing one clear top priority so the team can act, then learn, rather than debate further. If you take action quickly, you can always change your mind later. If you don’t take action at all, you won’t generate a result.

How to

Prep

Most of the work in a scorecard happens before any cell gets a number. The decisions you make here — what’s on the list, what counts, how much each thing counts, what scale you’ll score on, who’ll do the scoring — determine whether the result is useful or theatre.

List the options you’ll evaluate as rows. Keep the set tight — five to ten candidates is the sweet spot. If you have thirty, do a fast cut first (a 2x2 risk prioritization or “least important first” pass) before reaching for a scorecard.
Choose three to five criteria as columns. Use the published frameworks as priors: RICE picks Reach, Impact, Confidence, and Effort; ICE picks Impact, Confidence, and Ease; WSJF picks Cost of Delay against Job Size. For a custom decision, name the criteria the way Wilcox does in SPA, with concrete labels you can actually score: (S)ize, willingness to (P)ay, and (A)ccessibility.
Decide the scoring rubric. Pick one scale and use it for every option. Options include 1–5 (quick, coarse judgments, though more ties), 1–10 (more headroom but more noise), or a fixed scale like RICE's Impact column (3 / 2 / 1 / 0.5 / 0.25 for massive / high / medium / low / minimal). Confidence in RICE is a percentage (100% / 80% / 50% for high / medium / low), which keeps the team honest about uncertainty without inviting fake precision.
Weight the criteria. Most teams either skip this (every column counts equally) or assign integer multipliers (Impact ×3, Effort ×2, etc.). Either is fine. The pitfall is fiddling with weights after you’ve seen the scores — that’s reverse-engineering the answer you wanted.
Pick the participants. Score with at least two people if the decision matters; one person scoring alone amplifies the halo effect. For team scoring, decide upfront whether you’ll average, take the median, or discuss outliers — disagreement on a row is signal, not noise.
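The prep decisions above can be written down before any scoring starts. The following is a hypothetical sketch. The RICE label-to-number mappings follow the scales listed above. The 1–5 range check and the example weights are assumptions for a custom scorecard. `types.MappingProxyType` is the standard-library read-only dictionary view, used here to lock the agreed weights.

```python
from types import MappingProxyType

# RICE label-to-number scales, as listed in the rubric step.
RICE_IMPACT = {"massive": 3, "high": 2, "medium": 1, "low": 0.5, "minimal": 0.25}
RICE_CONFIDENCE = {"high": 1.0, "medium": 0.8, "low": 0.5}

# Hypothetical integer weights, wrapped in a read-only view so they cannot be
# edited in place once scoring starts.
WEIGHTS = MappingProxyType({"impact": 3, "effort": 2, "confidence": 1})


def check_scale(scores: dict[str, float], low: float = 1, high: float = 5) -> None:
    """Raise ValueError if any score falls outside the agreed rubric range."""
    for criterion, value in scores.items():
        if not low <= value <= high:
            raise ValueError(f"{criterion}={value} is outside the {low}-{high} rubric")


check_scale({"impact": 4, "effort": 2, "confidence": 5})  # passes silently

try:
    WEIGHTS["impact"] = 5  # attempt to tweak a weight mid-exercise
except TypeError:
    print("weights are locked")
```

The read-only view is only a nudge, because anyone can build a new dictionary. The real safeguard is the team agreeing on weights before anyone sees the scores.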

Execution

With the rubric, weights, and participants set, the actual scoring is fast.

Score every option against every criterion using the rubric. Move quickly — your instincts are most of what the scorecard is capturing. If you stall on a cell, mark it and come back; don’t let one cell anchor the rest of the row.
If you’re scoring as a team, do it independently first, then compare. Rows where the team agrees can stand. Rows where two scorers diverge by 3+ points need a 60-second conversation before you average them — that’s where the real judgment lives.
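Combine each row into a total score per option using your formula. This is either a weighted sum (each score times its weight, added up) or a ratio such as RICE's (Reach × Impact × Confidence) ÷ Effort.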
Sort the totals from highest to lowest. The top row is your candidate top priority.
Sanity-check: if the top option feels obviously wrong, don’t override the scorecard yet — go to Analysis and stress-test the weights and rubric first.
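The team-scoring steps above can be sketched as follows. The scorer data and weights are hypothetical. The sketch flags cells where scorers differ by 3 or more points, collapses each cell to its median (one of the aggregation choices named in Prep), and sorts the weighted totals.

```python
from statistics import median

# Hypothetical data: option -> criterion -> each scorer's independent 1-5 score.
team_scores = {
    "Feature A": {"impact": [4, 5], "ease": [2, 2]},
    "Feature B": {"impact": [2, 5], "ease": [4, 4]},
}
weights = {"impact": 3, "ease": 2}  # hypothetical weights


def divergent_cells(scores, threshold=3):
    """Return (option, criterion) cells where scorers differ by `threshold` or more."""
    return [
        (option, criterion)
        for option, criteria in scores.items()
        for criterion, values in criteria.items()
        if max(values) - min(values) >= threshold
    ]


def aggregate(scores):
    """Collapse each cell to the median of the individual scores."""
    return {
        option: {criterion: median(values) for criterion, values in criteria.items()}
        for option, criteria in scores.items()
    }


def ranked_totals(cell_scores, weights):
    """Weighted sum per option, sorted from highest to lowest."""
    totals = {
        option: sum(cells[c] * w for c, w in weights.items())
        for option, cells in cell_scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(divergent_cells(team_scores))  # [('Feature B', 'impact')]
print(ranked_totals(aggregate(team_scores), weights))
# [('Feature B', 18.5), ('Feature A', 17.5)]
```

In practice, the flagged cell gets its short conversation first. The scorers update their numbers, and only then do you aggregate.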

Analysis

A scorecard may generate a result that is counterintuitive. If the top row matches what you wanted to do anyway, the exercise has surfaced and confirmed your intuition — act on it. If the top row is one you didn’t expect, that’s the most useful possible outcome: either the scorecard is wrong (test that next), or your intuition was wrong (act on the new information). The point is to make the disagreement visible.

Two stress-tests are worth running before committing:

Sensitivity to weights. Drop the weight on your highest-weighted criterion by half. Does the top row still win? If yes, the ranking is robust. If a different row jumps to the top, the scorecard is really a one-criterion decision dressed up as a multi-criterion one — name that and decide whether you trust that single criterion.
What’s missing. List what the scorecard does not measure. Strategic fit, founder energy, optionality, second-order effects — these often get dropped because they’re hard to score, not because they don’t matter. If the top row is weak on something the rubric doesn’t capture, that’s a flag, not a tie-breaker.
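The weight-sensitivity test can be automated with a small sketch. It computes the top option with the original weights, halves the heaviest weight, and checks whether the top option changes. The options, weights, and scores are hypothetical, and the scores assume a weighted-sum formula.

```python
# Hypothetical options and weights; every criterion is scored so higher is better.
weights = {"impact": 4, "ease": 1, "confidence": 1}
options = {
    "Feature A": {"impact": 5, "ease": 1, "confidence": 1},
    "Feature B": {"impact": 3, "ease": 4, "confidence": 4},
}


def top_option(options, weights):
    """Name of the option with the highest weighted-sum total."""
    totals = {
        name: sum(scores[c] * w for c, w in weights.items())
        for name, scores in options.items()
    }
    return max(totals, key=totals.get)


def halve_heaviest_weight(weights):
    """Copy of `weights` with the largest weight cut in half."""
    heaviest = max(weights, key=weights.get)
    adjusted = dict(weights)
    adjusted[heaviest] = weights[heaviest] / 2
    return adjusted


baseline = top_option(options, weights)
stressed = top_option(options, halve_heaviest_weight(weights))
print(baseline, stressed, "robust" if baseline == stressed else "sensitive")
# Feature A Feature B sensitive
```

Here Feature A wins only because impact dominates the weights. This is the "one-criterion decision dressed up as a multi-criterion one" case.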

False precision. Scorecards look quantitative, but the numbers are subjective estimates. Don’t confuse a scorecard with a data-backed decision; treat it as a structured way to surface judgment, not as evidence.
Anchoring. The first option you score sets a reference point that distorts later scores. Score all options against the same rubric, then revisit your top and bottom rows to sanity-check.
Halo effect. A favorite option tends to score high on every criterion regardless of fit. Force yourself to score each criterion independently, ideally with a teammate, before computing totals.
Delegating judgment to AI. AI can help brainstorm criteria, suggest weights, and run sensitivity analyses, but the value judgments about what matters most must remain yours. If you let an AI set the weights, you outsource the decision itself and risk treating AI estimates as objective when they are not. Use AI to challenge your assumptions and stress-test rankings; own the final weights and the go/no-go call yourself.
Inside-the-building bias. Scorecards rank options using the team’s existing knowledge, which can let you act without ever leaving your desk. Pair the ranking with at least one external data point or assumption-validating conversation before committing real resources.
Diluted-priority bias. There can only be one top priority at any given time. Treating the top two or three rows as equally urgent guarantees that none of them gets the focus a single priority would; pick the top row and sequence the rest behind it.

Next Steps

Score all candidates and review the top-ranked items as a team.
If rankings feel wrong, stress-test the weights and rubric as described in Analysis rather than manually overriding scores.
Use the scorecard to communicate prioritization rationale to stakeholders.
Revisit and update criteria weights quarterly as your strategy evolves.
Use Customer Discovery Interviews to validate that the criteria in your scorecard reflect what customers actually care about.
Run a Closed-Ended Survey to gather quantitative data that can replace subjective estimates in your scorecard.

Learn more

Case Studies

Intercom: RICE scoring

Sean McBride and the Intercom team published RICE (Reach, Impact, Confidence, Effort) in January 2018 as a weighted scorecard for roadmap prioritization, replacing intuition-driven debate with a consistent formula for comparing ideas.

Sean Ellis: ICE scoring

Sean Ellis introduced ICE (Impact, Confidence, Ease) to rank growth experiments, with each criterion scored 0–10 and combined into a single number. Itamar Gilad’s writeup flags the main weakness as upkeep, noting that “it requires persistence and repetition,” because the value comes from running it consistently.

SAFe: Weighted Shortest Job First

The Scaled Agile Framework uses WSJF (Cost of Delay ÷ Job Size) to sequence enterprise backlogs. Cost of Delay aggregates user-and-business value, time criticality, and risk reduction / opportunity enablement, with the framework crediting Don Reinertsen’s Principles of Product Development Flow for the underlying logic.
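The WSJF calculation can be sketched as follows. It assumes Cost of Delay is the sum of its three components and is then divided by Job Size. The backlog items and their scores are hypothetical. SAFe teams commonly estimate these values on a relative, modified-Fibonacci scale.

```python
# Hypothetical backlog items with relative scores. "rr_oe" stands for
# risk reduction / opportunity enablement value.
backlog = {
    "Item X": {"user_business_value": 8, "time_criticality": 5, "rr_oe": 3, "job_size": 5},
    "Item Y": {"user_business_value": 3, "time_criticality": 8, "rr_oe": 2, "job_size": 2},
}


def wsjf(item):
    """WSJF = Cost of Delay / Job Size, with Cost of Delay as a sum of three parts."""
    cost_of_delay = item["user_business_value"] + item["time_criticality"] + item["rr_oe"]
    if item["job_size"] <= 0:
        raise ValueError("job_size must be positive")
    return cost_of_delay / item["job_size"]


for name in sorted(backlog, key=lambda n: wsjf(backlog[n]), reverse=True):
    print(name, round(wsjf(backlog[name]), 2))
# Item Y 6.5
# Item X 3.2
```

Item X carries more total Cost of Delay (16 vs. 13), but Item Y is less than half the size. Dividing by Job Size therefore moves Item Y to the front of the sequence.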

The Real Startup Book
