Competitor Usability Testing

In Brief
Competitor usability testing is a generative research method where you watch target users complete tasks using a competitor’s product. You observe where they struggle, what workarounds they invent, and which features they ignore. The output is a set of qualitative insights — pain points, unmet needs, and unnecessary features — that directly inform the design of your own product. This is not the same as competitive usability testing, which benchmarks your existing product against rivals to see which is “winning.” See Usability Testing.
Common Use Case
You are designing a new product and want to learn where users struggle with the existing alternatives before you commit to your own design. You recruit a few target users, watch them complete real tasks in a competitor’s product, and note every point of confusion, workaround, or complaint. The gaps you uncover tell you where your solution can differentiate.
Helps Answer
- What is the minimum feature set needed to solve the problem?
- How important is design quality to users in this category?
- Where do users struggle most with existing products?
- Which competitor features are unnecessary or confusing?
Description
The process of conducting a competitor usability test is the same as for testing your own product; what changes is the purpose. You observe real users on a competitor’s product, or on a substitute good, instead of on your own.
Where usability testing is an evaluative test of your own product and seeks to verify that it functions sufficiently to deliver the value proposition, competitor usability testing is a generative method. The goal is to surface ideas for your solution by watching where the existing alternative fails its users.
For example, to generate ideas on how to create a better U.S. tax experience, you could conduct usability testing on tax preparation in Sweden or India. The results would not tell you whether the U.S. tax experience is good (it is not), but they may give you ideas about what to improve: the comprehensibility of the tax code, the submission process, or the tax rules themselves.
LLMs can accelerate the desk research that precedes a session. Synthesizing app store reviews, support forums, and social media complaints about a competitor will surface candidate usability issues you can prioritize when you design your task scenarios. Treat that output as preparation for what to watch for, not as a substitute for watching a real user fail at a task.
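As a rough illustration of that preparation step, the sketch below assembles already-collected reviews into a single prompt and asks a model for recurring complaints. The `call_llm` function is a placeholder for whichever model client you use, and the prompt wording, function names, and review sources are assumptions rather than part of the method.

```python
# Sketch: turn scraped competitor reviews into candidate usability issues to
# watch for when writing task scenarios. `call_llm` is a placeholder: wire it
# to whichever model client you actually use.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to your model provider of choice."""
    raise NotImplementedError("Connect this to your model client.")

def candidate_issues(reviews: list[str], competitor: str) -> str:
    """Summarize raw review text into a short list of suspected usability issues."""
    prompt = (
        f"Below are user reviews of {competitor}.\n"
        "List the recurring usability complaints as short bullets, each with a "
        "count of how many reviews mention it.\n\n"
        + "\n---\n".join(reviews)
    )
    return call_llm(prompt)

# Usage: feed in the app store reviews or forum posts you have already collected.
# issues = candidate_issues(scraped_reviews, "Competitor A")
```

Whatever the model returns, treat it as a watch list for the sessions, not as findings.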
How to
Prep
- Pick 2–3 competitor products or substitutes to test. Choose the ones your target customers are most likely using today. Include at least one indirect substitute — a different type of product that solves the same problem differently. Industry guidance from Nielsen Norman Group on competitive usability evaluations recommends keeping the count to four or fewer; beyond that, the analysis cost grows faster than the insight does.
- Write 4–6 task scenarios. Each task should represent a core workflow your target customer does regularly. Use the same tasks across all competitors so you can compare. Frame each as a goal, not a click path: “You need to [accomplish goal]. Use [Competitor A] to do it.” Steve Krug’s working rule — keep the scenario short, specific, and free of jargon the participant wouldn’t use — applies here as well as on your own product (Krug, Don’t Make Me Think, Revisited).
- Recruit 5 participants per competitor. They should match your target customer profile. If a participant already uses one of the products, that is fine — you will learn where experienced users still struggle. Five is the standard threshold at which most usability problems on a single product surface; Jakob Nielsen’s foundational work on small-sample testing remains the reference (Nielsen, Usability Engineering), and the arithmetic behind the five-user convention is sketched at the end of this list.
- Set up the test environment. Decide moderated vs. unmoderated, in-person vs. remote, and pick the tool to match (see Tools below). For unmoderated remote tests, pre-record a short framing message; for moderated, draft a one-paragraph script. Keep the framing identical across competitors so participants approach each product the same way.
- Draft consent and recording language. Get explicit permission to record. Make clear you are testing the products, not the participant. Tell them they can quit a task or the whole session at any time without explanation.
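One way to keep the plan honest is to write it down as plain data before you recruit. The sketch below is illustrative only: the competitor names and task wording are placeholders, and the 31% per-participant discovery rate is the average reported in Nielsen and Landauer’s model, not a requirement of the method.

```python
# Sketch of a study plan as plain data: the same tasks across every competitor,
# five participants per product. Names and task wording are placeholders.

competitors = ["Competitor A", "Competitor B", "Substitute C"]  # keep to four or fewer

tasks = [
    "You need to set up an account and invite a teammate.",
    "You need to find last month's invoice and download it.",
    "You need to change your notification settings.",
    "You need to cancel your subscription.",
]

participants_per_competitor = 5

# Why five? If each participant surfaces roughly 31% of a product's usability
# problems (the average in Nielsen and Landauer's model), the expected share
# found after n participants is 1 - (1 - 0.31) ** n.
for n in range(1, 8):
    print(f"{n} participants -> ~{1 - (1 - 0.31) ** n:.0%} of problems surfaced")
# Five participants lands near 85%, which is why the convention stops there.
```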
Execution
- Run the test. For each participant and each product:
  - Give them the task. Don’t explain how the product works.
  - Watch them attempt the task. Note where they hesitate, make errors, express frustration, or find workarounds.
  - After each task, ask: “What was confusing? What would you change?”
  - After all tasks, ask: “What do you wish this product did differently?”
- Record and compare. For each product, track: task completion rate, time on task, number of errors, and participant satisfaction. The gaps and frustrations you observe are your opportunity — they reveal what your product should do better. A minimal rollup of these measures is sketched below.
For the full usability testing methodology, see Usability Testing. The process is the same; the difference is that you are testing someone else’s product to generate ideas, not evaluating your own.
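A small rollup like the one below keeps the four measures side by side per product; the session records and field names are invented for illustration, not a required schema.

```python
# Sketch: roll per-session observations up into the four comparison metrics.
# Records and field names are illustrative only.
from statistics import mean

sessions = [
    {"product": "Competitor A", "task": "find invoice", "completed": True,
     "seconds": 95, "errors": 1, "satisfaction": 4},
    {"product": "Competitor A", "task": "find invoice", "completed": False,
     "seconds": 210, "errors": 3, "satisfaction": 2},
    {"product": "Competitor B", "task": "find invoice", "completed": True,
     "seconds": 70, "errors": 0, "satisfaction": 5},
]

def rollup(records):
    """Group sessions by product and average the four tracked measures."""
    by_product = {}
    for r in records:
        by_product.setdefault(r["product"], []).append(r)
    for product, recs in sorted(by_product.items()):
        yield {
            "product": product,
            "completion_rate": mean(r["completed"] for r in recs),
            "avg_seconds": mean(r["seconds"] for r in recs),
            "avg_errors": mean(r["errors"] for r in recs),
            "avg_satisfaction": mean(r["satisfaction"] for r in recs),
        }

for row in rollup(sessions):
    print(row)
```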
Analysis
- Treat results as generative, not evaluative. A failed task on a competitor’s product is a candidate problem to solve, not a verdict on the competitor. The ideas tend to be unstructured and piecemeal — you will need to integrate them into a coherent solution rather than ship them one-for-one.
- Cluster failures into opportunity areas. Group every observed pain point, workaround, and unmet need across participants and products. The clusters with the most repeats across participants are your highest-confidence opportunities. A pain point that appeared once is an anecdote; one that appeared in four of five sessions across two competitors is a pattern. A minimal tally of this kind is sketched after this list.
- Validate before building. Before committing engineering effort to any opportunity, test it via a generative product method such as a Solution Interview or Concierge Test. Tomer Sharon’s standing guidance is to keep validating the problem framing for as long as it is cheap to do so (Sharon, Validating Product Ideas); the same applies to opportunities surfaced from competitor sessions. Dan Olsen’s product-market fit framing — problem space before solution space — gives you the discipline to resist building a feature just because a competitor lacks it (Olsen, The Lean Product Playbook).
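One lightweight way to do that counting is to tally each tagged pain point by how many distinct participants and products it appeared in. The tags and observations below are invented for illustration.

```python
# Sketch: count how often each opportunity area recurs across participants
# and products. Tags and observations are invented for illustration.
from collections import defaultdict

observations = [
    {"participant": "P1", "product": "Competitor A", "tag": "could not find export"},
    {"participant": "P2", "product": "Competitor A", "tag": "could not find export"},
    {"participant": "P2", "product": "Competitor B", "tag": "confusing pricing page"},
    {"participant": "P4", "product": "Competitor B", "tag": "could not find export"},
]

clusters = defaultdict(lambda: {"participants": set(), "products": set(), "count": 0})
for obs in observations:
    cluster = clusters[obs["tag"]]
    cluster["participants"].add(obs["participant"])
    cluster["products"].add(obs["product"])
    cluster["count"] += 1

# Sort so the most widely repeated pain points surface first.
for tag, c in sorted(clusters.items(), key=lambda kv: -len(kv[1]["participants"])):
    print(f"{tag}: {len(c['participants'])} participants, "
          f"{len(c['products'])} products, {c['count']} observations")
```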
Pitfalls
- Hawthorne effect (the observer effect). Participants behave differently when they know they are being watched. Frame the session as testing the product, not the person, and resist coaching mid-task.
- Confirmation bias. Researchers sometimes write tasks or post-task questions in a way that confirms what they already believe about the competitor. Have a second person review your task list before you run sessions.
- AI analysis substitution. A model-generated summary of session recordings is not a substitute for watching the sessions. Watch at least the first three sessions yourself before relying on any tooling to surface the rest.
- Solve real failures, not imagined ones. “Don’t reinvent the wheel; figure out what’s wrong with walking” (@TriKro).
Learn more
Case Studies
American Airlines
Partnered with UserTesting to put their booking and flight-management flows in front of real users alongside the same experiences on competing airlines, scoring each on UserTesting’s QXscore. The comparative testing surfaced specific friction (multi-city search hidden below the fold, hard-to-find passport entry, loyalty account, meal request, and itinerary-sharing options); redesigning those concepts produced a 37% lift in task completion and a 15% average QXscore increase.
Loop11 multi-airline benchmark
Loop11 recruited 1,000 participants to attempt the same banned-baggage lookup task across ten airline sites including British Airways, Virgin Atlantic, Malaysia Airlines, American Airlines, and Lufthansa. Real-user data showed British Airways completing at 71% versus Malaysia Airlines’ 31% with a 39% abandonment rate, and Virgin Atlantic taking more than twice as long as BA (199s vs ~89s); the study identified fly-out menus and clearer CTAs (vs. ambiguous “Download now” PDF links) as the design differences separating leaders from laggards.
Further reading
- Jakob Nielsen, Usability Engineering. Academic Press, 1993. ISBN 978-0125184052. The foundational text for the small-sample testing convention used here.
- Nielsen Norman Group, “Competitive Usability Evaluations: Definition.” The guidance cited above on keeping competitor evaluations to four or fewer products.
- Steve Krug, Don’t Make Me Think, Revisited (3rd ed.). New Riders, 2014. ISBN 978-0321965516. Practical guidance on writing usability test tasks that produce real signal.
- Tomer Sharon, Validating Product Ideas: Through Lean User Research. Rosenfeld Media, 2016. ISBN 978-1933820293. Lean research framing for treating session output as hypotheses to validate.
- Dan Olsen, The Lean Product Playbook. Wiley, 2015. ISBN 978-1118960875. Problem-space-before-solution-space discipline for opportunities surfaced from competitor sessions.
- Daniël De Wit, “Don’t feel bad, user test your competitors.”