Usability Testing

An observer with a clipboard watching a user interact with a laptop

In Brief

Usability testing is a qualitative method where you watch real users attempt specific tasks with your product and record what happens. The product can be anything from a paper sketch to a fully built application. You give each participant a task, observe whether they can complete it, and note where they hesitate, get lost, or express frustration. The output is a prioritized list of usability problems — the points in your design where real people struggle — along with insight into why those problems occur.

Common Use Case

You have built or redesigned a key flow in your product and want to see whether real users can complete it without getting lost or giving up. You sit with a handful of participants, give them a task, and watch where they stumble so you can fix the friction before you ship.

Helps Answer

  • How do people actually use this product or feature?
  • Can users complete the key tasks without help?
  • Where do users get confused or frustrated?
  • What do users experience at each step of the process?

Usability tests with 5 users can be completed in half a working day with minimal resources. Tests typically require no more than 5–7 participants, unless the tasks are complex and involve several parties collaborating simultaneously. AI-powered tools can auto-generate session transcripts, highlight moments of frustration, and produce summary reports, reducing analysis time from hours to minutes.

Usability testing can be done at low cost with just a laptop, a note-taking template, and a quiet room. Remote unmoderated tools like Maze and Lyssna offer free tiers. Full usability labs with cameras, eye-tracking software, and one-way mirrors exist, but they are not required for most early-stage tests.

Description

Usability testing is a qualitative method for observing real people as they attempt realistic tasks with your product. ISO 9241-11:2018 defines usability as the extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use. The test makes that definition operational: you put a representative user in front of the artifact, give them a goal, and watch where reality diverges from the design intent. Jakob Nielsen’s foundational Usability Engineering established that you do not need a large sample to find most usability problems — five carefully chosen testers, run iteratively, will surface the dominant defects in a flow.
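
To make the three ISO dimensions concrete, teams often quantify each task as: effectiveness (completion rate), efficiency (mean time on task), and satisfaction (average post-task rating). The sketch below is one illustrative way to compute them; the session data and field names are hypothetical, not part of the standard.

    # Hypothetical session results for one task: whether the participant finished,
    # time taken in seconds, and a 1-5 post-task satisfaction rating.
    sessions = [
        {"completed": True,  "seconds": 95,  "rating": 4},
        {"completed": True,  "seconds": 140, "rating": 3},
        {"completed": False, "seconds": 300, "rating": 2},
        {"completed": True,  "seconds": 80,  "rating": 5},
        {"completed": True,  "seconds": 110, "rating": 4},
    ]

    completed = [s for s in sessions if s["completed"]]

    effectiveness = len(completed) / len(sessions)                       # share who reached the goal
    efficiency = sum(s["seconds"] for s in completed) / len(completed)   # mean time on task, successful runs only
    satisfaction = sum(s["rating"] for s in sessions) / len(sessions)    # mean post-task rating

    print(f"Effectiveness: {effectiveness:.0%}")
    print(f"Efficiency:    {efficiency:.0f}s mean time on task")
    print(f"Satisfaction:  {satisfaction:.1f}/5")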

The method works at any fidelity. A paper sketch tested with three people can expose a confusing label as effectively as a fully built product tested with twenty. Steve Krug’s Don’t Make Me Think, Revisited makes the case for treating usability testing as an everyday DIY activity rather than a heavy lab exercise: small, frequent tests done by the team that built the product close the loop faster than commissioning quarterly studies. Usability testing is qualitative — it tells you what breaks and why — and complements quantitative methods (analytics, A/B tests, surveys) that tell you how often.

AI now sits inside the workflow rather than alongside it. Platforms like Maze, Lookback, Lyssna, and UserTesting auto-generate session transcripts, flag moments of hesitation or frustration from screen and audio signals, cluster patterns across sessions, and produce draft summaries. This compresses analysis time from hours to minutes and makes it practical to run more tests with more users. AI does not replace observation — it accelerates synthesis. A flagged frustration moment still needs a human to decide whether the design is wrong or the task scenario was unfair, and the empathy gain that comes from watching a user struggle live is something a transcript summary cannot reproduce.

How to

Prep

  1. Define the task scenarios. Write 3–5 realistic tasks that cover the critical paths in your product. Each task is a goal, not a click-by-click instruction. Example: “You just signed up. Find the feature that lets you invite a teammate.” Avoid leading the user toward the answer. Set a reasonable time limit per task (2–5 minutes). A sketch of a task list kept as structured data follows this list.
  2. Recruit 5 testers who match your target user. Five is the heuristic Nielsen established in Usability Engineering — past five, you mostly re-confirm problems you’ve already found. Recruit from existing users, social channels, or panels (UserTesting, Lyssna, Prolific). For B2B, ask customers directly; most will say yes if you make it short.
  3. Decide moderated vs unmoderated, then pick the tool. Moderated sessions (Zoom, Lookback, in-person) let you ask “what made you click that?” in real time but cost more researcher hours. Unmoderated sessions (Maze, Lyssna, UserTesting) scale cheaply and AI-assisted platforms can analyze think-aloud audio and behavioral signals across many sessions without a researcher watching each one in real time. Mix both when budget allows: a few moderated sessions to build empathy, a larger unmoderated batch to confirm patterns.
  4. Write a short intro script. Two or three sentences that frame the session. The standard reassurance: “We’re testing the product, not you. If you get stuck, that’s exactly the feedback we need.” Pre-writing the script keeps facilitators consistent across sessions.
  5. Pilot once internally before the first real session. Run the full protocol with a teammate. You’re checking whether tasks are intelligible, whether the recording setup works, and whether the timing fits. Half the issues you find in piloting would have wasted a real participant slot.
  6. Decide who moderates and who takes notes. One person facilitates and asks the questions; a second person observes and captures notes. Trying to do both at once degrades both. For unmoderated tests, the “observer” role becomes whoever reviews the AI-generated session summaries and watches the flagged moments.
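
A task list can be kept as lightweight structured data so everyone runs sessions from the same plan. A minimal sketch in Python; the study name, tasks, time limits, and success criteria shown are illustrative, not prescriptive.

    # Illustrative test plan: each task is a goal (not click-by-click
    # instructions), a time limit, and an observable success criterion.
    test_plan = {
        "study": "Invite-a-teammate flow, round 1",
        "participants": 5,
        "tasks": [
            {
                "id": "T1",
                "scenario": "You just signed up. Find the feature that lets you invite a teammate.",
                "time_limit_min": 3,
                "success": "Invite dialog opened and an email address entered",
            },
            {
                "id": "T2",
                "scenario": "Your teammate never got the invite. Figure out what happened.",
                "time_limit_min": 5,
                "success": "Pending-invite status located and resend triggered",
            },
        ],
    }

    for task in test_plan["tasks"]:
        print(f'{task["id"]} ({task["time_limit_min"]} min): {task["scenario"]}')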

Execution

  1. Frame the session. The facilitator reads the intro script: explain the purpose, reassure the participant that any difficulty is feedback about the product (not the user), and confirm consent to record. Sticking to the pre-written script keeps sessions consistent across facilitators.
  2. Explain the first task. Read the task scenario aloud and hand control to the user. Give context, not instructions. Do not tell them where to click or what to look for.
  3. Observe and ask the user to think aloud. Ask the participant to narrate impressions, intentions, and expectations as they work. The think-aloud protocol is what makes usability testing diagnostic rather than just a pass/fail score — it surfaces the why behind the behavior. Do not explain, coach, or interpret. Interject only to ask “what are you thinking right now?” when the user goes silent at a moment of hesitation.
  4. Capture the session. Record audio, screen, and (where available) video. For unmoderated remote sessions, AI-assisted platforms (Maze, Lookback, UserTesting, Lyssna) auto-flag moments of hesitation, backtracking, repeated clicks, and emotional cues — useful triage when you cannot watch every session in real time. A structured note-taking sketch follows this list.
  5. Repeat for each task. Move through the rest of the task scenarios. Watch for fatigue; if the participant is flagging, cut a task rather than push through with degraded data.
  6. Run a short exit interview. Thank the user and ask 2–3 open-ended follow-up questions to clarify their experience: “What was the most frustrating part?” “If you could change one thing, what would it be?” These prompts often surface issues the user did not articulate during the task itself.
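
Observation notes are easier to synthesize later if every entry has the same shape: a timestamp, the task, a rough event type, and what happened. A minimal sketch; the event types and log entries here are illustrative assumptions, not a required format.

    # Illustrative observation log for one session. Consistent fields make it
    # easy to tally issues across sessions during analysis.
    OBSERVATION_TYPES = {"hesitation", "error", "backtrack", "quote", "success"}

    session_log = [
        {"t": "02:10", "task": "T1", "type": "hesitation",
         "note": "Hovered between Settings and Team for ~15s"},
        {"t": "03:05", "task": "T1", "type": "quote",
         "note": "\"I'd expect invites to live under Settings.\""},
        {"t": "04:40", "task": "T1", "type": "success",
         "note": "Found invite via the Team page"},
    ]

    for obs in session_log:
        assert obs["type"] in OBSERVATION_TYPES
        print(f'[{obs["t"]}] {obs["task"]} {obs["type"]:10} {obs["note"]}')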

Analysis

  1. Synthesize observation notes across sessions. The facilitator and any observers compare notes, focusing on moments where users hesitated, backtracked, or expressed frustration. Even usability experts sometimes disagree on interpretation, so multiple observers reduce the chance that a single experimenter’s bias filters the findings. Identify the functional issues that affected most or all participants — those are the ones worth fixing first.
  2. Prioritize by frequency and severity. Given the small sample size, treat consistent problems (3+ of 5 users hit the same wall) as confirmed defects. One-off issues might be real or might be participant-specific — note them but do not over-weight them. Severity is independent of frequency: a problem that happens once but blocks the user from completing the task scores higher than a cosmetic issue that everyone hit. A scoring sketch follows this list.
  3. Use AI tools to triage what to watch closely. AI-powered session analysis tools (Maze, Lookback, Lyssna) auto-flag moments of confusion, generate heatmaps, calculate task-completion metrics, and produce session summaries. Treat these as a multiplier on researcher bandwidth — use them to triage which recordings to watch closely, not as a substitute for watching sessions yourself. Automated heuristic checks (contrast ratios, touch-target sizes, navigation depth) can also catch surface-level issues before you spend a real participant slot on them.
  4. Distinguish usable from desirable. If all users complete the tasks, that tells you the product is usable. It does not tell you whether the value proposition lands or whether anyone would actually pay for it. Usability testing is a necessary but insufficient validation step — pair it with desirability and value-proposition tests before you assume you are clear to ship.
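
One simple way to apply the frequency-and-severity rule is to rank by severity first and use frequency only to break ties, so a one-off blocker outranks a cosmetic issue that every participant hit. A minimal sketch, assuming a Nielsen-style 0–4 severity scale and an illustrative issue tally (both are assumptions, not something the method prescribes).

    # Illustrative issue tally: users_affected out of 5 participants,
    # severity on a 0-4 scale (4 = blocks task completion).
    PARTICIPANTS = 5

    issues = [
        {"issue": "Invite button hidden behind overflow menu", "users_affected": 4, "severity": 3},
        {"issue": "Error message gives no recovery step",      "users_affected": 1, "severity": 4},
        {"issue": "Label 'Members' misread as billing seats",  "users_affected": 3, "severity": 2},
        {"issue": "Low-contrast helper text",                  "users_affected": 5, "severity": 1},
    ]

    # Severity leads, frequency breaks ties: a one-off blocker outranks a
    # cosmetic issue that every participant hit.
    ranked = sorted(issues, key=lambda i: (i["severity"], i["users_affected"]), reverse=True)

    for i in ranked:
        print(f'severity {i["severity"]}, {i["users_affected"]}/{PARTICIPANTS} users: {i["issue"]}')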

Biases & Tips

  • Hawthorne effect (the observer effect): Users behave differently when they know they're being watched. Frame the session as testing the product, not the user, and use unmoderated remote tests when you want behavior less shaped by being observed.
  • Social desirability bias: Users may try to complete tasks (or answer questions) in a way that makes them look good to the experimenter. Reinforce that struggling is the desired output, not a failure.
  • Confirmation bias: Experimenters can frame tasks or questions in ways that confirm their preconceptions. Have someone outside the design team review the task scenarios before you run sessions.
  • Selection bias: Testing usability only with existing power users hides the issues new users hit. Recruit a mix that matches the segment you're actually trying to serve.
  • AI analysis is a multiplier, not a replacement: Automated tools catch surface-level issues and flag frustration signals, but they cannot tell you whether the product solves the user's problem in a way that feels natural. Make sure at least some sessions are observed live by a team member who can ask follow-up questions and notice cues that AI still misses.

Next Steps

  • Prioritize the most critical usability issues by frequency and severity.
  • Fix the top issues and re-test with new participants to verify improvements.
  • Run Competitor Usability Testing to compare your usability scores against direct competitors.
  • Share highlight clips with stakeholders to build empathy for user struggles.
  • Use A/B Testing to measure whether usability fixes translate into improved conversion or engagement metrics at scale.
  • Run a Net Promoter Score Survey after shipping usability improvements to track whether satisfaction increases over time.

Learn more

Case Studies

  • Automated Usability Testing — A Case Study
  • Maryellen Allen — A Case Study of Usability Testing of the University of South Florida’s Virtual Library Interface Design
  • Metisa — A Usability Case Study
  • Pinterest — A usability case study
  • Zara — A Usability Case Study

Maze — AI-Powered Usability Testing Platform

Maze publicly reports that more than 60,000 teams use the platform. In April 2024, the company launched its Feedback Engine, applying AI to analyze open-ended survey questions at scale; auto-generated transcripts, summaries, and thematic sentiment filters reduce researcher overhead, and dynamic follow-up questions adapt to participant responses.

Maze — The Future of User Research Report 2026

Maze’s annual industry report documents trends in AI-assisted usability testing, including how AI-generated test templates allow researchers to launch tests in a fraction of the time previously required, and how heatmaps and completion-time analysis help teams identify drop-off points in user flows.

Figma Make — AI Prototyping for Usability Testing

Figma launched Make in May 2025, enabling designers to generate functional prototypes from natural-language prompts. Designers report building usability-test-ready prototypes in roughly 20–30 minutes (per public Medium and Figma case examples), accelerating the prototype-test-iterate loop.

Nielsen Norman Group — iterative homepage redesign

NN/g, the firm Jakob Nielsen co-founded, has documented its own use of rapid prototyping and small-sample (5-user) usability tests on its homepage redesigns, applying the same “test with 5 users, iterate” heuristic the firm originated.
