Secondary Market Research

In Brief
Secondary market research is the desk-based work of producing two artifacts: a defensible TAM / SAM / SOM built from triangulated published sources, and a macro-trend map identifying the regulatory, demographic, technology, and channel shifts that materially change the sizing picture in the next 12–24 months. The output is a sizing-and-trend brief that supports a go/no-go decision on a market before any primary research. AI tools can assemble most of the raw material in hours, but they fabricate report titles, statistics, and citation URLs at rates between 18% and 94% depending on the task — so verification of every numeric and named claim, including re-fetching every cited URL, is part of the method, not an optional polish step.
Common Use Case
You have a market hypothesis but no customers yet. Before committing budget to interviews, smoke tests, or a build, you want a defensible answer to “is this market big enough, and is it moving in a direction that helps or hurts us?” — sourced from published material a skeptical investor or partner could check. Expect a directional read within a day and a decision-grade brief within a week.
Helps Answer
- How large is the market for our specific segment, geography, and use case — not the headline industry number?
- How does our bottom-up build-up compare to the top-down published TAM, and where are the assumptions most fragile?
- Which macro trends — regulatory, demographic, technology, channel-cost — would materially change the sizing picture in the next 12–24 months?
- What pricing benchmarks and willingness-to-pay signals exist in published material for this segment?
- What are realistic CAC, retention, and channel-cost benchmarks for this category?
- Which findings are strong enough to act on, and which need primary research before we commit budget?
Description
Secondary market research is a structured review of existing published material — government statistics, analyst reports, academic research, regulatory filings, trade-association data, public-company disclosures, founder benchmarks, and competitor data — used to estimate market size and map the trends that will reshape it. The goal is not to count headline numbers; it is to produce a defensible TAM / SAM / SOM and a trend map that together support a decision about whether to commit primary-research budget to a market.
The method has three stages. Prep frames the decision question, defines the segment slice you can actually reach, and identifies the candidate sources worth pulling — across free public data, library-accessible databases, paid analyst reports, regulatory filings, and founder benchmarks. Execution produces a structured record per source: the numbers, the assumptions behind them, the methodology used to derive them, and a per-claim citation. Analysis triangulates the numbers across sources, builds a bottom-up sizing that can stand against a top-down sanity check, maps the macro trends most likely to change the sizing picture, and turns it all into a decision-grade brief.
Two principles run through all three stages.
The first is triangulated and bottom-up by default. A single TAM number from a single analyst press release is a guess. A defensible TAM rests on at least two independent estimation paths — a top-down estimate from analyst or government data and a bottom-up build-up from named-segment counts × ACV — that agree to within roughly 3–5×. When the two disagree by more than that, the assumptions need work, not the number. A polished $50B TAM pulled from one report is consistently less credible than a $200M TAM with airtight bottom-up math, explicit segment definitions, and visible sensitivity to its assumptions. The same triangulation rule applies to macro trends: every trend claim should rest on at least two independent perspectives — analyst forecast plus government data, or trade press plus a public-company 10-K, not two restatements of the same primary source.
The second is citation-backed claims with mandatory fact-check. AI assembles secondary-research material faster than any human team, but current frontier models fabricate report titles, statistics, and citation URLs at rates between 18% and 94% depending on the task. The most pernicious failure mode is real URL plus fabricated claim — the citation looks defensible because the URL resolves, but the cited page does not actually contain the cited number. Every numeric claim and every named source must trace to a verifiable URL, and a fact-check pass — by a human or a verification subagent that re-fetches each URL and confirms the cited claim is on the cited page — is part of the method.
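The re-fetch pass is mechanical enough to sketch. Below is a minimal illustration in Python, assuming a list of (URL, claim snippet) pairs extracted from the research records; the URLs and snippets are hypothetical placeholders, and a real pass would also handle JavaScript-rendered pages and flag gated sources as MANUAL:

```python
import requests

# Hypothetical (url, claim snippet) pairs pulled from the research records.
claims = [
    ("https://example.com/saas-benchmarks", "median growth of 28%"),
    ("https://example.com/industry-report", "$4.2B in 2024"),
]

for url, snippet in claims:
    try:
        resp = requests.get(url, timeout=15)
        resp.raise_for_status()
    except requests.RequestException as exc:
        print(f"MANUAL: could not fetch {url} ({exc})")
        continue
    # Crude containment check; a real pass would normalize whitespace,
    # render JavaScript, and match numbers rather than exact strings.
    if snippet.lower() in resp.text.lower():
        print(f"VERIFIED: '{snippet}' found at {url}")
    else:
        print(f"SUSPECT: {url} resolves but does not contain '{snippet}'")
```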
For query-level demand signal — what people are typing into Google and asking AI assistants right now — use Search Trend Analysis. This page is about how big a market is and where it is heading according to published material; that one is about what the audience is actively searching for. The two methods are complementary and often run in parallel during the same week.
How to
Prep
- Frame the decision question, not the topic. “Is this market big enough?” is too vague to research. “How many US mid-market fintechs (50–500 employees) would pay $30K+/yr for compliance automation today, and how do the EU AI Act and state privacy laws shift that number by 2027?” is researchable and decision-anchored. Limit yourself to 2–3 such questions per research session — anything more and the work expands indefinitely.
- Define the segment slice you can actually reach. Not “businesses” but “B2B SaaS companies with 10–50 employees in the US.” Not “consumers” but “homeowners aged 30–50 with household income above $100K.” Every later number — TAM, SAM, SOM, willingness-to-pay benchmarks, channel-cost estimates — is anchored to this slice. If you skip this step you produce a headline industry number, which is worse than no number because it feels finished.
- Identify candidate sources across multiple buckets. A single source is a guess; agreement across independent buckets is a triangulation. Cross-reference at least three of these source-pool buckets before locking your reading list:
- Government data — Census Business Builder, Bureau of Labor Statistics, SBA.gov, EU Eurostat, country-level statistics offices. Free, large samples, slow to update.
- Library-accessible analyst databases — Mintel, Passport / Euromonitor, Frost & Sullivan, IBISWorld, Statista. Most US public libraries and university libraries carry at least one; ask the reference desk before paying.
- Free analyst summaries — McKinsey Global Institute, Bain Insights, BCG Insights, HBR, World Economic Forum, OECD. Full reports are paywalled; the free-tier summaries often carry the headline number you need.
- Founder benchmarks — Lighter Capital SaaS benchmarks, OpenView SaaS benchmarks, ChartMogul, Stripe’s annual reports. Free, recent, founder-relevant.
- Regulatory filings — SEC 10-Ks for public competitors and adjacent players, FDA / EMA pipelines for regulated categories, EU AI Act compliance filings, state-level privacy filings. Underused; often the only place segment-level numbers are disclosed.
- Trade associations — IAB for ad tech, ICMA for capital markets, AHIP for health insurance, RIAA for music — often publish unit-economics and channel-cost data not available elsewhere.
- Trade press and beat reporting — Search Engine Land, The Information, Pitchbook News, sector-specific publications. Useful for recency, but always trace the cited primary source.
If a number shows up in only one bucket, treat it as directional. If it shows up in two or more independent buckets, it earns a slot in your sizing.
- Sketch your sizing approach and trend dimensions before pulling data. Decide your TAM / SAM / SOM build-up plan in advance:
- Top-down — start from a published industry total, multiply by the share that matches your segment slice (geography × company size × use case).
- Bottom-up — start from a named count of segment members (Census, Crunchbase, LinkedIn) × ACV × time period.
- Benchmark-anchored — anchor an unknown number to a comparable category’s known benchmark (e.g., scheduling-software ACV ≃ project-management-software ACV at the same segment size).
Plan to run at least two paths and triangulate; a minimal sizing sketch in code follows this list. For macro trends, pick the dimensions that matter to your specific segment from the PESTEL frame — Political, Economic, Social, Technological, Environmental, Legal — and drop the dimensions that do not. For most B2B software the load-bearing dimensions are Legal (compliance regimes), Technological (adoption curves of upstream platforms), and Economic (segment-level capex shifts); Environmental is often dead weight. Stub the trend map with the dimensions you will fill before collecting data.
- Tag claims as decision-grade or directional. Some claims drive decisions (TAM in the 1–5× range you are committing to, willingness-to-pay benchmarks for the segment you are pricing against, regulatory pipeline that affects your go-to-market). Others are background context (rough industry growth rate, generic technology-adoption color). Tag each candidate claim decision-grade (must be verified against a primary source and re-fetched by a fact-check pass) or directional (a roundup or AI summary is fine). Without this triage, fact-checking either takes forever or doesn’t happen.
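To make the two-path plan concrete, here is a minimal sketch in Python with purely hypothetical inputs; the segment count, reachable share, ACV, industry total, and segment share are placeholders, each of which would trace to a cited source in practice:

```python
# Hypothetical inputs: in practice each traces to a cited source.
segment_count = 12_000      # named segment members (e.g. a Census or Crunchbase count)
reachable_share = 0.25      # sourced share of the segment you can actually reach
acv = 30_000                # sourced annual contract value, USD

bottom_up = segment_count * reachable_share * acv   # bottom-up path

industry_total = 8_000_000_000   # published industry total, USD
segment_share = 0.04             # geography x company size x use case filter

top_down = industry_total * segment_share           # top-down path

# Triangulation check: the two paths should agree to within roughly 3-5x.
ratio = max(bottom_up, top_down) / min(bottom_up, top_down)
print(f"bottom-up ${bottom_up/1e6:.0f}M, top-down ${top_down/1e6:.0f}M, ratio {ratio:.1f}x")
if ratio > 5:
    print("Gap beyond 5x: rework segment definitions before writing anything.")
```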
Execution
The verified reading list from Prep is now the input to structured collection. The goal here is not to write essays on each source; it is to fill a fixed dual schema — sizing facts and trend facts — so the synthesis stage has clean inputs.
- Use a fixed dual schema, not free-form notes. Every source produces two structured records (a minimal code sketch of both follows this list):
- Sizing record — segment definition (geography × company size or demographic × use case), headline number, methodology used by the source to derive it, sample or sample size, year of underlying data, who funded the report (if applicable), and any caveats the source itself flagged. Inline [1], [2] citations on every numeric claim.
- Trend record — PESTEL dimension(s) the source addresses, the specific shift being claimed, the time horizon, the methodology behind the projection, the named affected segment, and any countervailing forces the source acknowledged. Inline citations on every projection.
A Sources list at the end of each record resolves every marker to a working URL with the date the source was retrieved.
- Collect each field from its canonical source.
- Segment counts and demographics → Census, BLS, Crunchbase, LinkedIn, Statista demographic tables → bottom-up sizing.
- Top-down industry totals → analyst reports (Statista, IBISWorld, McKinsey/BCG/Bain summaries, library-accessible databases) → top-down sanity check.
- ACV, willingness-to-pay, retention, CAC benchmarks → founder benchmarks (Lighter Capital, OpenView), public-company 10-Ks for adjacent players, trade-association unit-economics reports → bottom-up multipliers.
- Regulatory pipeline → primary regulatory filings (SEC EDGAR for public companies, FDA / EMA pipelines, EU AI Act timeline documents, state-privacy law trackers like IAPP) → trend map (Legal).
- Technology-adoption curves and platform shifts → analyst forecasts (Gartner Hype Cycle, IDC, Forrester), public-company 10-Ks of upstream platforms, World Economic Forum and OECD reports → trend map (Technological).
- Demographic and economic trends → Census projections, BLS labor projections, IMF / OECD forecasts → trend map (Social, Economic).
- Build top-down and bottom-up in parallel. Run both estimation paths from the start, not sequentially. The point is not to pick one — the point is to see whether they triangulate. If your bottom-up is $200M and your top-down is $50B, one of the two is reading the wrong segment and you need to find out which before you write anything.
- AI does the first pass; humans verify decision-grade claims. A capable LLM with web access can fill 70–80% of the dual schema in hours. The remaining 20–30% — and any claim you tagged as decision-grade in Prep — must be confirmed by visiting the cited source yourself or routing through a fact-check subagent. Pricing pages change weekly; analyst-report headline numbers shift between editions; “X% growth” claims often hide a different segment definition than yours.
- Flag everything gated as MANUAL. Anything behind a login, paywall, library-only access, or paid analyst report (full Pitchbook, full Forrester / IDC / Gartner, gated G2, full Mintel / Passport / Frost & Sullivan reports) gets flagged as MANUAL — [what would resolve it] in the record. AI cannot fetch behind authentication, and a confidently stated number from a source the agent could not actually retrieve is the most common failure mode of AI-generated market sizing.
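As promised above, here is a minimal sketch of the dual schema as Python dataclasses. The field names mirror the records described in this list but are illustrative, not a fixed standard:

```python
from dataclasses import dataclass, field

@dataclass
class SizingRecord:
    segment_definition: str          # geography x company size x use case
    headline_number: str             # e.g. "$4.2B (2024)" - hypothetical
    methodology: str                 # how the source derived the number
    sample: str                      # sample or sample size
    data_year: int                   # year of underlying data, not publication
    funder: str                      # who funded the report, if applicable
    caveats: str                     # caveats the source itself flagged
    sources: list[str] = field(default_factory=list)  # [1], [2] -> URL + retrieval date

@dataclass
class TrendRecord:
    pestel_dimensions: list[str]     # e.g. ["Legal", "Technological"]
    claimed_shift: str               # the specific shift being claimed
    time_horizon: str                # e.g. "12-24 months"
    methodology: str                 # methodology behind the projection
    affected_segment: str            # the named affected segment
    countervailing_forces: str       # forces the source acknowledged
    sources: list[str] = field(default_factory=list)
```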
Analysis
The point of analysis is to produce a sizing-and-trend brief that supports a decision, not to fill schemas. Every artifact below should end in a sentence that names a choice you are now better equipped to make.
- Triangulate sizing — top-down vs. bottom-up. Lay your top-down number, your bottom-up number, and any benchmark-anchored numbers side by side. If they agree to within roughly 3–5×, your sizing is defensible — pick the bottom-up as your primary number (it is more defensible to investors and partners) and use the top-down as the sanity-check ceiling. If they disagree by more than that, your assumptions are wrong somewhere and you cannot proceed without finding the gap. Most often the gap is segment definition: the top-down counts a wider universe than the bottom-up reaches.
- Stress-test every load-bearing assumption. For each number you plan to act on, ask:
- Does the underlying data match the geography, segment, and time period I am sizing? (Specificity drift is the most common failure.)
- Is the data still recent enough to act on? (Anything older than 24 months in a fast-moving category is directional, not decision-grade.)
- Is the source’s definition of the category the same as mine? (Definition drift — “B2B SaaS” in one report includes services revenue, in another it doesn’t.)
- What conversion rate from “addressable” to “actually reached” am I using, and is it sourced or fabricated? (Whimsical conversion rates produce optimistic SOMs that fall apart on first contact with primary research.)
- If the source is vendor-funded, does the vendor benefit from a larger headline number? (Almost always yes.)
- Map macro trends to the sizing picture. For each PESTEL dimension you collected, ask: does this trend materially change the SAM or SOM in the next 12–24 months, and in which direction? Tag each trend expanding (increases the addressable segment), contracting (shrinks it), or shifting (changes the segment definition rather than the size — for example, EU AI Act compliance becomes a buying criterion, raising willingness-to-pay among compliance-mature buyers and disqualifying compliance-immature ones). Drop trends that do not materially change the picture; a trend map of fifteen items is a list-making exercise.
- Run a sensitivity analysis on the SOM. Pick the two or three load-bearing assumptions in your bottom-up build (segment count, conversion rate, ACV, retention), and compute the SOM at +30% / -30% on each. If the SOM swings more than 3× under reasonable assumption ranges, your number is fragile — say so in the brief, and identify the single primary-research move that would tighten the most fragile assumption. A worked sketch in code closes this section.
- Write the strategic insight. A 200–400-word brief answering five questions in order:
- What is the realistic SOM for our segment slice in the next 12 months? Bottom-up build, named segment, sourced ACV, sourced conversion rate.
- What is the top-down ceiling, and how far is the bottom-up from it? A short sentence on triangulation; if the gap is large, name where it lives.
- Which 2–3 macro trends materially change the SOM in 12–24 months, and in which direction? Each trend must rest on at least two independent sources.
- Which 2–3 assumptions are most fragile, and what primary research would tighten them? Names the next research move.
- Go / no-go / not-yet? A one-sentence recommendation grounded in the four answers above.
This is the artifact you actually act on. The schemas, citations, and sensitivity tables exist to make this paragraph defensible.
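The sensitivity pass referenced above is only a few lines of code. A minimal sketch in Python, assuming three hypothetical load-bearing assumptions and varying all of them jointly across the ±30% grid:

```python
from itertools import product

# Hypothetical load-bearing assumptions from the bottom-up build.
base = {"segment_count": 12_000, "conversion": 0.02, "acv": 30_000}

def som(assumptions: dict) -> float:
    # SOM = segment count x conversion to "actually reached" x ACV
    return assumptions["segment_count"] * assumptions["conversion"] * assumptions["acv"]

# Evaluate every combination of -30% / base / +30% on each assumption.
results = []
for deltas in product([-0.3, 0.0, 0.3], repeat=len(base)):
    scenario = {k: v * (1 + d) for (k, v), d in zip(base.items(), deltas)}
    results.append(som(scenario))

low, high = min(results), max(results)
print(f"SOM range ${low/1e6:.1f}M to ${high/1e6:.1f}M, swing {high/low:.1f}x")
if high / low > 3:
    print("Swing above 3x: flag the SOM as fragile in the brief.")
```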
Watch Out
- Aspirational thinking disguised as analysis. Founders pick TAM numbers that support the story they want to tell, then back-fit the methodology. Force yourself to commit to your sizing path and segment definition in Prep, before you have seen any of the headline numbers; if a single source flips your conclusion, your methodology was the conclusion.
- AI hallucination — fabricated citations. Frontier LLMs fabricate report titles, statistics, and citation URLs at rates between 18% and 94% depending on the task. The pernicious failure mode is real URL plus fabricated claim: the citation looks defensible because the URL resolves, but the cited page does not contain the cited number. Every numeric and named claim must be primary-source-verified by re-fetching the URL and confirming the claim is on the page; treat unverified AI output as a draft, not a finding.
- Specificity drift. Published numbers describe wide international or whole-industry categories; you will tackle a regional, segment-specific niche. Treat headline numbers as directional, not definitive, and re-anchor every estimate to the slice of the market you can actually reach. Most of the time the headline is bigger than your slice by an order of magnitude.
- Definition drift across sources. “B2B SaaS,” “fintech,” “small business,” and “consumer health” mean different things in different reports. Two numbers that sound comparable often aren’t. Before you triangulate, normalize the segment definition each source is using and discard any that don’t match yours.
- Vendor-report bias. Industry analysts and trend-report publishers benefit from a larger headline number; their commercial interest is often aligned with optimism. When you cite a vendor-funded report, weight it accordingly and require independent corroboration before treating its number as decision-grade.
- Top-down-only fragility. A TAM that rests only on a published top-down number is a guess no matter how prestigious the source. Always run the bottom-up build, even when it is rougher; the act of identifying segment counts and ACV exposes assumption gaps the top-down number hides.
- Confirmation bias in conversion rates. The most common way founders inflate SOM is by importing a whimsical conversion rate from “addressable” to “actually reached.” Use sourced conversion rates from comparable categories (founder benchmarks are useful here) and run sensitivity at +30% / -30%; if the SOM only stands up at the optimistic end, say so in the brief.
- Willingness-to-use vs. willingness-to-pay. Search volume, expressed interest, and “would you use this” survey results indicate willingness-to-use, not willingness-to-pay. Anchor pricing benchmarks to actual paid revenue from comparable products, not to interest signals.
- Recency neglect. Regulatory pipelines, AI-platform adoption, and channel costs all shift fast enough that a 24-month-old report is often directional at best. Always check the underlying data year, not just the publication date.
Learn more
Case Studies
Spotify
Daniel Ek studied the collapse of the music industry from $14.6B (1999) to $6.7B (2015) using published revenue data and piracy statistics. Triangulating macro trends (piracy, smartphone adoption) against published revenue data was the analysis that justified the freemium streaming bet — secondary research producing a sizing-and-trend insight that pointed to a specific business-model decision.
Airbnb
The founders triangulated Craigslist rental listings, hotel-industry data, and conference attendance data to size the short-term lodging opportunity, identifying that hotel prices spiked predictably during major events. The macro-trend signal (event-driven price spikes that hotels couldn’t match) was visible in published data before any primary research, and it shaped the early supply-side strategy.
Uber TAM debate (Damodaran vs. Gurley, 2014)
NYU’s Aswath Damodaran sized Uber’s TAM at roughly $5B by anchoring to global taxi-and-limousine revenue; Bill Gurley argued the relevant TAM was closer to $300B by anchoring to total ground-transportation spend including private-car ownership. Both were grounded in published data; the debate is the canonical case study in how segment definition (taxis vs. ground transportation) drives the entire sizing answer. The lesson is that the bottom-up assumption you commit to before you see the numbers is the analysis.
Lighter Capital 2025 SaaS Benchmarks
A free founder-benchmark source reporting median 2025 B2B SaaS revenue growth of 28% (down from 47% in 2024), with the upper quartile at 65% (down from 88%). Founders use it to sanity-check their own growth-rate assumptions and ACV benchmarks against the market without paying for a private dataset — a worked example of the founder-benchmark bucket from Prep.
Further reading
- Aswath Damodaran — A Disruptive Cab Ride to Riches: The Uber Payoff (2014)
- Bill Gurley — How to Miss By a Mile: An Analysis of Uber’s Potential Market Size (2014)
- SBA.gov: Market research and competitive analysis
- Cornell University Johnson Library FAQ: How can I find market research reports and data?
- Anthropic: How we built our multi-agent research system (2025)
- Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models (arXiv 2508.03860, 2025)
- AI Hallucination Rate Benchmarks 2026
- HBR: The AI Tools That Are Transforming Market Research (2025)
- Bain & Company — Macro Trends Research
- PESTLE Analysis (SI Labs)
- Census.gov — Economic Census