
The Experiment Library: 10 Tests That Prove a Startup Idea Works

Ten startup validation experiments, ranked by how strong the evidence is, how much they cost, and how long they take. The toolkit we run on every idea.

Phat Nguyen

Content Engineer


The world has a lot of experiment types. Strategyzer catalogues forty-four. Academic researchers list more. A motivated founder with a spreadsheet can invent another ten over a weekend.

At OS Research, we work with ten. Lean by design. A short list forces the team to pick the right test, not the most clever one. The goal isn't to run every experiment; it's to get to evidence as quickly and as cheaply as possible.

Four dimensions

Every experiment sits somewhere on each of these axes.

  • Purpose. Discovery (what does the customer feel, want, look like) vs. validation (does this specific Pitch claim hold).

  • Evidence strength. Opinion in a survey is weak; money pre-paid for a non-existent product is very strong. Most of the craft in a cycle is stacking experiments so that each later one produces stronger evidence than the last.

  • Cost. From essentially free to genuinely expensive. Cheap experiments mean you can run more of them, but cheap evidence is usually weak.

  • Time. Hours to weeks, with a hard ceiling. If it needs months, it's not an experiment; it's a build project.
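One way to make the four axes concrete is to record them per experiment. The sketch below is a minimal illustration, not our internal tooling: the `Experiment` class, the 1-to-5 evidence scale, the dollar and day figures, and the `MAX_DAYS` ceiling are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    DISCOVERY = "discovery"
    VALIDATION = "validation"


@dataclass(frozen=True)
class Experiment:
    name: str
    purposes: frozenset   # which Purpose values the experiment serves
    evidence: int         # hypothetical scale: 1 (weak) to 5 (very strong)
    cost_usd: float       # illustrative estimate, not a benchmark
    days: float           # illustrative estimate, not a benchmark


# Assumed ceiling. The article only says "hours to weeks", so the exact
# number of days here is a placeholder.
MAX_DAYS = 42


def is_experiment(e: Experiment, max_days: float = MAX_DAYS) -> bool:
    """Anything that exceeds the time ceiling is a build project, not an experiment."""
    return e.days <= max_days


survey = Experiment(
    name="survey",
    purposes=frozenset({Purpose.DISCOVERY, Purpose.VALIDATION}),
    evidence=1,
    cost_usd=0.0,
    days=2,
)
full_build = Experiment(
    name="full product build",
    purposes=frozenset({Purpose.VALIDATION}),
    evidence=5,
    cost_usd=50_000.0,
    days=120,
)

for e in (survey, full_build):
    label = "experiment" if is_experiment(e) else "build project"
    print(f"{e.name}: {label}")
```

Writing the axes down this way makes the trade-off visible: a candidate with high evidence strength but a duration past the ceiling gets flagged before anyone commits to it.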

The ten

1. Survey

Simplest tool, most often misused. Five minutes or less, sent to the target segment. Works for both discovery and validation. Cheap, fast, easy. The weakness is fundamental: surveys measure what people say, not what they do. A signal, but never enough on its own.

2. Light prototypes

Small bundle of lightweight artifacts. Pitch deck. One-page brochure. Data sheet. Two-minute explainer video. AI-generated mockups. The point isn't to impress; it's to give the team something to hold up in an interview, so the conversation moves from abstract to specific. These produce reactions, not commitment.

3. Interviews

Done well, the most information-dense experiment. An hour with a real customer, asked the right questions in the right order, by someone listening rather than selling, beats a hundred surveys.

Done badly, worse than useless. Leading questions produce false positives. We train teams to listen for behavior, not opinion; for past actions, not hypothetical futures. "What did you do last time?", not "What would you do if?"

4. Simple landing page

One page describing the offer with a clear call to action. Three enhancements: A/B testing the messaging; link tracking by channel; and a feature stub, a "coming soon" button that pretends the feature exists so we can measure demand before we build (sometimes called a fake door). We always follow up with people who clicked, tell them honestly where the product stands, and often convert them into interview subjects.
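When A/B testing the messaging, the question is whether a gap in click-through is real or just noise. A common way to check is a two-proportion z-test, sketched below with only the standard library. The function name and the visitor and click counts are made up for illustration, and the test assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in click-through rate.

    Uses the pooled-proportion standard error, which assumes reasonably
    large samples in both variants.
    """
    p_a = clicks_a / visitors_a
    p_b = clicks_b / visitors_b
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if se == 0:
        # No variation at all (every visitor clicked, or none did).
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: headline A vs. headline B.
z, p = two_proportion_z_test(clicks_a=40, visitors_a=1000,
                             clicks_b=65, visitors_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the two messages really do perform differently. With only a few dozen visitors per variant, the honest reading is usually "not enough data yet" rather than a winner.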

5. Clickable prototype

Non-functional but navigable mockup. Answers questions about experience and flow without committing engineering time.

6. Online ad on Meta

Useful precisely because a paid budget forces the ad to compete in a real market for real attention. Real cost per click, real conversion rate, real cost per signup. Closer to launch reality than any organic signal, and especially strong when run as a fake-door test.
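The three numbers named above fall straight out of spend, clicks, and signups. A minimal sketch of the arithmetic follows; the `AdResult` fields and the campaign figures are hypothetical, and nothing here calls the Meta ads API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdResult:
    spend: float   # total ad spend, in whatever currency the campaign uses
    clicks: int    # clicks through to the landing page
    signups: int   # completed signups from those clicks


def funnel_metrics(r: AdResult) -> dict:
    """Compute cost per click, click-to-signup conversion rate, and cost per signup.

    A metric is None when its denominator is zero.
    """
    return {
        "cost_per_click": r.spend / r.clicks if r.clicks else None,
        "conversion_rate": r.signups / r.clicks if r.clicks else None,
        "cost_per_signup": r.spend / r.signups if r.signups else None,
    }


# Illustrative campaign: 200 spent, 400 clicks, 20 signups.
metrics = funnel_metrics(AdResult(spend=200.0, clicks=400, signups=20))
print(metrics)
```

For this made-up campaign the result is 0.50 per click, a 5% conversion rate, and 10.00 per signup. Cost per signup is the figure to hold against what a customer would plausibly be worth.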

7. Social or email mini-campaign

Organic posts or short email sequences. Cheaper than paid, noisier as signal. Best for messaging tests: three framings, see which gets engagement, shares, replies. Capped deliberately so it doesn't become a content project.

8. Hi-fi prototype

Something that actually works. Two shapes:

  • Single-feature MVP: product stripped to its core moment, built well enough for a real user end-to-end.

  • Mash-up: Frankenstein of Typeform + Zapier + Notion + a payment form, stitched so a customer can get value.

Stronger evidence than anything before. The user is interacting with something real.

9. Simulation

Pretending to be the product. Two flavors:

  • Concierge. Deliver value manually, one customer at a time, no automation.

  • Wizard of Oz. Front end looks real, a human handles the back end.

Both validate the value proposition before investing in scalable delivery.

10. Payment call-to-action

Strongest test of intent. Two variants: mock sale (a real checkout for a product that doesn't exist yet, followed by a refund and an explanation) and pre-sale (real money for a product to be delivered later).

Pre-sales are the gold standard. People do not lie with their wallets.

Hierarchy of evidence

Weakest to strongest:

Surveys & prototype reactions → Interviews → Clickable prototypes & landing-page click-through → Meta ad conversions & hi-fi prototype usage → Simulations, mock sales, pre-sales.

Most projects don't run all ten. A good cycle runs three or four, sequenced from weaker to stronger so that early signals get tested against harder evidence. If a hypothesis survives a survey and an interview but fails a pre-sale, the pre-sale is the answer.
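The hierarchy and the "strongest result wins" rule can be written down in a few lines. This is a sketch under assumptions: the numeric tiers simply mirror the ordering above, the names are shorthand, the social or email mini-campaign is left out because the hierarchy does not place it, and treating a split result within the top tier as a failure is our conservative choice for the example.

```python
# Tiers mirror the weakest-to-strongest ordering in the hierarchy of evidence.
EVIDENCE_TIER = {
    "survey": 1,
    "light prototype": 1,
    "interview": 2,
    "clickable prototype": 3,
    "landing page": 3,
    "meta ad": 4,
    "hi-fi prototype": 4,
    "simulation": 5,
    "mock sale": 5,
    "pre-sale": 5,
}


def sequence(experiments):
    """Order the chosen experiments from weakest to strongest evidence."""
    return sorted(experiments, key=lambda name: EVIDENCE_TIER[name])


def verdict(results):
    """Decide whether the hypothesis holds, given {experiment: survived?}.

    Only the strongest tier that was actually run counts. If experiments
    in that tier disagree, the hypothesis is treated as failed (an
    assumption made for this sketch).
    """
    top = max(EVIDENCE_TIER[name] for name in results)
    return all(passed for name, passed in results.items()
               if EVIDENCE_TIER[name] == top)


plan = sequence(["pre-sale", "survey", "interview"])
print(plan)

outcome = verdict({"survey": True, "interview": True, "pre-sale": False})
print("hypothesis holds" if outcome else "hypothesis fails")
```

Here the plan comes out as survey, interview, pre-sale, and the failed pre-sale overrides the two earlier passes.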

Strategyzer 44 Experiments Library

From library to sequence

Different businesses benefit from different orderings. Hardware doesn't validate the same way as B2B SaaS. Marketplaces aren't service businesses. Picking the right experiment is half the craft. Picking the right sequence is the other half.
