The Six-Week Testing Cycle: How to Validate (or Kill) a Startup Idea
Six weeks is long enough to validate a startup idea, short enough to kill one. Inside the testing cycle OS Research uses on every project.
Six weeks is a long time to work on something that might not be real. It's also a short time to prove that it is.
That tension is what the Six-Week Cycle is designed to hold. Long enough to run three design sprints and three testing weeks. Short enough that nobody can hide behind planning, polish a prototype into art, or convince themselves an idea is working when it isn't.
The core commitment
Every cycle runs on a single principle, borrowed from Shape Up: fixed time, variable scope.
Six weeks doesn't move. If the team runs out of time, the team cuts what it's testing, not what it's trying to prove. Calendar wins.
This matters. Most teams quietly let deadlines slip. Research isn't finished. Prototype needs another polish. Six weeks turns into nine, then twelve, and the idea evaporates without a verdict because nobody wants to be the one to say it died.
At OS Research, the cycle ends when the calendar says so. Stopping is a valid result. An idea that can't move from "not really confident" to "somewhat confident" in six weeks is telling us something.
The goal of one cycle
Move the Pitch from one confidence level to the next. Not build a product. Not plan a launch.
Cycle 1: not really confident → somewhat confident
Cycle 2: somewhat confident → very confident
Two cycles is the hard ceiling. If an idea can't pass the bar in twelve weeks of focused work, it goes back into the queue or gets archived.
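The ladder and the ceiling can be written down as a small check. This is a sketch in Python, not a tool the studio is known to use. The names `Confidence`, `MAX_CYCLES`, and `cleared_bar` are hypothetical.

```python
from enum import IntEnum


class Confidence(IntEnum):
    NOT_REALLY = 1
    SOMEWHAT = 2
    VERY = 3


MAX_CYCLES = 2  # hard ceiling from the text: two cycles, twelve weeks


def cleared_bar(cycle: int, reached: Confidence) -> bool:
    """Cycle n has to lift the Pitch to confidence level n + 1."""
    if not 1 <= cycle <= MAX_CYCLES:
        raise ValueError(f"cycle must be between 1 and {MAX_CYCLES}")
    return reached >= Confidence(cycle + 1)


print(cleared_bar(1, Confidence.SOMEWHAT))  # True
print(cleared_bar(2, Confidence.SOMEWHAT))  # False
```

Asking for a third cycle raises an error instead of returning an answer. That is the ceiling holding.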
Founders push back hardest here. Their idea is different, needs more time, is in a complicated market. We listen, then hold the ceiling. Ideas that can't meet the twelve-week bar almost never become good businesses later. They become projects that quietly consume the team while the portfolio suffers.
Three parts
Same internal shape every cycle.
Part 1: Experimenting
The active part of the cycle. Team designs experiments against the Critical Assumptions, builds the artifacts each requires, runs them like scientists.
Held together by the Test Card. One per experiment, filled in before it runs. States the hypothesis, the design, the single metric, and the success criterion. Success criterion is non-negotiable: committed before data comes in, so we can't slide goalposts after.

One page each. Shortness is deliberate. If it can't fit on a page, the experiment is too complicated, which means the hypothesis is too fuzzy.
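A minimal sketch of a Test Card as a Python data structure. It assumes the four fields named above and a numeric metric. The field names, the `passed` helper, and the example values are illustrative, not the studio's actual template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCard:
    """One experiment, written down before it runs."""
    hypothesis: str
    design: str
    metric: str
    success_threshold: float  # committed before any data comes in

    def passed(self, observed: float) -> bool:
        # The card is frozen, so the threshold cannot be edited after the fact.
        return observed >= self.success_threshold


# Hypothetical example values, for illustration only.
card = TestCard(
    hypothesis="Clinic managers will join a waitlist for automated scheduling",
    design="Landing page with a fake-door signup, traffic from a capped ad",
    metric="signup_rate",
    success_threshold=0.10,
)
print(card.passed(0.07))  # False
```

`frozen=True` is the design choice that matters. Trying to lower `success_threshold` once results arrive raises an error, which is the goalpost rule in code.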
Part 2: Learning
Where most teams collapse. Evidence comes in. Everyone wants to read it as confirming the idea. Fighting that instinct is the work.
Evidence rubric:
Behavior beats opinion. Real world beats lab. Paid commitment beats free.
Always.
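One way to make the rubric mechanical, sketched in Python. It assumes each of the three axes counts for one point. That equal weighting, the `Evidence` type, and the sample observations are assumptions for illustration, not part of the rubric itself.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    note: str
    is_behavior: bool    # what people did, not what they said
    is_real_world: bool  # observed in the field, not in a lab setting
    is_paid: bool        # money or another costly commitment was made

    def strength(self) -> int:
        # One point per rubric axis (equal weighting is an assumption).
        return int(self.is_behavior) + int(self.is_real_world) + int(self.is_paid)


def rank(evidence: list[Evidence]) -> list[Evidence]:
    # Strongest first. sorted() is stable, so ties keep their input order.
    return sorted(evidence, key=lambda e: e.strength(), reverse=True)


# Hypothetical observations.
observations = [
    Evidence("Said in an interview they would buy", False, False, False),
    Evidence("Clicked through a fake-door ad", True, True, False),
    Evidence("Pre-paid a deposit", True, True, True),
]
for item in rank(observations):
    print(item.strength(), item.note)
```

The pre-paid deposit sorts to the top and the interview quote to the bottom, whatever order they came in.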
Once ranked, write Learning Cards. One per experiment. Insight + confidence level + actionable update to the Pitch. Evidence and insight written separately on purpose: it makes it harder to lie about what the data actually supports.

Write them while data is fresh, within a day or two. A week later the subtleties are gone.
Part 3: Action
Shortest, most important. End of six weeks, the team decides:
Continue. Another cycle on the riskiest remaining assumptions.
Pivot. Substantial Pitch reshape based on evidence the original framing was wrong.
Kill. Evidence shows it doesn't work.
Pitch version increases. Old version archived (not deleted). What we got wrong is real knowledge for the studio.
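The archive-not-delete rule, sketched in Python. It assumes the version number bumps on every decision, including a kill, where the revised summary would be the write-up. `Pitch`, `CycleRecord`, and `PitchHistory` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    PIVOT = "pivot"
    KILL = "kill"


@dataclass
class Pitch:
    version: int
    summary: str


@dataclass
class CycleRecord:
    decision: Decision
    archived: Pitch  # the version the cycle started with


@dataclass
class PitchHistory:
    current: Pitch
    records: list[CycleRecord] = field(default_factory=list)

    def close_cycle(self, decision: Decision, revised_summary: str) -> Pitch:
        # Archive the old version next to the decision; nothing is deleted.
        self.records.append(CycleRecord(decision, self.current))
        self.current = Pitch(self.current.version + 1, revised_summary)
        return self.current


history = PitchHistory(Pitch(1, "Original framing"))
history.close_cycle(Decision.PIVOT, "Reframed around a narrower customer")
print(history.current.version)              # 2
print(history.records[0].archived.summary)  # Original framing
```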
Cycle one, week by week
Three design sprints alternating with three testing weeks.
Week 1 (design). Lightweight artifacts: pitch deck or mock landing page, one-page brochure, data sheet, explainer video. Plus a 5-minute Typeform survey and first batch of problem interviews scheduled.
Week 2 (test). Get out of the building. Artifacts in front of real customers. Watch reactions to framing, language, problem.
Week 3 (design). Lo-fi prototypes. Simple landing page. Capped social mini-campaign. Email mini-campaign. Meta ad pointed at a fake door.
Week 4 (test). Lo-fi prototypes live. Engagement, click-throughs, inbound questions. Compare what people do vs. what they said in week 2. The gaps are the learning.
Week 5 (design). Hi-fi prototypes. Single-feature MVP, mash-up using existing tools, concierge version, Wizard of Oz, mock sale or pre-sale.
Week 6 (test). Strongest evidence of the cycle. Focus on purchase intent and actual behavior. Do they pre-pay? Sign up? Show up?
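Because the calendar is fixed, the whole schedule follows from the start date. A small Python sketch, assuming seven-day weeks that run back to back. The function name `cycle_schedule` and the start date are made up for the example.

```python
from datetime import date, timedelta

# Three design sprints alternating with three testing weeks.
PHASES = ["design", "test"] * 3


def cycle_schedule(start: date) -> list[tuple[int, str, date, date]]:
    """Return (week number, phase, first day, last day) for all six weeks."""
    schedule = []
    for index, phase in enumerate(PHASES):
        first_day = start + timedelta(weeks=index)
        last_day = first_day + timedelta(days=6)
        schedule.append((index + 1, phase, first_day, last_day))
    return schedule


schedule = cycle_schedule(date(2025, 1, 6))
for week, phase, first_day, last_day in schedule:
    print(f"Week {week} ({phase}): {first_day} to {last_day}")
```

The end date is computed on day one and there is no parameter to extend it. Only the contents of each week can change.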
After the cycle
Stack of Test Cards, stack of Learning Cards, v2 Pitch waiting.
Strong evidence → cycle two. Cycle one is mostly about the Value Proposition Canvas, or VPC (do people want this?). Cycle two shifts to the Business Model Canvas, or BMC (can we build a business around it?).
Thin or negative → kill, with a write-up. Killing isn't failure. It's the part of the process that justifies the speed of everything else.

Operating principles
The six-week timer is sacred. Cut scope, never push the deadline.
Clean evidence beats a lot of evidence. Ten well-run interviews beat a hundred sloppy ones.
Learning Cards get written while data is fresh.
Killing is a valid outcome.
Run cycles this way, across enough ideas, and the studio builds something rare: a portfolio where every survivor has been pressure-tested against reality, and every kill leaves clean evidence for the next team.
