Verification Is Free Now: The Real Phase Change of the AI Era
How YC CEO Garry Tan's Complexity Ratchet reveals that 90% test coverage went from heroic effort to Tuesday
The short version: AI's biggest change isn't that it writes code for you. It's that checking your code became free. Like a seatbelt that locks when you pull it, once every coding session automatically generates tests, software quality can only go up, never back down. That's the ratchet effect.
You Can't Even Read Your Own Code Three Months Later
Anyone who's written code knows the feeling: you open a file you wrote three months ago and have no idea why you made that particular decision. The worse version — you change the color of a login button and somehow the shopping cart breaks, and nobody can tell you why.
This isn't a skill problem. It's a structural flaw in human memory. You can't remember. Your teammates can't remember. And whoever inherits the project after you leave will have even less to go on.
Last week, YC CEO Garry Tan published a long essay drawing on his own two open-source projects, nearly a million lines of code combined, and laid out a solution. His argument: this problem can't be fixed by "writing better docs" or "being more careful." The fix is to have AI write tests for you, permanently encoding every "why did we do this" decision directly into the codebase.
"It's not that AI lets you write code faster. It's that AI lets you verify at a level that was previously too expensive to sustain." — Garry Tan, "The AI Agent Complexity Ratchet"
Notice what he's saying. Not "AI helps you write code faster." He's saying "AI lets you do something that used to be too expensive to sustain: thoroughly checking your code."
The Seatbelt Principle: Why It's Called a Ratchet
A ratchet is a mechanical device that can only turn in one direction — a socket wrench drives a bolt forward, never back. A seatbelt is the most familiar ratchet in daily life: once it locks, it won't let go.
Garry says software quality can become a seatbelt too. The method: every time AI helps you write code, have it do three things simultaneously:
1. Write tests. Not "did the function return the right value," but "how should the system behave in this scenario." It's like signing a contract for every behavior: violations trigger an alarm automatically.
2. Write documentation. Not "what does this code do," but "why was it done this way." Open the file three months later and you don't have to guess.
3. Record a score. After each run, grade the output and keep the record. The next version has to match or beat that score, never do worse.
The next time AI comes to modify the code, it automatically reads all the tests, documentation, and scores from before. It can't cut corners, because the tests will turn red. It can't ignore the docs, because the docs are right there in context. It can't lower quality, because the historical scores are sitting there.
The quality floor can only go up. The seatbelt has locked. It won't let go.
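The score-recording step can be sketched in a few lines. This is a minimal illustration, not Garry's implementation: the function name, the JSON baseline file, and the idea of a single numeric score where higher is better are all assumptions made for the example.

```python
import json
from pathlib import Path


def check_ratchet(score: float, baseline_path: Path) -> float:
    """Fail if `score` is below the recorded baseline; otherwise keep or raise it.

    Returns the baseline in force after the check.
    (Hypothetical helper: the file format and name are illustrative only.)
    """
    baseline = None
    if baseline_path.exists():
        baseline = json.loads(baseline_path.read_text())["score"]

    if baseline is not None and score < baseline:
        raise AssertionError(
            f"quality regressed: {score} is below the recorded baseline {baseline}"
        )

    # Match or beat: the recorded floor never moves down.
    new_baseline = score if baseline is None else max(score, baseline)
    baseline_path.write_text(json.dumps({"score": new_baseline}))
    return new_baseline
```

Run after every grading pass, a check like this turns "never do worse" from a good intention into a failing build.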
Garry validated this with his own projects: 14 code commits merged in 72 hours, nearly 30,000 lines of new code, each release with more tests and better quality than the last. Speed and quality at the same time — but only because the ratchet was in place.
Why Nobody Did This Before AI
The answer is simple: it was too expensive.
Writing tests is tedious, repetitive work that demands enormous patience. Picture this: your team ships a feature with 70% test coverage. The remaining 30% are edge cases — some user submitting a form with emoji in their name using IE at 3am on a Tuesday. Technically it should be tested. But writing that test takes two hours, and it's Friday at 5pm and the developer is exhausted.
It's not that they don't want to write it. They're just tired. Every software team stops at this exact point and calls it "good enough."
Research confirms this intuition: an analysis of more than 10,000 software projects shows that going from 70% to 90% test coverage drops the probability of bugs reaching users from roughly 30% to 3%. Out of 100 bugs, you used to let about 30 slip through. Now only 3 get out. That's not a linear improvement. It's a cliff edge.
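The arithmetic behind that drop, as a quick sketch. The 30% and 3% escape rates are the figures quoted above; the function and variable names are made up for illustration.

```python
def escaped_bugs(total_bugs: int, escape_rate: float) -> int:
    """Number of bugs expected to reach users at a given escape rate."""
    return round(total_bugs * escape_rate)


at_70_coverage = escaped_bugs(100, 0.30)  # about 30 of 100 bugs reach users
at_90_coverage = escaped_bugs(100, 0.03)  # about 3 of 100 bugs reach users

coverage_gain = 90 / 70  # coverage grows by roughly 1.3x
escape_reduction = at_70_coverage / at_90_coverage  # escapes shrink by 10x
```

Coverage rises by less than a third while escaped bugs fall by a factor of ten, which is what makes the last stretch worth far more than its size suggests.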
Aviation figured this out decades ago. DO-178C, the guidance the FAA recognizes for certifying flight-critical software, requires exhaustive structural test coverage at its highest criticality level. The industry didn't do this because it loves process. It did it because on an airplane, the bugs you miss kill people. The stakes forced them past 90%.
But here's the thing: aviation has money. They can fund a team that spends months doing nothing but writing tests. Regular software teams don't have that budget. So the entire industry has been stuck at 70-80% for fifty years.
Then AI arrived.
AI doesn't want to go home at 5pm on Friday. Writing the fourteenth edge-case test feels exactly the same as writing the first. That "brutal last 20%" — the part that made 90% coverage economically impossible for human teams — is precisely the kind of work AI handles best, most happily, with zero sense of "let's just call it done."
Checking your homework used to cost money — you had to pay someone to do it. Now AI checks it for free. That's the real phase change.
Dave Left, But the Tests Are Still Here
Every software company has seen this: you open a critical file and find a comment that says // DO NOT CHANGE THIS -- ask Dave.
Dave left three years ago.
In traditional companies, the knowledge of "why is this code written this way" lives inside people's heads. Dave knows why you can't touch that caching layer. Sarah remembers the migration that nearly destroyed the database. Marcus can explain the weird rounding logic in the billing system. But people leave — they retire, they get poached, they burn out. The knowledge goes with them.
Tests don't leave.
When a test encodes "rounding must use 0.05 increments" and the documentation explains "because higher precision causes confidence scores to become unreliable in practice" — that knowledge is carved into the codebase, permanently. No matter who maintains the project, no matter which AI model gets swapped in, it reads that constraint and respects it.
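One way the rounding rule and its rationale could live side by side, as a sketch. `round_confidence` is a hypothetical helper, not GBrain's actual code; only the 0.05 increment and the stated reason come from the text above.

```python
def round_confidence(value: float) -> float:
    """Round a confidence score to the nearest 0.05.

    Why: higher precision made confidence scores unreliable in practice,
    so the 0.05 grid is a deliberate constraint, not an accident.
    """
    steps = round(value * 20)  # 20 steps of 0.05 span the range 0..1
    return steps / 20


def test_confidence_stays_on_the_0_05_grid():
    # If someone later "improves" the precision, this test turns red.
    assert round_confidence(0.63) == 0.65
    assert round_confidence(0.12) == 0.10
    assert round_confidence(0.97) == 0.95
    assert round_confidence(1.0) == 1.0
```

The docstring answers "why," the test enforces "what," and neither depends on anyone remembering the original discussion.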
Tests are the institutional memory that outlives employee turnover. For a solo project, this matters even more — because your tests are your only teammate.
A Real Story: Who Said What?
Garry walked through a concrete failure and fix using his GBrain project. GBrain is a knowledge system that extracts "who believes what" from large volumes of text. The first run pulled out over 100,000 claims.
The problem was a bug called "holder confusion." When the system encountered a claim like "AI will replace 80% of software engineers by 2027," it got confused 35% of the time: was this said by the article's author? Someone the author was quoting? An inference the system drew from a podcast transcript?
If you're building a system that tracks "who believes what," getting "who" wrong means your core capability is broken.
The fix was a textbook ratchet:
1. Two different AI models (GPT-5.5 and Claude) independently scored the outputs and identified six specific failure modes.
2. Each of the six failure modes was addressed, with precision rules enforced at the database layer.
3. Seventeen tests locked in the fixes. No future version can ship without passing all of them.
Nobody ever has to remember what "holder confusion" is or why rounding precision matters. The tests remember. The quality floor permanently moved up one notch, and it won't come back down.
Not Just Code — Anything Observable Can Be Locked
There's an easy-to-miss point here: the ratchet doesn't only apply to "did this function return the right value." Anything a computer can see can be ratcheted.
Garry gave a striking example. His GStack tool has a feature called "architecture review" — you ask AI to review your technical plan, and it should go section by section asking questions, poking holes, challenging your assumptions, like an engineering manager who actually read your code.
The problem: AI sometimes gets lazy. It dumps all its findings at once and exits, never once actually engaging with you. But the entire point of a review is the back-and-forth.
How do you test "did the AI have a proper conversation with a human"? Traditional unit tests can't touch this.
Garry's approach: build a virtual terminal, put the AI inside it, give it a scenario, then literally watch it work — observe whether it asks you at least one question before finishing. Didn't ask? Test fails.
Imagine you hired a consultant to review your business plan. If they skim it for ten minutes and say "looks fine, no issues" then walk out, you'd feel cheated. A real review has back-and-forth, follow-up questions, pushback. AI behavior can be held to the same standard — if you can observe it, you can assert on it. If you can assert on it, you can lock it down.
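The assertion itself can be very small once the session has been captured. The sketch below assumes a hypothetical transcript format (a list of `Turn` records) and a deliberately crude question detector; a real harness like the one described would drive the AI inside a virtual terminal and record the turns from there.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "agent" or "human" (hypothetical labels)
    text: str


def asked_before_finishing(transcript: list[Turn]) -> bool:
    """True if the agent asked at least one question before its final turn."""
    agent_indexes = [i for i, t in enumerate(transcript) if t.speaker == "agent"]
    if not agent_indexes:
        return False
    last_agent_turn = agent_indexes[-1]
    # Only turns before the agent's final message count as back-and-forth.
    return any(
        t.speaker == "agent" and "?" in t.text
        for t in transcript[:last_agent_turn]
    )


lazy_review = [
    Turn("human", "Please review my architecture plan."),
    Turn("agent", "Here are 12 findings. Done."),
]
engaged_review = [
    Turn("human", "Please review my architecture plan."),
    Turn("agent", "Why a queue here instead of a scheduled job?"),
    Turn("human", "Because retries need to be per-message."),
    Turn("agent", "Understood. Here is my summary of section one."),
]
```

Here `asked_before_finishing(lazy_review)` is false and `asked_before_finishing(engaged_review)` is true, so the lazy review fails the test while the engaged one passes.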
Three Layers of Seatbelt
Garry's essay focuses on the code layer. But think about it: this "only moves forward" mechanism applies to far more than just code.
- When your software has a bug, AI helps you write a test that locks in the fix. Analogy: the answer key is written into the answer book, so there's no need to re-derive it next time.
- When AI skips a question it should have asked, you add a gate: no passing without asking. Analogy: a preflight checklist, so pilots can't skip steps by relying on memory.
- The tuition you paid turns into a decision principle that fires automatically next time. Analogy: a hand that touched a hot stove, so there's no need to relearn the lesson each time.
All three layers share five properties that make the ratchet work:
1. The cost of encoding is near zero. AI writes tests and documentation at almost zero marginal cost.
2. What's encoded doesn't disappear. Tests sitting on disk don't forget, don't quit, don't burn out.
3. Loading is automatic. AI reads all historical tests and documentation every time it starts work, no human reminder needed.
4. Verification is deterministic. Running a test suite is a mechanical operation that doesn't depend on human memory or judgment.
5. Erasing costs more than keeping. Deleting a test means abandoning a verified safety guarantee. Nobody does that voluntarily.
Point 5 is the core insight: keeping a test is cheaper than deleting it, so the system has no reason to move backward and can only move forward. That's why it's called a ratchet. Its power doesn't come from any single correct decision. It comes from the structural property of irreversibility itself.
What Happens to the People Who Skip It
Garry observed a pattern across YC and the open-source community: projects that use AI to write code but skip the tests almost all start falling apart at medium complexity. A few thousand lines, a handful of features tangled together, and every change risks breaking something else unexpectedly.
Think about building with blocks: the first ten pieces stack fine no matter how you place them. By piece fifty, every new block makes the tower wobble. By piece one hundred, you're afraid to touch anything.
The AI in these projects is fine. The code is fine. What's missing is the seatbelt. With no tests to tell you "that block you just placed loosened block 37," you can only find out when everything collapses.
But It's Not Perfect
To be fair, the ratchet has limits:
- It guarantees the floor rises, not perfection. Garry's own knowledge system has an extraction accuracy of 68% — lots of room to improve. But crucially: it won't go back to 35%.
- Some things can't be tested. Integration points, infrastructure plumbing, genuine edge cases. The ratchet covers most behaviors, not all of them. But "most" is enough to sustain fast forward progress.
- The tests themselves might be wrong. The ratchet assumes you've locked in the correct behavior. This is recursive — who tests the tests? Having two different AI models score independently is one partial answer.
The One Thing You Can Do
Whether or not you're a programmer, Garry's core insight applies to anyone doing complex work:
Every time you pay tuition — mess something up, discover a trap, figure out a "why" — don't just fix it and move on. Turn it into a rule, a checklist item, a written record, so it can never be forgotten again.
That's the ratchet: your experience can only accumulate upward. It won't regress because of memory decay, team turnover, or plain forgetting.
Making humans do this used to be too expensive. Now AI does it for free.
That's the real phase change of the AI era — not moving faster, but that mistakes you've already paid for can never be repeated.
