I shipped 74 PRs to a SaaS in 24 hours with Claude Code (and what broke)
An honest log of what happens when you give an AI agent full autonomy on a real product for a day. Real numbers: 74 PRs merged, 1846 tests, 0 paying customers, one production-blocking incident.
Yesterday I told Claude Code: "build out QualityPilot for the next 10 hours, full autonomy, don't ask for confirmation."
This is the honest log of what happened.
The numbers
- 74 PRs merged to the QualityPilot monorepo
- 412 → 1846 tests in the suite
- 31 Supabase migrations applied to production
- 2 npm publishes (`@qlens/cli` and `@qlens/cypress-reporter`)
- 0 paying customers at the end of the day (still 0 — important context, see the last section)
- 1 production-blocking incident that I caught + fixed mid-flight
What got built — top 6
The core product (failed test → AI proposes a fix → opens a GitHub PR) was already there. Of everything added in those 24 hours, these were the most consequential pieces:
- Multi-fix consolidation — when N tests fail in the same file, one PR fixes all N instead of N separate PRs racing each other. Removes the most common reviewer complaint about AI fix tools.
- Public AI playground at `qlens.dev/playground` — paste a failing test, get a real `gpt-4o-mini` fix in seconds. No signup, 5 free attempts/day per IP.
- 14-day Pro trial without credit card — `resolveEffectivePlan()` treats trial users as Pro across 11 plan-cap-reading endpoints, with no Stripe handoff until the user explicitly adds a card.
- Quality Time Machine ROI dashboard at `/dashboard/roi` — hours saved, $ saved at a configurable hourly rate, payback period, weekly trend, top flaky files. Plus a PDF export so users can paste it into a deck.
- Stripe usage-based metering for Enterprise — $0.50/fix billed via the V2 Meter Events API, no monthly minimum.
- `@qlens/cli` — `npm install -g @qlens/cli`, then `qlens scan` from any repo, or `qlens auto-fix file.test.ts` to send a single failing test to the playground from the terminal.
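Multi-fix consolidation boils down to a grouping step before PR creation. Here's a minimal sketch in TypeScript (type and function names are my own, not QualityPilot's actual code):

```typescript
// Sketch: group failing tests by file so one PR can fix all of them,
// instead of N separate PRs racing each other.
interface FailedTest {
  file: string;      // path of the test file that failed
  testName: string;  // name of the individual failing test
}

function groupFailuresByFile(failures: FailedTest[]): Map<string, FailedTest[]> {
  const groups = new Map<string, FailedTest[]>();
  for (const failure of failures) {
    const bucket = groups.get(failure.file) ?? [];
    bucket.push(failure);
    groups.set(failure.file, bucket);
  }
  // Downstream: open one PR per map entry, not one per failure.
  return groups;
}
```

Three failures spread across two files yield two PRs instead of three, and the reviewer sees each file's fixes together.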
Plus ~50 smaller things — Slack/Email/Discord/MS Teams/Linear/Jira/PagerDuty/Sentry/Notion outbound integrations, a Cypress reporter (4th framework), a public scan-results page with a shareable badge SVG, /vs/{competitor} SEO microsites, a /docs hub, a public /status page, cohort emails, webhook secret rotation reminders, per-repo AI prompt customization, a side-by-side diff view, a bulk-merge UI, and a Bug Detective Insights page. The full list is in the changelog — exactly the wrong place for a stranger to read it.
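The 14-day trial mechanism above can be sketched as a single plan-resolution function. This is a minimal sketch under assumed field names; the real `resolveEffectivePlan()` reads persisted state and is consulted by 11 endpoints:

```typescript
type Plan = "free" | "pro" | "enterprise";

interface User {
  plan: Plan;
  trialEndsAt?: Date; // set when the no-credit-card trial starts
}

// Trial users are treated as Pro everywhere plan caps are read,
// with no Stripe subscription existing until they add a card.
function resolveEffectivePlan(user: User, now: Date = new Date()): Plan {
  if (user.plan === "free" && user.trialEndsAt && user.trialEndsAt > now) {
    return "pro";
  }
  return user.plan;
}
```

The design choice worth noting: because every cap-reading endpoint goes through one resolver, expiring a trial is a pure time comparison rather than a migration.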
The incident
About 4 hours in, prod deploys silently stopped landing. New routes (like /playground) returned 404 on www.qlens.dev even though the PRs that introduced them were merged on main.
My first guess was Vercel's deploy quota. We'd done ~50 PRs by then, ~2 deploys per PR including previews, easily over Vercel Hobby's 100/day cap. The error message I'd seen earlier in the day did say:
```
Resource is limited - try again in 24 hours
(more than 100, code: "api-deployments-free-per-day")
```
That fit the timing perfectly.
It was wrong.
A real Vercel cap incident DOES exist on Hobby — it's a rolling 24-hour window, and you can hit it with high-velocity merging. But that cap is transient: deploys queue, then drain. What was happening here was permanent: every deploy attempt failed identically.
I ran the obvious next command:
```
$ vercel inspect <deployment-url> --logs

2026-04-19T06:29:55.872Z ⨯ useSearchParams() should be wrapped in
a suspense boundary at page "/playground".
2026-04-19T06:29:55.873Z Read more: https://nextjs.org/docs/messages/missing-suspense-with-csr-bailout
2026-04-19T06:29:55.883Z Error occurred prerendering page "/playground".
2026-04-19T06:29:55.951Z Error: Command "npm run build" exited with 1
```
Next.js 16's prerender step bails out on unbounded `useSearchParams()` calls. The playground client component used the hook to read `?example=...` deep links. PR review caught nothing — the agent ran `vitest`, `tsc`, and `eslint`, all green. Vitest doesn't run `next build`. The hook works at runtime; it just refuses to be statically prerendered.
So every subsequent merge — Slack notifications, the Cypress reporter, the docs hub, all of it — got blocked behind the same broken build. None of those PRs deployed. The cap-rate-limit error I'd seen earlier was unrelated transient noise.
The fix was 14 lines:
```tsx
// src/app/playground/page.tsx
import { Suspense } from "react";
// ...
<Suspense fallback={<LoadingSkeleton />}>
  <PlaygroundClient />
</Suspense>
```
The prevention was a separate PR adding `npx next build` to the PR-checks GitHub Action with stub env vars, so this exact failure mode is caught at PR time, not deploy time. Cost: ~30s per PR. Value: never debugging "why is my deploy queue silent" again.
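A minimal version of that PR-check step might look like the fragment below. The workflow layout and env-var names are assumptions, not QualityPilot's actual config:

```yaml
# .github/workflows/pr-checks.yml (fragment)
- name: Next.js production build
  # Catches prerender errors (like the useSearchParams bailout)
  # at PR time instead of deploy time.
  run: npx next build
  env:
    # Stub values: the build only needs the vars to be present,
    # not to point at real infrastructure.
    NEXT_PUBLIC_SUPABASE_URL: "https://stub.supabase.co"
    NEXT_PUBLIC_SUPABASE_ANON_KEY: "stub"
```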
Two lessons logged to memory:
- When deploys silently stop landing, run `vercel inspect <url> --logs` BEFORE assuming the infra layer is at fault. The symptom collision between a rate limit and a build failure cost me ~30 minutes.
- CI that runs unit tests but not `next build` will let prerender bugs through. If you're using the Next.js App Router with `useSearchParams`/`usePathname`/`useRouter`, you're in the same trap. Add `next build` to PR checks.
What I had to do as the operator
The Claude Code agent has tools for almost everything, but a few things still needed me:
- Resend account creation — needs email verification with a real human; no API
- Cloudflare API token — created scoped to one zone via the browser flow, then handed off (the agent did the rest of the DNS work programmatically: SPF + DKIM + MX records all pushed via CF API)
- First npm publish OTP for each new package — though after the first publish, the cached granular access token handled subsequent publishes without prompting
- GitHub App registration — manifest flow needs a browser click; deferred for now, the scaffold sits in the repo
- Vercel Pro upgrade when we hit the deploy quota for real — $20/mo, one-time bet for the duration of the Claude Max sprint window
Total operator time: about 6 minutes spread over the day. Not zero, but not a 10-hour pair-programming session either.
What about quality?
I won't claim "every PR was perfect" — the playground bug shipped past my review. What I can claim is:
- The vitest suite went from 412 → 1846 tests in main (publicly verifiable via GitHub Actions logs)
- Zero outstanding hotfixes for product regressions — the playground build error was the one mid-flight catch, and its fix merged 8 minutes after detection
- Each agent's PR description documented at least one trade-off it made differently than the spec asked for (the most useful kind of artifact when reviewing AI-generated PRs)
A few moments where agents pushed back on my prompt and were right:
- I asked for a CLI command `qlens status` that hits `/api/v1/usage`. The agent caught that the endpoint doesn't exist and used `/api/v1/auto-fix-metrics` instead.
- I asked for Stripe `subscriptions.createUsageRecord` for the metered tier. The agent caught that the API was deprecated as of `2025-03-31` (the `basil` API version) and used the V2 Meter Events API instead.
- I asked for Enterprise to be a Marketplace tier. The Marketplace agent argued Enterprise should be off Marketplace (GitHub takes 25% on transactions, Stripe takes ~3%) — I agreed.
Each "actually, here's why your spec is wrong" was a small win.
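For the metered tier, the event itself stays simple: the $0.50/fix price lives in the Stripe meter configuration, so each event only reports a count. A hedged sketch (the event name and helper are my own; only the payload shape follows Stripe's meter-event format):

```typescript
interface MeterEvent {
  event_name: string;
  payload: {
    stripe_customer_id: string;
    value: string; // meter events report counts as strings
  };
}

// Build one event per batch of applied fixes; Stripe aggregates the
// counts and applies the per-fix price configured on the meter.
function buildFixMeterEvent(customerId: string, fixCount: number): MeterEvent {
  if (!Number.isInteger(fixCount) || fixCount < 0) {
    throw new Error("fix count must be a non-negative integer");
  }
  return {
    event_name: "auto_fix_applied", // assumed meter name
    payload: { stripe_customer_id: customerId, value: String(fixCount) },
  };
}
```

The returned object is what would be posted through the Stripe SDK's meter-events endpoint; keeping pricing out of the event is what makes "no monthly minimum" a one-line meter setting.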
The real story: 0 customers
74 PRs. 1,846 tests. Every feature you'd want in a v1 dev tool. And zero paying customers at the end of the day.
Because we're in stealth.
The point of this post: shipping fast is solved. Shipping fast doesn't matter if no one knows you exist. The hard part isn't the code. The hard part is showing up.
Some of the readiness was overshoot. We have a 6-step customer-success email cadence and zero customers. We have webhook secret rotation reminders firing every 90 days and zero secrets in production beyond my own. We have an ROI dashboard that calculates time-saved-times-hourly-rate from a row count of zero. AI-assisted velocity is great, but it lets you build a maze before you've checked whether anyone's looking for the entrance.
So this blog exists. We'll write about how this thing got built, what we're learning when real users start touching it, and what the actual conversion data looks like once it starts flowing. No stunts. Real numbers — including the ones that tell us we got something wrong.
If you're building a dev tool yourself, try the playground — paste a failing test, see the AI fix it in 5 seconds. No signup. 5 free attempts per day per IP.
If your test suite is rough, scan a public repo by hitting qlens.dev/scan/{owner}/{repo} for free.
If you've shipped something with Claude Code at this kind of pace, I'd love to compare notes — reply by email to noreply@qlens.dev and Resend will route it to me.
— Ihor
About the author
Ihor Kosheliev builds QualityPilot — an AI Bug Detective that turns failed CI runs into proposed fixes as GitHub PRs. See how it works.