QualityPilot
7 min read · By Ihor Kosheliev

I shipped 74 PRs to a SaaS in 24 hours with Claude Code (and what broke)

An honest log of what happens when you give an AI agent full autonomy on a real product for a day. Real numbers: 74 PRs merged, 1846 tests, 0 paying customers, one production-blocking incident.

build-in-public · claude-code · ai-agents · saas

Yesterday I told Claude Code: "build out QualityPilot for the next 10 hours, full autonomy, don't ask for confirmation."

This is the honest log of what happened.

The numbers

  • 74 PRs merged to the QualityPilot monorepo
  • 412 → 1846 tests in the suite
  • 31 Supabase migrations applied to production
  • 2 npm publishes (@qlens/cli and @qlens/cypress-reporter)
  • 0 paying customers at the end of the day (still 0 — important context, see the last section)
  • 1 production-blocking incident that I caught + fixed mid-flight

What got built — top 6

The core product (failed test → AI proposes a fix → opens a GitHub PR) was already there. Of what was added in those 24 hours, the most consequential pieces:

  1. Multi-fix consolidation — when N tests fail in the same file, one PR fixes all N instead of N separate PRs racing each other. Removes the most common reviewer complaint about AI fix tools.
  2. Public AI playground at qlens.dev/playground — paste a failing test, get a real gpt-4o-mini fix in seconds. No signup, 5 free attempts/day per IP.
  3. 14-day Pro trial without a credit card — resolveEffectivePlan() treats trial users as Pro across 11 plan-cap-reading endpoints, with no Stripe handoff until the user explicitly adds a card.
  4. Quality Time Machine ROI dashboard at /dashboard/roi — hours saved, $ saved at configurable hourly rate, payback period, weekly trend, top flaky files. Plus a PDF export so users can paste it into a deck.
  5. Stripe usage-based metering for Enterprise — $0.50/fix billed via the V2 Meter Events API, no monthly minimum.
  6. @qlens/cli — npm install -g @qlens/cli then qlens scan from any repo, or qlens auto-fix file.test.ts to send a single failing test to the playground from the terminal.
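The core idea behind multi-fix consolidation (item 1) is just a group-by on the failing test's file path before any PR is opened. A minimal sketch — the types and PR-title format here are illustrative assumptions, not QualityPilot's actual code:

```typescript
// Hypothetical sketch of multi-fix consolidation: group failing tests by
// file so one PR covers every failure in that file, instead of N PRs
// racing each other. Names and shapes are illustrative.
interface FailedTest {
  file: string;
  testName: string;
}

function groupByFile(failures: FailedTest[]): Map<string, FailedTest[]> {
  const groups = new Map<string, FailedTest[]>();
  for (const f of failures) {
    const existing = groups.get(f.file) ?? [];
    existing.push(f);
    groups.set(f.file, existing);
  }
  return groups;
}

// Three failures across two files produce two PRs, not three:
const failures: FailedTest[] = [
  { file: "auth.test.ts", testName: "login rejects bad token" },
  { file: "auth.test.ts", testName: "refresh rotates token" },
  { file: "billing.test.ts", testName: "meter event posted" },
];
const prs = [...groupByFile(failures).entries()].map(
  ([file, tests]) => `fix(${file}): ${tests.length} failing test(s)`
);
console.log(prs);
```

The reviewer-facing win is that the diff for a file arrives once, with every related fix in one place.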

Plus ~50 smaller things — Slack/Email/Discord/MS Teams/Linear/Jira/PagerDuty/Sentry/Notion outbound integrations, Cypress reporter (4th framework), public scan results page with shareable badge SVG, /vs/{competitor} SEO microsites, /docs hub, public /status page, cohort emails, webhook secret rotation reminders, AI prompt customization per repo, side-by-side diff view, bulk merge UI, Bug Detective Insights page. The full list is in the changelog — which is exactly the wrong place for a stranger to start reading.
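The ROI dashboard's arithmetic (item 4 above) reduces to a few multiplications. A hedged sketch of what such a calculation might look like — the formulas, field names, and numbers below are my assumptions for illustration, not the dashboard's actual implementation:

```typescript
// Illustrative sketch of an ROI calculation like /dashboard/roi's.
// All formulas and figures are assumptions, not QualityPilot's real math.
interface RoiInput {
  fixesMerged: number;        // AI-generated fixes merged this period
  minutesSavedPerFix: number; // assumed human debugging time avoided per fix
  hourlyRate: number;         // configurable $/hour
  planCostPerMonth: number;   // subscription price
}

function computeRoi(i: RoiInput) {
  const hoursSaved = (i.fixesMerged * i.minutesSavedPerFix) / 60;
  const dollarsSaved = hoursSaved * i.hourlyRate;
  // Payback period: fraction of a month until savings cover the plan cost
  const paybackMonths =
    dollarsSaved > 0 ? i.planCostPerMonth / dollarsSaved : Infinity;
  return { hoursSaved, dollarsSaved, paybackMonths };
}

const r = computeRoi({
  fixesMerged: 40,
  minutesSavedPerFix: 30,
  hourlyRate: 90,
  planCostPerMonth: 49,
});
console.log(r); // hoursSaved: 20, dollarsSaved: 1800
```

The point of exposing the hourly rate as a knob is that the whole dashboard scales linearly with it, so users can plug in their own team's cost.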

The incident

About 4 hours in, prod deploys silently stopped landing. New routes (like /playground) returned 404 on www.qlens.dev even though the PRs that introduced them were merged on main.

My first guess was Vercel's deploy quota. We'd done ~50 PRs by then, ~2 deploys per PR including previews, easily over Vercel Hobby's 100/day cap. The error message I'd seen earlier in the day did say:

Resource is limited - try again in 24 hours
(more than 100, code: "api-deployments-free-per-day")

That fit the timing perfectly.

It was wrong.

A real Vercel cap incident DOES exist on Hobby — it's a rolling 24h window, you can hit it with high-velocity merging. But the cap is transient: deploys queue, then drain. What was happening here was permanent: every deploy attempt failed identically.

I ran the obvious next command:

vercel inspect <deployment-url> --logs
2026-04-19T06:29:55.872Z  ⨯ useSearchParams() should be wrapped in
                            a suspense boundary at page "/playground".
2026-04-19T06:29:55.873Z    Read more: https://nextjs.org/docs/messages/missing-suspense-with-csr-bailout
2026-04-19T06:29:55.883Z  Error occurred prerendering page "/playground".
2026-04-19T06:29:55.951Z  Error: Command "npm run build" exited with 1

Next.js 16's prerender step bails out on unbounded useSearchParams() calls. The playground client component used the hook to read ?example=... deep links. PR review caught nothing — the agent ran vitest, tsc, eslint clean, all green. Vitest doesn't run next build. The hook works at runtime; it just refuses to be statically prerendered.

So every subsequent merge — Slack notifications, the Cypress reporter, the docs hub, all of it — got blocked behind the same broken build. None of those PRs deployed. The cap-rate-limit error I'd seen earlier was unrelated transient noise.

The fix was 14 lines:

// src/app/playground/page.tsx
import { Suspense } from "react";

// ...

// Wrapping the client component (the one calling useSearchParams) in a
// Suspense boundary gives the prerenderer a fallback to emit instead of
// bailing out of static generation.
<Suspense fallback={<LoadingSkeleton />}>
  <PlaygroundClient />
</Suspense>

The prevention was a separate PR adding npx next build to the PR-checks GitHub Action with stub env vars, so this exact failure mode is caught at PR time, not deploy time. Cost: ~30s per PR. Value: never debugging "why is my deploy queue silent" again.
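For concreteness, a minimal sketch of what that PR-check step might look like. The step name, env var names, and stub values below are placeholders I've invented, not the actual QualityPilot workflow:

```yaml
# Hypothetical PR-check step; env var names and values are placeholders.
# Stub env vars let next build run without real secrets — the goal is
# only to surface prerender errors, not to produce a deployable build.
- name: Next.js production build (catches prerender errors)
  run: npx next build
  env:
    NEXT_PUBLIC_SUPABASE_URL: "https://stub.supabase.co"
    SUPABASE_SERVICE_ROLE_KEY: "stub"
    OPENAI_API_KEY: "stub"
```

Any route that bails out of static prerendering — like the unbounded useSearchParams() call above — now fails the PR instead of silently blocking every subsequent deploy.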

Two lessons logged to memory:

  1. When deploys silently stop landing, run vercel inspect <url> --logs BEFORE assuming the infra layer is at fault. The symptom collision between a rate limit and a build failure cost me ~30 minutes.
  2. CI that runs unit tests but not next build will let prerender bugs through. If you're using Next.js App Router with useSearchParams / usePathname / useRouter, this lands in the same trap. Add next build to PR checks.

What I had to do as the operator

The Claude Code agent has tools for almost everything, but a few things still needed me:

  • Resend account creation — needs email verification with a real human; no API
  • Cloudflare API token — created scoped to one zone via the browser flow, then handed off (the agent did the rest of the DNS work programmatically: SPF + DKIM + MX records all pushed via CF API)
  • First npm publish OTP for each new package — though after the first publish, the cached granular access token did subsequent publishes without prompting
  • GitHub App registration — manifest flow needs a browser click; deferred for now, the scaffold sits in the repo
  • Vercel Pro upgrade when we hit the deploy quota for real — $20/mo, one-time bet for the duration of the Claude Max sprint window

Total operator time: about 6 minutes spread over the day. Not zero, but not a 10-hour pair-programming session either.

What about quality?

I won't claim "every PR was perfect" — the playground bug shipped past my review. What I can claim is:

  • The vitest suite went from 412 → 1846 tests in main (publicly verifiable via GitHub Actions logs)
  • Zero unmerged hotfixes for product regressions — the playground build error was the one mid-flight catch, and its fix merged 8 minutes after detection
  • Each agent's PR description documented at least one trade-off where it deviated from the spec (the most useful kind of artifact when reviewing AI-generated PRs)

A few moments where agents pushed back on my prompt and were right:

  • I asked for a CLI command qlens status that hits /api/v1/usage. Agent caught that endpoint doesn't exist and used /api/v1/auto-fix-metrics instead.
  • I asked for Stripe subscriptions.createUsageRecord for the metered tier. Agent caught that the API was deprecated as of 2025-03-31 (basil API version) and used the V2 Meter Events API instead.
  • I asked for Enterprise to be a Marketplace tier. The Marketplace agent argued Enterprise should be off Marketplace (GitHub takes 25% on transactions, Stripe takes ~3%) — I agreed.

Each "actually, here's why your spec is wrong" was a small win.

The real story: 0 customers

74 PRs. 1,846 tests. Every feature you'd want in a v1 dev tool. And zero paying customers at the end of the day.

Because we're in stealth.

The point of this post: shipping fast is solved. Shipping fast doesn't matter if no one knows you exist. The hard part isn't the code. The hard part is showing up.

Some of the readiness was overshoot. We have a 6-step customer-success email cadence and we have zero customers. We have webhook secret rotation reminders firing every 90 days and we have zero secrets in production beyond my own. We have a ROI dashboard that calculates time-saved-times-hourly-rate from a row count of zero. AI-assisted velocity is great but it lets you build a maze before you've checked whether anyone's looking for the entrance.

So this blog exists. We'll write about how this thing got built, what we're learning when real users start touching it, and what the actual conversion data looks like once it starts flowing. No stunts. Real numbers — including the ones that tell us we got something wrong.

If you're building a dev tool yourself, try the playground — paste a failing test, see the AI fix it in 5 seconds. No signup. 5 free attempts per day per IP.

If your test suite is rough, scan a public repo by hitting qlens.dev/scan/{owner}/{repo} for free.

If you've shipped something with Claude Code at this kind of pace, I'd love to compare notes — reply by email to noreply@qlens.dev and Resend will route it to me.

— Ihor



About the author

Ihor Kosheliev builds QualityPilot — an AI Bug Detective that turns failed CI runs into proposed fixes as GitHub PRs. See how it works.
