8 patterns that make your Playwright tests flaky (with fixes)
A Playwright-specific field guide to the eight recurring patterns that turn green CI red on a Tuesday at 4 PM. Each one comes with a concrete smell, the deterministic fix, and why auto-wait didn't save you this time.
This is a companion post to our earlier "11 patterns that make Jest tests flaky", with the same idea applied to Playwright E2E suites.
Playwright is better than Selenium/Cypress at hiding flakiness — its built-in auto-waiting handles most of the obvious races for you. That's also why Playwright flakes tend to be subtler: the auto-wait masks 95% of the problem, then the 5% lands in CI as "intermittent, can't reproduce locally." These are those 5%.
Quick callout up top: if you'd rather paste a test file and get the answer, qlens.dev/tools/jest-flaky-detector works on Playwright too — the patterns overlap enough that the heuristic is useful. Free, no signup.
1. waitForTimeout — Playwright's own footgun
Playwright ships page.waitForTimeout(ms) and explicitly warns against using it in tests. People use it anyway because it's the fastest way to silence a flaky test.
// Flaky
test('saves on click', async ({ page }) => {
  await page.click('button:has-text("Save")');
  await page.waitForTimeout(1000); // "just wait for it"
  await expect(page.locator('.toast')).toHaveText('Saved');
});
// Deterministic
test('saves on click', async ({ page }) => {
  await page.click('button:has-text("Save")');
  await expect(page.locator('.toast')).toHaveText('Saved');
  // web-first assertion auto-waits up to 5s for the toast + text
});
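Playwright's web-first assertions (toHaveText, toBeVisible, toHaveCount, etc.) already retry until the condition holds or the timeout (5s by default) expires. You almost never need a manual wait; if you reach for one, it usually means nothing observable (a toast, a URL change, a network response) marks the moment you're waiting for, and that is the thing to fix.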
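To make the difference from a fixed sleep concrete, here is a toy model of a retrying assertion. It is not Playwright's implementation; `retryUntil`, `timeoutMs`, and `intervalMs` are hypothetical names chosen for this sketch, and the "toast" is simulated with a counter.

```typescript
// Toy model of a retrying assertion. NOT Playwright's implementation:
// `retryUntil`, `timeoutMs`, and `intervalMs` are hypothetical names.
async function retryUntil<T>(
  read: () => T | Promise<T>,
  matches: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 100 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (matches(value)) return value; // passes as soon as the condition holds
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated toast whose text only appears on the third read.
let reads = 0;
const readToast = () => (++reads >= 3 ? 'Saved' : '');

retryUntil(readToast, (text) => text === 'Saved', { intervalMs: 10 })
  .then((text) => console.log(`matched "${text}" after ${reads} reads`));
```

The point of the model: a retrying check finishes the moment the condition is true and fails only at the deadline, whereas a fixed sleep always costs the full duration and still guesses.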
2. Selectors that rely on generated class names
Class names from a CSS-in-JS library are generated hashes that can change between builds. Tests that select by them pass in dev, fail in production.
// Flaky: selector is tied to a generated hash
await page.click('.button-css-xq3a2p-primary');
// Deterministic: semantic selector
await page.getByRole('button', { name: 'Save changes' }).click();
getByRole + getByText + getByLabel are stable across styling refactors. When they're not specific enough, use data-testid, not class names.
3. Storing element handles in variables, then acting twice
This is a Playwright-specific gotcha, and it concerns ElementHandles, not Locators. A Locator is a lazy query: it re-resolves against the live DOM on every action and assertion, so keeping one in a variable is safe. An ElementHandle (what page.$ returns) points at one specific DOM node. If the DOM re-renders between your two interactions, the second one runs against the old, detached node.
// Flaky: the handle points at a node the re-render may replace
const btn = await page.$('button:has-text("Load more")');
await btn?.click(); // triggers re-render
expect(await btn?.isDisabled()).toBe(true); // reads the old node, one shot, no retry
// Deterministic: a locator re-resolves on each call
const loadMore = page.getByRole('button', { name: 'Load more' });
await loadMore.click();
await expect(loadMore).toBeDisabled(); // retries against the current DOM
Locators are cheap to create and safe to reuse, because nothing is looked up until you act or assert. Never hold an ElementHandle across an action that could replace its target.
4. Network-condition tests without route interception
Your test verifies "when the API is slow, show a spinner". In dev the API is fast, so the spinner never renders long enough for the assertion. CI is even faster because it hits a mocked backend — the spinner flashes for 20ms, the assertion times out, test fails.
// Flaky
test('shows loading spinner', async ({ page }) => {
  await page.goto('/dashboard');
  await expect(page.getByTestId('loading-spinner')).toBeVisible();
  // misses the spinner 40% of the time
});
// Deterministic: intercept + delay
test('shows loading spinner', async ({ page }) => {
  await page.route('**/api/dashboard', async (route) => {
    await new Promise(r => setTimeout(r, 500));
    await route.continue();
  });
  await page.goto('/dashboard');
  await expect(page.getByTestId('loading-spinner')).toBeVisible();
});
page.route gives you explicit control over when the real response lands. Your spinner assertion now has a guaranteed window.
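A fixed 500 ms delay is usually enough, but if you'd rather not pick a number at all, one option is to hold the response until the test explicitly releases it. The sketch below assumes the same '**/api/dashboard' route and 'loading-spinner' test id as the example above, and that the dashboard request is a fetch issued after page load (so page.goto is not blocked by it). `createGate` is a hypothetical helper, not a Playwright API.

```typescript
// A "gate" is a promise the test opens by hand.
// `createGate` is a hypothetical helper, not part of Playwright.
function createGate() {
  let open!: () => void; // assigned synchronously by the Promise executor below
  const opened = new Promise<void>((resolve) => {
    open = resolve;
  });
  return { opened, open };
}

// How it could slot into the spinner test:
//
//   const gate = createGate();
//   await page.route('**/api/dashboard', async (route) => {
//     await gate.opened;      // hold the request until the test says go
//     await route.continue();
//   });
//   await page.goto('/dashboard');
//   await expect(page.getByTestId('loading-spinner')).toBeVisible();
//   gate.open();              // now let the real response through
//   await expect(page.getByTestId('loading-spinner')).toBeHidden();
```

The trade-off: the gate removes the timing guess entirely, at the cost of one more moving part that the test must remember to open.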
5. page.on('dialog') handlers registered too late
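Native browser dialogs (alert, confirm, prompt) block page execution, and Playwright dismisses them automatically when no dialog listener is registered. If the dialog opens before your handler is attached, confirm() has already returned false by the time the handler exists. When the dialog opens asynchronously (say, after a fetch), handler-after-trigger becomes a race between "did my handler register in time" and "did the dialog get auto-dismissed".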
// Flaky: handler race
test('confirms delete', async ({ page }) => {
  await page.click('text=Delete'); // dialog fires immediately
  page.on('dialog', dialog => dialog.accept()); // might be too late
  await expect(page.getByText('Deleted')).toBeVisible();
});
// Deterministic: register BEFORE the trigger
test('confirms delete', async ({ page }) => {
  page.once('dialog', dialog => dialog.accept());
  await page.click('text=Delete');
  await expect(page.getByText('Deleted')).toBeVisible();
});
Use page.once (not on) so a second unexpected dialog in the same test fails loudly instead of silently accepting. Register the handler in the line above the trigger, not after.
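If you also want the test to check which dialog it accepted, one way is to wrap the once-handler in a small helper. The sketch below is an assumption-laden illustration: `acceptNextDialog` is a hypothetical name, and `DialogLike` / `DialogSource` are local stand-ins shaped after Playwright's Dialog (message, accept, dismiss) and page.once, declared here so the snippet type-checks without a browser.

```typescript
// Structural stand-ins shaped after Playwright's Dialog and Page.once,
// declared locally so the sketch runs without a browser.
interface DialogLike {
  message(): string;
  accept(): Promise<void>;
  dismiss(): Promise<void>;
}
interface DialogSource {
  once(event: 'dialog', handler: (dialog: DialogLike) => void): unknown;
}

// Hypothetical helper: accept the next dialog only if its message matches.
// Call it BEFORE the action that opens the dialog.
function acceptNextDialog(source: DialogSource, expected: RegExp): Promise<string> {
  return new Promise((resolve, reject) => {
    source.once('dialog', (dialog) => {
      const message = dialog.message();
      if (expected.test(message)) {
        dialog.accept().then(() => resolve(message), reject);
      } else {
        // Unexpected dialog: dismiss it and surface the mismatch.
        dialog.dismiss().then(
          () => reject(new Error(`unexpected dialog: "${message}"`)),
          reject,
        );
      }
    });
  });
}

// Intended use in a spec (handler is registered before the click starts):
//
//   await Promise.all([
//     acceptNextDialog(page, /delete/i),
//     page.click('text=Delete'),
//   ]);
```

Because the Promise executor runs synchronously, the listener is attached before page.click is even called, which keeps the "register first" ordering from the example above.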
6. Shared storage state between tests
Playwright's storageState file lets every test skip the login flow. It also ships with a foot-gun: every test that loads it is now the same account. Each test still gets a fresh browser context, so cookies and localStorage written during one test don't carry over by themselves, but anything the app saves server-side for that account does, and so does anything tests write back into the shared storageState file.
// Flaky: tests A and B load the same account; if the app saves the theme to it, B sees A's change
test('A', async ({ page }) => {
  await page.goto('/settings');
  await page.fill('input[name="theme"]', 'dark');
});
test('B', async ({ page }) => {
  await page.goto('/');
  // expects default theme but finds 'dark' saved by test A
  await expect(page.locator('html')).toHaveClass(/light/);
});
// Deterministic: start from an empty state instead of the shared one
test.use({ storageState: { cookies: [], origins: [] } });
// OR drop the inherited session at the start of each test
test.beforeEach(async ({ context }) => {
  await context.clearCookies(); // removes cookies loaded from storageState
  await context.clearPermissions(); // resets permission overrides; does not touch localStorage
});
Better: give tests that modify persistent state their own test.use({ storageState: ... }) block with a purpose-built fixture.
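One way to arrange "their own storageState" is at the project level, so state-mutating specs never load the account the read-only specs use. The config below is a sketch: the project names, the `.mutating.spec.ts` naming convention, and the `playwright/.auth/*.json` paths are all hypothetical, and it assumes a setup file that logs in and writes both state files.

```typescript
// playwright.config.ts (sketch; project names and file paths are hypothetical)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    // Runs first; assumed to log in and write one storage state file per account.
    { name: 'setup', testMatch: /.*\.setup\.ts/ },
    {
      // Read-only specs can share a single logged-in account.
      name: 'read-only',
      testIgnore: /.*\.mutating\.spec\.ts/,
      dependencies: ['setup'],
      use: { storageState: 'playwright/.auth/shared-user.json' },
    },
    {
      // Specs that change persistent settings get a dedicated account.
      name: 'mutating',
      testMatch: /.*\.mutating\.spec\.ts/,
      dependencies: ['setup'],
      use: { storageState: 'playwright/.auth/settings-user.json' },
    },
  ],
});
```

This only separates mutating specs from read-only ones. Specs inside the mutating project still share one account, so they need to reset what they change or run against per-test users.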
7. page.goto with default waitUntil
page.goto defaults to waitUntil: 'load', which resolves when the browser's load event fires. That event covers the document and its subresources; it says nothing about whether your React/Vue/Svelte app has fetched its data and rendered. A one-shot check right after goto can run against an empty shell.
// Flaky: one-shot check against a shell that may still be empty
test('dashboard loads', async ({ page }) => {
  await page.goto('/dashboard');
  const visible = await page.getByRole('heading', { name: 'Metrics' }).isVisible();
  expect(visible).toBe(true); // isVisible() returns immediately, no retry
});
// Deterministic: wait for a specific marker
test('dashboard loads', async ({ page }) => {
  await page.goto('/dashboard');
  await expect(page.getByRole('heading', { name: 'Metrics' })).toBeVisible();
  // web-first assertion retries for up to 5s, usually enough for SPA mount
  // Alternative: page.goto('/dashboard', { waitUntil: 'networkidle' }), with the caveats below
});
networkidle can be slow, or never settle, if your app has long-polling connections, and the Playwright docs discourage it for testing. Prefer a specific DOM assertion as your "we're ready" signal. Web-first assertions auto-wait anyway, so this usually works without waitUntil at all.
8. Parallel workers writing to shared test data
Playwright runs test files in parallel across worker processes by default, and with fullyParallel: true in playwright.config.ts the tests inside each file run in parallel too. Two workers inserting rows into the same database table, or writing to /tmp/fixture.json, race.
// Flaky: all workers hit the same seeded user row
test('updates profile', async ({ page }) => {
  await loginAs(page, 'test@example.com');
  await page.fill('input[name="name"]', 'Updated');
  await page.click('button:has-text("Save")');
});
// Deterministic: a unique user per test
test('updates profile', async ({ page }, testInfo) => {
  const uniqueUser = `test-${testInfo.workerIndex}-${Date.now()}@example.com`;
  await signUpAs(page, uniqueUser);
  await loginAs(page, uniqueUser);
  await page.fill('input[name="name"]', 'Updated');
  await page.click('button:has-text("Save")');
});
// Fallback: wrap the tests in test.describe.serial('profile updates', () => { ... })
// to force serial execution within that describe
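Playwright's testInfo.workerIndex + testInfo.parallelIndex are the two primitives you reach for. If your backend supports row-level tenancy, each worker should pin its own tenant. Otherwise, fall back to test.describe.serial, keeping in mind that it only serializes the tests inside that describe block; other files still run in parallel.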
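As a small sketch of how those two primitives can be used, the helper below builds a per-test e-mail address and a per-slot tenant name. `uniqueEmail`, `tenantFor`, and the `WorkerInfo` interface are hypothetical names for illustration; `WorkerInfo` just mirrors the two TestInfo fields mentioned above so the snippet runs on its own.

```typescript
// `WorkerInfo` mirrors the two TestInfo fields discussed above.
interface WorkerInfo {
  workerIndex: number;   // unique per worker process; a restarted worker gets a new one
  parallelIndex: number; // slot between 0 and workers - 1; kept across worker restarts
}

let sequence = 0; // disambiguates calls that land in the same millisecond

// Hypothetical helper: a collision-resistant e-mail for a throwaway user.
function uniqueEmail(info: WorkerInfo, now: number = Date.now()): string {
  sequence += 1;
  return `test-w${info.workerIndex}-${now}-${sequence}@example.com`;
}

// Hypothetical helper: pin each parallel slot to its own pre-seeded tenant.
function tenantFor(info: WorkerInfo): string {
  return `tenant-${info.parallelIndex}`;
}

// In a spec, `testInfo` can be passed directly:
//   test('updates profile', async ({ page }, testInfo) => {
//     const email = uniqueEmail(testInfo);
//     ...
//   });
console.log(uniqueEmail({ workerIndex: 0, parallelIndex: 0 }));
console.log(tenantFor({ workerIndex: 7, parallelIndex: 1 }));
```

workerIndex suits throwaway data that must never collide, while parallelIndex suits a fixed pool of pre-seeded tenants, since its values stay within the configured worker count.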
The bigger point: Playwright hides flakes that other frameworks surface
Cypress would fail obviously if you forgot to wait for a modal. Selenium would throw StaleElementReferenceException when you re-used an element reference after a re-render. Playwright's auto-wait + built-in retries cover for most of both categories, which is usually what you want, except when it silently masks a real race that shows up later under CI load.
The patterns above are the ones auto-wait doesn't save you from. When your suite is 95% green locally but fails 2-3 tests per run in CI with different offenders each time, that's auto-wait doing its job and the 8 patterns above doing theirs.
Paste a Playwright test file at qlens.dev/tools/jest-flaky-detector — the detector is framework-aware enough to catch the shared patterns. Free, no signup. Takes 2 seconds.
If you want the fix proposed as a PR against your own repo instead of a paste-and-analyze tool, qlens.dev ingests via our Playwright reporter. First 10 fixes per month are free.