April 22, 2026 · 5 min read

8 patterns that make your Playwright tests flaky (with fixes)

A Playwright-specific field guide to the eight recurring patterns that turn green CI red on a Tuesday at 4 PM. Each one with a concrete smell, the deterministic fix, and why auto-wait didn't save you this time.

Tags: playwright, flaky-tests, testing, e2e, javascript

Companion post to our earlier 11 patterns that make Jest tests flaky. Same format, applied to Playwright.

Playwright is better than Selenium/Cypress at hiding flakiness — its built-in auto-waiting handles most of the obvious races for you. That's also why Playwright flakes tend to be subtler: the auto-wait masks 95% of the problem, then the 5% lands in CI as "intermittent, can't reproduce locally." These are those 5%.

Quick callout up top: if you'd rather paste a test file and get the answer, qlens.dev/tools/jest-flaky-detector works on Playwright too — the patterns overlap enough that the heuristic is useful. Free, no signup.

1. `waitForTimeout` — Playwright's own footgun

Playwright ships page.waitForTimeout(ms) and explicitly warns against using it in tests. People use it anyway because it's the fastest way to silence a flaky test.

// Flaky
test('saves on click', async ({ page }) => {
  await page.click('button:has-text("Save")');
  await page.waitForTimeout(1000);   // "just wait for it"
  await expect(page.locator('.toast')).toHaveText('Saved');
});

// Deterministic
test('saves on click', async ({ page }) => {
  await page.click('button:has-text("Save")');
  await expect(page.locator('.toast')).toHaveText('Saved');
  // web-first assertion auto-waits up to 5s for the toast + text
});

Playwright's web-first assertions (toHaveText, toBeVisible, toHaveCount, etc.) already poll with a default timeout. You almost never need a manual wait; if you do, it's a sign the action you just fired returns too early to reason about.
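To make the difference concrete, here is a minimal sketch of the retry loop a polling assertion performs, as opposed to a fixed sleep. This is not Playwright's implementation: pollUntil and makeToast are hypothetical names, and the toast is simulated with a timer rather than a real page.

```typescript
// Hypothetical helper: re-run `check` until it returns true or `timeoutMs` elapses.
async function pollUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;            // condition met: stop right away
    if (Date.now() >= deadline) return false;  // budget spent: report failure
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated UI: the "toast" becomes visible after a delay that varies per run.
function makeToast(appearsAfterMs: number): () => boolean {
  const shownAt = Date.now() + appearsAfterMs;
  return () => Date.now() >= shownAt;
}
```

A fixed 1000 ms sleep always costs 1000 ms and still fails when the toast takes 1100 ms. The polling loop returns as soon as the condition holds and only gives up once the whole timeout has passed.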

2. Selectors that rely on layout

Class names generated by a CSS-in-JS library can change between builds. Tests that select by them pass in dev, then fail against a production build.

// Flaky — selector ties to generated hash
await page.click('.button-css-xq3a2p-primary');

// Deterministic — semantic selector
await page.getByRole('button', { name: 'Save changes' }).click();

getByRole + getByText + getByLabel are stable across styling refactors. When they're not specific enough, use data-testid, not class names.
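If the app already marks elements with a different attribute, getByTestId can be pointed at it. A sketch, assuming a hypothetical data-qa attribute; testIdAttribute is the Playwright config option that controls which attribute getByTestId reads (data-testid by default).

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Assumption: the app tags elements with data-qa="..." instead of data-testid
    testIdAttribute: 'data-qa',
  },
});
```

With this in place, page.getByTestId('save-changes') matches an element carrying data-qa="save-changes", whatever its generated class names are.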

3. Storing element handles in variables, then reusing them

This one is easy to get backwards. A Locator is a query, not a snapshot: Playwright re-resolves it against the current DOM on every action and assertion, so keeping a locator in a variable is safe. An ElementHandle (from page.$ or locator.elementHandle()) is the opposite: it points at one specific DOM node. If a re-render replaces that node between your two interactions, the second one acts on a detached element.

// Flaky: the handle can go stale between actions
const btn = await page.$('button:has-text("Load more")');
await btn.click();                            // triggers re-render
expect(await btn.isDisabled()).toBe(true);    // one-shot check, possibly on the old node

// Deterministic: a locator is re-resolved on each call
const btn = page.getByRole('button', { name: 'Load more' });
await btn.click();
await expect(btn).toBeDisabled();             // retries against the current DOM

Locators are cheap to create and are re-resolved on every use. Prefer them over element handles, especially after an action that could replace the target.

4. Network-condition tests without route interception

Your test verifies "when the API is slow, show a spinner". In dev the API is fast, so the spinner never renders long enough for the assertion. CI is even faster because it hits a mocked backend — the spinner flashes for 20ms, the assertion times out, test fails.

// Flaky
test('shows loading spinner', async ({ page }) => {
  await page.goto('/dashboard');
  await expect(page.getByTestId('loading-spinner')).toBeVisible();
  // misses the spinner 40% of the time
});

// Deterministic — intercept + delay
test('shows loading spinner', async ({ page }) => {
  await page.route('**/api/dashboard', async (route) => {
    await new Promise(r => setTimeout(r, 500));
    await route.continue();
  });
  await page.goto('/dashboard');
  await expect(page.getByTestId('loading-spinner')).toBeVisible();
});

page.route gives you explicit control over when the real response lands. Your spinner assertion now has a guaranteed window.

5. `page.on('dialog')` handlers registered too late

Native browser dialogs (alert, confirm, prompt) block the page, and Playwright auto-dismisses them when no dialog handler is registered. Register the handler after the trigger and the outcome depends on which happened first: the handler attaching, or the dialog opening and being dismissed.

// Flaky — handler race
test('confirms delete', async ({ page }) => {
  await page.click('text=Delete');                    // dialog fires immediately
  page.on('dialog', dialog => dialog.accept());       // might be too late
  await expect(page.getByText('Deleted')).toBeVisible();
});

// Deterministic — register BEFORE the trigger
test('confirms delete', async ({ page }) => {
  page.once('dialog', dialog => dialog.accept());
  await page.click('text=Delete');
  await expect(page.getByText('Deleted')).toBeVisible();
});

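Use page.once (not on) so the handler accepts only the dialog you expect. A second, unexpected dialog in the same test is then auto-dismissed instead of silently accepted, which usually surfaces as a failed assertion. Register the handler on the line above the trigger, not after.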

6. Shared storage state between tests

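Playwright's storageState file is a way to skip the login flow in every test. It also ships with a foot-gun. Each test gets a fresh browser context seeded from that file, so every test starts as the same signed-in user: anything a test persists for that user (a server-side preference, or a storageState file that gets re-saved mid-run) leaks into whichever test runs next.

// Flaky: test B inherits the theme test A saved for the shared user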
test('A', async ({ page }) => {
  await page.goto('/settings');
  await page.fill('input[name="theme"]', 'dark');
});

test('B', async ({ page }) => {
  await page.goto('/');
  // expects default theme but finds 'dark' from test A's leak
  await expect(page.locator('html')).toHaveClass(/light/);
});

// Deterministic — isolate state per test
test.use({ storageState: { cookies: [], origins: [] } });

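// OR clear cookies and granted permissions at the start of each test
// (this does not clear localStorage; use the empty storageState above for that)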
test.beforeEach(async ({ context }) => {
  await context.clearCookies();
  await context.clearPermissions();
});

Better: give tests that modify persistent state their own test.use({ storageState: ... }) block with a purpose-built fixture.
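One way to apply this at the config level rather than per file is to split the suite into two projects. This is a sketch under assumptions: the *.mutating.spec.ts naming convention and both auth file paths are hypothetical, and the login step that produces those files is not shown.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      // Tests that only read state share one signed-in user
      name: 'read-only',
      testIgnore: /.*\.mutating\.spec\.ts/,
      use: { storageState: 'playwright/.auth/shared-user.json' },
    },
    {
      // Tests that change persistent state sign in as a separate, disposable user
      name: 'mutating',
      testMatch: /.*\.mutating\.spec\.ts/,
      use: { storageState: 'playwright/.auth/mutating-user.json' },
    },
  ],
});
```

The read-only tests can then assert on defaults without caring what the mutating tests did to their own user.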

7. `page.goto` with default `waitUntil`

page.goto defaults to waitUntil: 'load', which resolves when the browser's load event fires. In an SPA, load typically fires once the shell HTML and its scripts have loaded, before the React/Vue/Svelte app has fetched its data and rendered. Any check that runs once, without retrying, sees an empty shell.

// Flaky: one-shot check against a shell that may still be empty
test('dashboard loads', async ({ page }) => {
  await page.goto('/dashboard');
  const heading = page.getByRole('heading', { name: 'Metrics' });
  expect(await heading.isVisible()).toBe(true);
  // isVisible() returns immediately and does not retry, so this fails if the app hasn't mounted yet
});

// Deterministic: wait for a specific marker
test('dashboard loads', async ({ page }) => {
  // Alternative: await page.goto('/dashboard', { waitUntil: 'networkidle' });
  await page.goto('/dashboard');
  await expect(page.getByRole('heading', { name: 'Metrics' })).toBeVisible();
  // the web-first assertion retries for up to 5s by default, usually enough for SPA mount
});

networkidle can be slow, or never settle at all, if your app keeps connections busy with long-polling, and Playwright's own docs discourage relying on it in tests. Prefer a specific DOM assertion as your "we're ready" signal. Web-first assertions auto-wait anyway, so this usually works without waitUntil at all.

8. Parallel workers writing to shared test data

Playwright runs test files in parallel workers by default, and with fullyParallel: true in playwright.config.ts (the config scaffolded by npm init playwright sets it) tests inside a single file run in parallel too. Two workers inserting rows into the same database table, or writing to /tmp/fixture.json, race.

// Flaky — all workers hit the same seeded user row
test('updates profile', async ({ page }) => {
  await loginAs(page, 'test@example.com');
  await page.fill('input[name="name"]', 'Updated');
  await page.click('button:has-text("Save")');
});

// Deterministic: serialize the group, or better, give each test its own user
test.describe.serial('profile updates', () => {  // forces serial within the describe
  // OR better: per-test unique user
  test('updates profile', async ({ page }, testInfo) => {
    const uniqueUser = `test-${testInfo.workerIndex}-${Date.now()}@example.com`;
    await signUpAs(page, uniqueUser);
    await loginAs(page, uniqueUser);
    await page.fill('input[name="name"]', 'Updated');
    await page.click('button:has-text("Save")');
  });
});

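Playwright's testInfo.workerIndex + testInfo.parallelIndex are the two primitives you reach for. If your backend supports row-level tenancy, each worker should pin its own tenant. Otherwise: test.describe.serial, keeping in mind that it only orders the tests inside that group; tests in other files still run in parallel against the same data.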
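The per-test unique user can be pulled into a small helper. A minimal sketch: uniqueTestEmail is a hypothetical name, and the counter relies on each Playwright worker running in its own process, so it is private to one worker.

```typescript
// Hypothetical helper: build an email address that is unique per worker and per call.
let callCount = 0; // module-level, so one counter per worker process

function uniqueTestEmail(workerIndex: number, prefix = 'test'): string {
  callCount += 1;
  // workerIndex separates workers; timestamp + counter separate calls within a worker
  return `${prefix}-w${workerIndex}-${Date.now()}-${callCount}@example.com`;
}
```

Inside a test this would be called as uniqueTestEmail(testInfo.workerIndex) and the result passed to the signUpAs and loginAs helpers from the example above. The counter covers the case where one worker creates two users within the same millisecond, which Date.now() alone would not distinguish.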

The bigger point: Playwright hides flakes that other frameworks surface

Cypress would fail obviously if you forgot to wait for a modal. Selenium would throw StaleElementReferenceException when you re-used a stale element reference. Playwright's auto-wait + built-in retries cover for most of both categories, which is usually what you want, except when it silently masks a real race that shows up later under CI load.

The patterns above are the ones auto-wait doesn't save you from. When your suite is 95% green locally but fails 2-3 tests per run in CI with different offenders each time, that's auto-wait doing its job and the 8 patterns above doing theirs.

Paste a Playwright test file at qlens.dev/tools/jest-flaky-detector — the detector is framework-aware enough to catch the shared patterns. Free, no signup. Takes 2 seconds.

If you want the fix proposed as a PR against your own repo instead of a paste-and-analyze tool, qlens.dev ingests via our Playwright reporter. First 10 fixes per month are free.


About QualityPilot

QualityPilot watches your CI for failed tests and proposes a fix as a GitHub PR. You merge or you don't — no auto-merge, no fluff. See how it works.
