Playwright visual regression testing: a practical guide

Ananya Rao

QA Lead

June 4, 2026

Playwright has a real edge for visual regression testing: multi-browser support, screenshot comparison built in, and sensible handling of anti-aliasing. You can get a visual check running in minutes — which is exactly why teams underestimate what a production-grade setup actually requires.

This guide covers the built-in workflow properly, the determinism work that separates a useful suite from a flaky one, running it in CI, and the point where teams outgrow committing baselines to git.

The built-in approach: toHaveScreenshot()

Playwright's assertion captures a screenshot, writes it as a baseline on first run, and diffs against it on every run after:

import { test, expect } from '@playwright/test';

test('home page looks right', async ({ page }) => {
  await page.goto('/');
  await expect(page).toHaveScreenshot('home.png');
});

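The first run writes the baseline and fails deliberately, telling you a snapshot was written. By default the file lands in a -snapshots folder beside the spec, with the project name and platform appended (for example home-chromium-linux.png). Commit it, and subsequent runs compare against it. To intentionally re-record after a design change: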

npx playwright test --update-snapshots

Configure tolerances once, globally

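Two screenshots of an unchanged page are rarely byte-identical, because anti-aliasing and sub-pixel rendering vary slightly between captures. Two settings absorb that noise: threshold is the per-pixel colour tolerance (0 is strict, 1 is lax), and maxDiffPixelRatio is the share of pixels allowed to exceed it. Set both in the config rather than sprinkling them through tests: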

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  use: {
    viewport: { width: 1280, height: 720 },
  },
  expect: {
    toHaveScreenshot: {
      // Absorb sub-pixel noise without hiding real changes
      maxDiffPixelRatio: 0.01,
      threshold: 0.2,
      animations: 'disabled',
      caret: 'hide',
      scale: 'css',
    },
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
});

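animations: 'disabled', caret: 'hide' and scale: 'css' are already the defaults for toHaveScreenshot(); spelling them out documents the intent and protects the suite if someone overrides them elsewhere. Between them, running animations and the blinking caret are two of the most common sources of random failure, and they are handled before you write a single test.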
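
To get a feel for how much a ratio actually permits, it helps to turn it into a pixel count. The sketch below is plain arithmetic on the viewport and ratio from the config above. `diffPixelBudget` is a hypothetical helper name, not part of Playwright, and it assumes a viewport-sized screenshot captured at CSS scale.

```typescript
// Hypothetical helper (not a Playwright API): converts maxDiffPixelRatio
// into the absolute number of pixels that may differ before a test fails.
interface Viewport {
  width: number;
  height: number;
}

function diffPixelBudget(viewport: Viewport, maxDiffPixelRatio: number): number {
  const totalPixels = viewport.width * viewport.height;
  return Math.floor(totalPixels * maxDiffPixelRatio);
}

// Values taken from the config above.
const budget = diffPixelBudget({ width: 1280, height: 720 }, 0.01);
console.log(budget); // 9216, the area of a 96 x 96 pixel square
```

At 1280 x 720, a 1% ratio tolerates a changed region roughly the size of a large icon, so a ratio that absorbs noise on a full page can also absorb a small genuine regression. The absolute maxDiffPixels option, or a component-level snapshot, gives a tighter bound where that matters.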

Making captures deterministic

Most 'flaky visual tests' are not flaky at all — the page genuinely differed between runs. Fix the input before you loosen the threshold.

Wait for fonts and images

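A web font that loads after your assertion means the baseline captured a fallback typeface and the next run captured the real one. Every line of text shifts. Wait for the network to settle and for fonts to finish loading before asserting (networkidle can stall on pages that poll or stream, so wait for a specific element instead on those):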

await page.goto('/');
await page.waitForLoadState('networkidle');
await page.evaluate(() => document.fonts.ready);
await expect(page).toHaveScreenshot('home.png');
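
The snippet above waits explicitly for fonts and relies on network idle for images. A more explicit image check could look like the sketch below. `imagesSettled` and `ImageLike` are hypothetical names; the predicate only reads the standard `complete` property of `HTMLImageElement`, and it assumes no images are lazy-loaded below the fold.

```typescript
// Hypothetical predicate: true once every image has finished loading
// (or failed). Mirrors the standard HTMLImageElement.complete property.
interface ImageLike {
  complete: boolean;
}

function imagesSettled(images: ArrayLike<ImageLike>): boolean {
  return Array.from(images).every((img) => img.complete);
}

// In a test, the same check would run inside the browser, for example:
//   await page.waitForFunction(() =>
//     Array.from(document.images).every((img) => img.complete));

console.log(imagesSettled([{ complete: true }, { complete: false }])); // false
console.log(imagesSettled([{ complete: true }, { complete: true }])); // true
```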

Mask genuinely dynamic regions

Timestamps, live counters, rotating avatars and carousels will never match. Mask them rather than fighting them — Playwright paints the masked area a solid colour in both baseline and comparison:

await expect(page).toHaveScreenshot('dashboard.png', {
  mask: [
    page.locator('[data-test="last-updated"]'),
    page.locator('[data-test="live-chart"]'),
  ],
});
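
To see why masking removes the noise, consider a toy model of the comparison. This illustrates the idea only and is not Playwright's implementation: images are flat arrays of grey values, and `applyMask` and `countDiff` are hypothetical helpers.

```typescript
// Toy model: an image is a flat array of grey values (0-255).
type Image = number[];

// Paint the masked indices a fixed colour, as masking does for both captures.
function applyMask(image: Image, masked: Set<number>, colour = 255): Image {
  return image.map((value, index) => (masked.has(index) ? colour : value));
}

// Count positions where the two images disagree.
function countDiff(a: Image, b: Image): number {
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) diff++;
  }
  return diff;
}

const baseline: Image = [10, 20, 30, 40]; // index 2 holds a "live counter"
const current: Image = [10, 20, 99, 40];
const mask = new Set([2]);

console.log(countDiff(baseline, current)); // 1
console.log(countDiff(applyMask(baseline, mask), applyMask(current, mask))); // 0
```

The trade-off is that a real regression inside a masked region is invisible too, so keep each mask as small as the dynamic content it covers.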

Freeze time when the UI shows relative dates

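A label such as "3 hours ago" is computed from the current time, so it changes between runs even when the data does not. Pin the clock before navigating (page.clock is available from Playwright 1.45):

await page.clock.setFixedTime(new Date('2026-01-01T00:00:00Z'));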
await page.goto('/activity');
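
The reason a fixed clock matters can be sketched with a relative-date formatter. `relativeLabel` is a hypothetical stand-in for whatever the application actually uses, and it only counts whole hours.

```typescript
// Hypothetical formatter, standing in for the app's own relative-date logic.
function relativeLabel(event: Date, now: Date): string {
  const msPerHour = 60 * 60 * 1000;
  const hours = Math.floor((now.getTime() - event.getTime()) / msPerHour);
  return hours === 1 ? '1 hour ago' : `${hours} hours ago`;
}

const eventTime = new Date('2025-12-31T22:00:00Z');

// With the clock fixed at 2026-01-01T00:00:00Z, every run renders the same text.
console.log(relativeLabel(eventTime, new Date('2026-01-01T00:00:00Z'))); // 2 hours ago

// A run one hour later on a real clock renders different text, hence different pixels.
console.log(relativeLabel(eventTime, new Date('2026-01-01T01:00:00Z'))); // 3 hours ago
```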

Snapshot a component, not the whole page

A full-page screenshot fails if anything anywhere changes. When you only care about one component, assert on the locator — smaller surface, far fewer false positives:

await expect(page.locator('[data-test="pricing-table"]'))
  .toHaveScreenshot('pricing-table.png');

The environment problem

This is the one that catches every team. Font rendering and anti-aliasing differ between macOS and Linux, so a baseline recorded on a developer laptop is no use to a Linux CI runner: compared directly, the diff looks like a regression although nothing changed. Playwright even suffixes snapshot names with the platform for this reason.
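
As a sketch of that naming, the helper below reproduces the file name Playwright produces under its default snapshot path template, which appends the project name and the platform to the name passed to toHaveScreenshot(). `defaultSnapshotName` is a hypothetical helper, and a custom `snapshotPathTemplate` in the config would change the result.

```typescript
// Hypothetical helper: mimics the default "{arg}-{projectName}-{platform}{ext}"
// file naming. Not a Playwright API; a custom snapshotPathTemplate overrides it.
function defaultSnapshotName(arg: string, projectName: string, platform: string): string {
  const dot = arg.lastIndexOf('.');
  const base = dot === -1 ? arg : arg.slice(0, dot);
  const ext = dot === -1 ? '' : arg.slice(dot);
  return `${base}-${projectName}-${platform}${ext}`;
}

// The platform is 'darwin' on macOS and 'linux' on a typical CI runner.
console.log(defaultSnapshotName('home.png', 'chromium', 'darwin')); // home-chromium-darwin.png
console.log(defaultSnapshotName('home.png', 'chromium', 'linux')); // home-chromium-linux.png
```

Under the default template, a baseline recorded on a Mac and the file a Linux runner looks for are two different files, so the two platforms never share a baseline.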

The standard fix is to record and compare baselines in the same container image that CI uses:

docker run --rm --ipc=host -v "$(pwd)":/work -w /work \
  mcr.microsoft.com/playwright:v1.50.0-jammy \
  npx playwright test --update-snapshots

Pin that image tag. Bumping the Playwright or browser version can shift rendering and invalidate every baseline at once.
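
One way to keep the pin honest is a small guard that compares the image tag with the installed @playwright/test version, since the browsers baked into the image are built for one specific release. The sketch below is an assumption about how such a guard could look. `imageMatchesVersion` is a hypothetical helper, and reading the two inputs from the workflow file and package.json is left out.

```typescript
// Hypothetical guard: does the Docker image tag match the installed
// @playwright/test version? Expects tags shaped like "v1.50.0-jammy".
function imageMatchesVersion(imageRef: string, installedVersion: string): boolean {
  const match = /:v(\d+\.\d+\.\d+)(?:-|$)/.exec(imageRef);
  return match !== null && match[1] === installedVersion;
}

const image = 'mcr.microsoft.com/playwright:v1.50.0-jammy';

console.log(imageMatchesVersion(image, '1.50.0')); // true
console.log(imageMatchesVersion(image, '1.51.1')); // false
```

Run as a pre-test step, a check like this turns a silent rendering shift into an explicit failure that points at the version bump.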

Running it in CI

A GitHub Actions workflow that runs the suite inside the same pinned image looks like this:

name: e2e
on: [pull_request]

jobs:
  playwright:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.50.0-jammy
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/

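Uploading the report on failure matters: without it, a red build tells you something looks different but not what. The playwright-report/ folder is written by the HTML reporter, so make sure reporter: 'html' is set in playwright.config.ts, otherwise there is nothing to upload.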

Where the built-in approach runs out of road

The built-in tooling is genuinely good. The friction is operational, and it shows up at scale:

Baselines live in git: PNGs bloat the repository, and a pull request full of binary changes is unreviewable.
Cross-environment drift: teams end up maintaining per-platform baselines or forcing everything through Docker.
No shared review: approving an intentional change means running --update-snapshots and committing images, not clicking approve.
No history: you cannot easily see when a component started drifting, or compare across a release.
One comparison engine: thresholds can be overridden per assertion, but every screen goes through the same pixel-based comparison rather than one chosen to suit it.

A managed layer on top of Playwright

The alternative keeps your Playwright tests exactly as they are and moves baselines, comparison and review off your repo. Install the SDK:

npm install --save-dev @pixellpeep/playwright @pixellpeep/client @playwright/test

Merge the config into your existing defineConfig:

// playwright.config.ts
import { defineConfig } from '@playwright/test';
import { pixellpeepConfig } from '@pixellpeep/playwright';

export default defineConfig({
  ...pixellpeepConfig({
    apiUrl: process.env.PIXELLPEEP_API,
    apiKey: process.env.PIXELLPEEP_API_KEY,
    projectId: process.env.PIXELLPEEP_PROJECT_ID,
  }),
});

Then import test from the SDK to get a pixellpeep fixture alongside page:

import { test } from '@pixellpeep/playwright';

test('login page', async ({ page, pixellpeep }) => {
  await page.goto('/login');
  await pixellpeep.snapshot('login-page', {
    screenshot: { fullPage: true },
  });
});

Baselines are stored server-side, so no images enter your repo and there's no laptop-versus-CI drift. Update a baseline deliberately with updateBaseline: true or PIXELLPEEP_UPDATE=1, and pick a comparison algorithm per snapshot when one screen needs stricter or looser treatment than the rest.

Troubleshooting

Everything differs slightly — anti-aliasing. Raise maxDiffPixelRatio a little, or compare perceptually rather than pixel-exactly.
Text shifted a pixel — a font loaded late, or the baseline came from a different OS. Await document.fonts.ready and record baselines in the CI image.
Passes locally, fails in CI — rendering environment. Run both in the same pinned container.
A single region always differs — dynamic content. Mask it or freeze the clock.
Every snapshot broke after an upgrade — Playwright or the browser changed. Re-record deliberately instead of loosening thresholds.

Frequently asked questions

Does Playwright have built-in visual regression testing?

Yes. expect(page).toHaveScreenshot() captures a baseline on first run and compares against it afterwards, with configurable threshold, maxDiffPixelRatio, masking and animation handling. Baselines are stored as files next to your tests.

Where does Playwright store baseline screenshots?

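In a folder beside the spec file, named after it with a -snapshots suffix (for example home.spec.ts-snapshots). Each file name carries the snapshot name, the browser project and the platform, which is why baselines recorded on macOS do not satisfy a Linux CI run.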

How do I update Playwright baselines?

Run npx playwright test --update-snapshots. Do it deliberately after an intentional design change, and re-record in the same environment CI uses.

Why do my Playwright screenshots fail only in CI?

Almost always font rendering and anti-aliasing differences between your OS and the CI runner. Record and compare inside the same pinned Playwright Docker image, or use a service that handles normalisation for you.

Where to land

Start with the built-in tooling — it is well designed and costs nothing to try. Spend your effort on deterministic captures, because no threshold rescues a screenshot that was never reproducible. Reach for a managed layer when baselines in git start hurting, when more than one person needs to review diffs, or when you want per-screen control over how comparison actually works.

Use Playwright's built-in screenshots to learn the workflow; reach for a managed platform the moment more than one person needs to trust the results.