We replaced our manual UI review checklist with one CI check

Ananya Rao

QA Lead

April 22, 2026

Our pre-release ritual used to involve a shared spreadsheet and a list of screens someone had to open, squint at, and tick off. It worked, in the sense that obvious breakage got caught. It failed at exactly the things humans are bad at: a 4px padding change, a heading that dropped a weight, a colour that drifted half a shade after a dependency bump.

The fix wasn't more discipline. It was moving the boring comparison into CI and letting people spend their attention on judgement calls instead.

What the setup looks like

Playwright captures the same set of screens on every pull request.
Each screenshot is compared against an approved baseline via the PixellPeep CLI.
A single aggregated check reports back on the PR — green if nothing drifted, red with a link if something did.
When a change is intentional, you approve the new baseline from the report and re-run the check. No code change is needed.
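
The aggregation step is the easiest piece to picture in code. What follows is a minimal sketch, not our production setup and not PixellPeep's actual output format: the `ComparisonResult` shape, the `diffRatio` field, and the `aggregateCheck` helper are all hypothetical names made up for illustration. It assumes each comparison yields a fraction of differing pixels, and it collapses every screen and viewport into one pass or fail result for the PR.

```typescript
// Hypothetical shape of one screenshot comparison (an assumption for
// illustration, not the real CLI output).
interface ComparisonResult {
  screen: string;    // e.g. "checkout"
  viewport: string;  // e.g. "mobile"
  diffRatio: number; // fraction of pixels differing from the baseline, 0..1
}

interface CheckSummary {
  conclusion: "success" | "failure";
  title: string;
  drifted: string[]; // "screen@viewport" labels that exceeded the threshold
}

// Collapse per-screenshot results into a single PR check.
function aggregateCheck(
  results: ComparisonResult[],
  threshold = 0, // assumed default: any differing pixel counts as drift
): CheckSummary {
  // Treat "nothing was compared" as a failure so a broken capture step
  // cannot produce a silent green check.
  if (results.length === 0) {
    return { conclusion: "failure", title: "No comparisons ran", drifted: [] };
  }

  const drifted = results
    .filter((r) => r.diffRatio > threshold)
    .map((r) => `${r.screen}@${r.viewport}`);

  return {
    conclusion: drifted.length === 0 ? "success" : "failure",
    title:
      drifted.length === 0
        ? `${results.length} comparisons match their baselines`
        : `${drifted.length} of ${results.length} comparisons drifted`,
    drifted,
  };
}

// Usage: one screen captured at three viewports, one of which drifted.
const summary = aggregateCheck([
  { screen: "checkout", viewport: "desktop", diffRatio: 0 },
  { screen: "checkout", viewport: "tablet", diffRatio: 0.02 },
  { screen: "checkout", viewport: "mobile", diffRatio: 0 },
]);
console.log(summary.conclusion, "-", summary.title);
// prints: failure - 1 of 3 comparisons drifted
```

Failing on an empty result set is a design choice in this sketch rather than something the tooling dictates. The reasoning is that a check which reports green when no screenshots were captured would undermine the trust the whole setup is meant to build.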

The win wasn't catching more bugs. It was trusting the release without a war room.

The part that surprised us

We expected to catch regressions. What we didn't expect was how much faster reviews got. A reviewer no longer has to imagine what a change looks like — the diff is right there in the check. Design sign-off went from a meeting to a glance.

Paying per comparison rather than per browser screenshot also meant we could afford to cover three viewports without watching a meter. That's the model we dogfood on PixellPeep itself, every release.

