
Give Your AI Agents Built-in Quality Checks with Chrome DevTools MCP

Tags: ai agents · mcp · chrome devtools · lighthouse · performance · accessibility

A few weeks ago I was debugging an agent that tested a web app — clicking through flows, filling forms, taking screenshots. It worked. Except when I looked at what it was actually producing, I realized: it had no idea if the page it was testing was accessible. Or if it had just leaked 200MB of memory. Or if a button it had clicked a hundred times had quietly worsened the LCP score.

The agent was busy. The agent was not quality-aware.

That’s the gap chrome-devtools-mcp has been closing, and v0.21.0 is a significant step in that direction.

Quick context: what this package is

chrome-devtools-mcp is an npm package that exposes Chrome DevTools as an MCP (Model Context Protocol) server. That means any AI agent that speaks MCP — Claude, Cursor, Gemini, Copilot, whatever — can directly control and inspect a live Chrome browser. Navigate pages, click, fill forms, capture screenshots, take heap snapshots, run Lighthouse audits. All through the same interface your agent uses for everything else.

Getting it running is one line:

npx -y chrome-devtools-mcp@latest

You’ll need Node.js v20.19+ and Chrome. That’s it.
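If your client is configured through a JSON file rather than a raw command, the server entry is the standard MCP shape (where that file lives depends on your client):

{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest"]
    }
  }
}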

Now — one thing I want to be clear about before we go further, because the repo has both and it’s easy to mix them up:

The MCP server is the tool provider. It exposes a set of ~33 tools your agent can call: take_screenshot, lighthouse_audit, take_memory_snapshot, select_page, and so on. These are the actual capabilities.

Skills are separate. They’re markdown instruction files that live in the repo and tell your agent how to use those tools correctly — the right sequence, the right parameters, the important gotchas. Think of them as recipes. The memory leak skill, for example, isn’t adding new tools. It’s giving the agent a proven workflow for the tools that already exist. This distinction matters when you’re setting things up — you configure the MCP server once, and then you decide which skill files to include in your agent’s context.

Okay. Let’s get into what’s actually new.


Lighthouse audits: accessibility, SEO, best practices — but not performance

The lighthouse_audit tool got a fix in v0.21.0: it was incorrectly marked as read-only, which meant it failed silently in execution contexts that restricted write operations. That’s fixed.

But here’s something important to understand about this tool — and I want to be explicit because I’ve seen this get confused:

lighthouse_audit does not cover performance metrics. No LCP, no CLS, no INP. It’s scoped to accessibility, SEO, and best practices.

That’s intentional, not an oversight. Runtime performance is handled by a completely separate set of tools: performance_start_trace, performance_stop_trace, and performance_analyze_insight. Those are the trace tools, designed for profiling actual runtime behavior.

So your agent can do things like:

lighthouse_audit(url: "https://myapp.com", mode: "navigation", device: "desktop")

And it’ll come back with accessibility violations, SEO issues, best practices flags. Really useful. Just don’t expect LCP numbers in that output.

For LCP specifically, there’s a debug-optimize-lcp skill that walks an agent through the proper sequence: emulate a slow network with the emulate tool, start a trace with performance_start_trace, load the page, stop the trace, then call performance_analyze_insight with specific insights like LCPBreakdown, RenderBlocking, and LCPDiscovery. The thresholds are the standard Core Web Vitals bands: under 2.5s is good, 2.5–4.0s needs improvement, above 4.0s is poor.
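As a sketch, the sequence the skill drives looks roughly like this — parameter names here are my reading of the tool descriptions, not verbatim from the skill file:

emulate(networkConditions: "Slow 4G")   // exact param name may differ
performance_start_trace(reload: true, autoStop: false)
// the page reloads under throttling; the trace captures the full load
performance_stop_trace()
performance_analyze_insight(insightName: "LCPBreakdown")
performance_analyze_insight(insightName: "RenderBlocking")
performance_analyze_insight(insightName: "LCPDiscovery")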

The point is the agent can now diagnose LCP fully autonomously. No human pulling up DevTools, no manual trace review.


Memory leak detection: this one surprised me

This is the headline feature of v0.21.0, and honestly, it’s the one I find most interesting from an agent workflow perspective.

Before this skill existed, if you wanted an agent to catch memory leaks, you’d have to describe the methodology yourself in your prompt. And the methodology is not obvious — it involves taking multiple heap snapshots at specific moments, repeating interactions to amplify the signal, and then running an external analysis tool. Getting an agent to do all that reliably without guidance was… painful, in my experience.

The new memory-leak-debugging skill gives the agent a solid recipe. The core tool is take_memory_snapshot:

Tool: take_memory_snapshot
Description: Capture a heap snapshot of the currently selected page.
             Use to analyze the memory distribution of JS objects
             and debug memory leaks.
Parameters: filePath — where to save the .heapsnapshot file

The skill’s methodology is smart:

  1. Use click, navigate_page, and fill to put the app through a user flow
  2. Repeat that interaction 10 times — this is key, because small leaks don’t show up after one click
  3. Take three snapshots: baseline (before any interaction), post-action, and post-revert
  4. Run memlab against the .heapsnapshot files for automated leak detection
  5. If memlab isn’t available, use a provided compare_snapshots.js script

The instruction about repeating 10 times is the kind of thing you’d only figure out by running this yourself. A single interaction often doesn’t produce enough retained memory to distinguish a real leak from normal GC variance; ten repetitions amplify the signal.
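Put together, the snapshot series looks something like this (the file names are mine, and the memlab invocation assumes its documented three-snapshot mode):

take_memory_snapshot(filePath: "s1-baseline.heapsnapshot")
// run the user flow 10 times: click, fill, navigate_page
take_memory_snapshot(filePath: "s2-post-action.heapsnapshot")
// revert the interaction (close the dialog, navigate back), then:
take_memory_snapshot(filePath: "s3-post-revert.heapsnapshot")

// analysis happens outside the agent's context window:
// npx memlab find-leaks --baseline s1-baseline.heapsnapshot \
//   --target s2-post-action.heapsnapshot --final s3-post-revert.heapsnapshot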

One thing the skill is very explicit about — and this is worth paying attention to if you build on this:

“Do NOT attempt to read raw .heapsnapshot files directly, as they are extremely large and will consume too many tokens.”

A heap snapshot from a real app can be hundreds of megabytes. Your agent should hand off analysis to memlab or the comparison script, not try to read the file. I’ve seen agents try to ingest large files directly when not told otherwise. The skill documentation heads this off.

There are also three experimental memory tools available if you enable --experimentalMemory:

  • load_memory_snapshot — summary stats from a snapshot file
  • get_memory_snapshot_details — full details with pagination
  • get_nodes_by_class — returns all instances of a specific class

These are marked experimental, so treat them accordingly, but the class lookup is genuinely useful for targeted debugging.
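For instance, after the snapshot series above, a targeted follow-up might look like this — the class name is a hypothetical app component, and the parameter names are my best reading of the tool descriptions:

load_memory_snapshot(filePath: "s2-post-action.heapsnapshot")
get_memory_snapshot_details(filePath: "s2-post-action.heapsnapshot", page: 1)
get_nodes_by_class(className: "CartItem")  // hypothetical class; lists retained instances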


Accessibility auditing: layered, not just a score

The a11y-debugging skill is a proper multi-step workflow, not just “run Lighthouse and report the score.”

Here’s what the sequence looks like:

  1. Start with a Lighthouse baseline: lighthouse_audit(mode: "navigation", device: "desktop") to get the comprehensive accessibility violation list.

  2. Go deeper with the accessibility tree: take_snapshot(). The skill describes this as "the most reliable source of truth for semantic structure." You're looking at heading hierarchy, ARIA labels, form input associations, whether landmark regions are set up correctly. Lighthouse can miss things that the accessibility tree catches.

  3. Surface browser-native issues: list_console_messages(types: ["issue"], includePreservedMessages: true). Chrome flags certain accessibility problems natively; this pulls them out.

  4. Simulate keyboard navigation: press_key("Tab") and press_key("Shift+Tab"). The agent works through the page as a keyboard-only user would, checking for focus traps in modals, verifying that focus order is logical, and checking that focus indicators are actually visible.

  5. Screenshot for visual checks: take_screenshot(). Useful for tap target sizing (the 48×48px minimum) and color contrast.

One thing I appreciated about how this skill is written: it explicitly distinguishes between opacity: 0 (visually hidden but still read by screen readers) and display: none or aria-hidden (completely hidden from assistive technology). Getting that wrong produces confusing audit results, and the skill flags it.
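A minimal illustration of that distinction:

<!-- invisible on screen, but still in the accessibility tree -->
<button style="opacity: 0">Save</button>

<!-- both of these are hidden from screen readers entirely -->
<button style="display: none">Save</button>
<button aria-hidden="true">Save</button>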


Multi-agent page routing: the architectural piece

This is the most interesting addition for anyone building parallel agent workflows, and it’s worth understanding what problem it actually solves.

Before v0.21.0, if you had two agents both using the MCP server, they’d fight over the same “currently selected page.” Agent A navigates somewhere, Agent B navigates somewhere else, now Agent A’s tool calls are hitting the wrong page. It’s a coordination nightmare.

v0.21.0 solves this with a numeric page registry. Each page gets an auto-incrementing ID when created:

// Internally in McpContext.ts
#nextPageId = 1;
// On new page: mcpPage = new McpPage(page, this.#nextPageId++);

The tools that matter:

  • new_page — creates a browser page with a unique ID
  • list_pages — shows all open pages and their IDs
  • select_page(pageId) — routes subsequent tool calls to that specific page
  • close_page(pageId) — clean up when done

In practice, this means you can run something like this:

  • Agent A calls new_page → gets page ID 1, navigates to the checkout flow
  • Agent B calls new_page → gets page ID 2, navigates to the product listing
  • Agent A calls select_page(1) before any tool call → all its actions target page 1
  • Agent B calls select_page(2) before any tool call → all its actions target page 2

They run in parallel, completely isolated from each other.
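If you're orchestrating this yourself rather than through an agent framework, a minimal sketch with the MCP TypeScript SDK might look like the following. The SDK calls are real; the tool argument names follow this post's examples and should be treated as illustrative:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// One shared MCP connection; each "agent" owns a page ID.
const client = new Client({ name: "audit-runner", version: "1.0.0" });
await client.connect(new StdioClientTransport({
  command: "npx",
  args: ["-y", "chrome-devtools-mcp@latest"],
}));

// Select a page, then run a tool against it. Note this helper runs
// select + call sequentially, so concurrent callers don't race on
// the server's "currently selected page".
async function onPage(pageId: number, name: string, args: object) {
  await client.callTool({ name: "select_page", arguments: { pageId } });
  return client.callTool({ name, arguments: args });
}

// Two isolated pages for two audit tracks.
await client.callTool({ name: "new_page", arguments: { url: "https://myapp.com/checkout" } }); // page 1
await client.callTool({ name: "new_page", arguments: { url: "https://myapp.com/products" } }); // page 2

await onPage(1, "take_screenshot", {});
await onPage(2, "take_screenshot", {});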

Why does this matter in practice? Because quality audits are slow. A full a11y audit plus an LCP trace on one page can take 30+ seconds. If you’re auditing 5 flows in a test suite, running them serially means waiting several minutes. With page isolation, you can parallelize — each agent works its own page, they don’t interfere, and the whole suite runs in the time it takes the slowest single audit.

The skill documentation says: “You can send multiple tool calls in parallel, but maintain correct order: navigate → wait → snapshot → interact.” That sequence matters — if you fire tools out of order, you’ll get stale snapshots or actions on partially loaded pages.
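In pseudo-call terms, the safe per-page pattern looks like this (wait_for and click are real tools in the server's kit; the text and uid values are placeholders):

select_page(1)
navigate_page(url: "https://myapp.com/checkout")
wait_for(text: "Checkout")  // don't snapshot a half-loaded page
take_snapshot()             // the snapshot is what gives you element uids
click(uid: "<uid from the snapshot>")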


The new CLI tool

Alongside the MCP server, v0.21.0 ships an experimental chrome-devtools CLI binary. This is designed for terminal workflows and CI scripting, not agent workflows.

npm i chrome-devtools-mcp@latest -g
chrome-devtools status
chrome-devtools <tool> [arguments] [flags]

It runs a background daemon that starts automatically. Output is Markdown by default, --output-format=json for scripting.

A few things to know going in: headless mode is on by default in CLI mode, and the CLI intentionally excludes network tools, extension browsing, screencast, and performance tools. It’s designed for lighter checks — quick accessibility scans, screenshot capture, that kind of thing — rather than full agent workflows. If you want the full toolkit, you want the MCP server.

Still useful for CI pipelines where you just want a quick sanity check before a deploy.
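As a sketch of that kind of check — tool names mirror the MCP server's, but I haven't verified the CLI's exact argument syntax, so treat this as illustrative and check chrome-devtools --help:

chrome-devtools new_page https://staging.myapp.com
chrome-devtools take_snapshot --output-format=json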


So what does this actually buy you?

Let me be direct about what I think the value is here, because it’s easy to look at a feature list and shrug.

The problem with most agent testing workflows is that agents are good at doing things but blind to quality. They’ll click the button, fill the form, navigate to the page — and report success. But “the page loaded” and “the page is accessible and performs well” are completely different things.

What v0.21.0 is really adding is quality loop closure. Your agent doesn’t just act — it can now verify. Run the checkout flow, then immediately run a memory snapshot series to check if the cart component is leaking. Navigate to the landing page, run the LCP trace with throttling to catch what users on slower connections experience. Audit the form for keyboard accessibility before you call the test passing.

That’s a different kind of agent. Not just an automation — something closer to a QA engineer.

The skills are what make this practical. Without them, an agent that “has access to DevTools tools” is like a developer who has Chrome open but doesn’t know about the Memory panel. The tools exist; the knowledge of how to use them correctly is what the skills provide.


Check out the GitHub repo (https://github.com/ChromeDevTools/chrome-devtools-mcp) for the full release notes and skill files.

Thanks for reading! If you’re using chrome-devtools-mcp in your agent workflows, I’d genuinely love to hear what you’re doing with it — drop a comment below.
