WebMCP for Beginners

I have been using AI coding tools more seriously over the last few months, and one thing became obvious pretty quickly: the model is rarely the only interesting part.

The real difference often comes from what the model can see and what it can do.

If an agent can only read the files in your repo, it can suggest code. If it can also open the app in a browser, click through a flow, inspect console errors, run a performance trace, and check accessibility issues, it starts behaving more like a useful engineering assistant.

That is the MCP side of the story.

Chrome’s WebMCP early preview is interesting because it approaches the same problem from the other direction. Chrome’s follow-up post on when to use WebMCP and when to use MCP makes the distinction clearer.

Instead of asking only “how can an agent control the browser?”, WebMCP asks something more web-native:

What if the website itself could expose structured actions for agents?

That changes the conversation.

First, what is MCP?

MCP stands for Model Context Protocol. The simplest way to think about it is this:

MCP is a standard way for AI tools to connect to external tools, data, and workflows.

For web developers, that usually means your AI assistant can connect to things like:

  • a browser
  • Chrome DevTools
  • Playwright
  • GitHub
  • local files
  • documentation
  • internal APIs

Without MCP, every AI tool has to build its own custom integration for every external system. Claude needs one GitHub integration. Cursor needs another. VS Code needs another. Your own agent needs another.

With MCP, the integration can be exposed once as a server, and different clients can connect to it. It can run outside the browser, talk to external systems, call APIs, read data sources, and handle workflows in the background.
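
To make the "expose once, connect many" idea concrete, here is a minimal sketch in plain JavaScript. It does not use the real MCP SDK and skips transport entirely. `createToolServer`, `createClient`, and the `searchIssues` tool are hypothetical names I made up for illustration. The only parts borrowed from MCP are the JSON-RPC message shape and the `tools/list` and `tools/call` method names.

```javascript
// Toy stand-in for an MCP-style server: one integration, defined once.
// Not the real MCP SDK; every helper name here is hypothetical.
function createToolServer(toolDefinitions) {
  const tools = new Map(Object.entries(toolDefinitions));
  const reply = (id, body) => ({ jsonrpc: "2.0", id, ...body });
  return {
    // Take a JSON-RPC-shaped request and return a JSON-RPC-shaped response.
    handle(request) {
      if (request.method === "tools/list") {
        const list = [...tools].map(([name, tool]) => ({
          name,
          description: tool.description,
        }));
        return reply(request.id, { result: { tools: list } });
      }
      if (request.method === "tools/call") {
        const tool = tools.get(request.params.name);
        if (!tool) {
          return reply(request.id, {
            error: { code: -32602, message: `Unknown tool: ${request.params.name}` },
          });
        }
        const text = tool.run(request.params.arguments);
        return reply(request.id, { result: { content: [{ type: "text", text }] } });
      }
      return reply(request.id, { error: { code: -32601, message: "Method not found" } });
    },
  };
}

// Each client (an editor, a chat app, your own agent) talks to the same server.
function createClient(server) {
  let nextId = 1;
  const send = (method, params) =>
    server.handle({ jsonrpc: "2.0", id: nextId++, method, params });
  return {
    listTools: () => send("tools/list"),
    callTool: (name, args) => send("tools/call", { name, arguments: args }),
  };
}

const issueServer = createToolServer({
  searchIssues: {
    description: "Search issues by keyword (hypothetical tool)",
    run: ({ query }) => `No issues found for "${query}"`,
  },
});

// Two different clients, one integration.
const editorClient = createClient(issueServer);
const chatClient = createClient(issueServer);
```

The point is the shape, not the details: the tool is written once, and any number of clients can discover and call it through the same two methods.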

That is useful, but when the task is happening inside a live website, MCP still mostly treats the web page as something the agent has to operate from the outside.

The agent looks at the page, reads the DOM, clicks buttons, fills inputs, and tries to infer what the user wants to do.

Sometimes that works. Sometimes it is fragile.

The problem with agents using websites like humans

A lot of current browser agents work by actuating the UI.

They inspect the DOM or accessibility tree, decide what element looks relevant, then click or type like a user would.

For simple flows, this can be fine.

For real product flows, it gets messy quickly:

  • The page has multiple similar buttons.
  • The form has hidden validation rules.
  • The checkout flow depends on state from previous steps.
  • A dropdown is visually obvious but structurally awkward.
  • The agent sees text, but not the product meaning behind it.
  • A small UI change breaks the automation.

This is not a new problem. We have seen versions of it with end-to-end tests for years.

The UI is great for humans because it carries visual hints, hierarchy, layout, and intent. But agents do not always read those signals the way humans do.

So the agent has to guess.

And guessing is where things become slow, brittle, and sometimes wrong.
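
Here is a small sketch of that brittleness. The page is modelled as a plain array rather than a real DOM, and `findButtonByText` is a hypothetical helper, not a function from any automation library.

```javascript
// Plain-data stand-in for the buttons an agent might see in a checkout flow.
const checkoutPageV1 = [
  { role: "button", text: "Continue", section: "shipping" },
  { role: "button", text: "Continue", section: "payment" },
  { role: "button", text: "Place order", section: "review" },
];

// After a small redesign, the final button gets a new label.
const checkoutPageV2 = checkoutPageV1.map((el) =>
  el.text === "Place order" ? { ...el, text: "Complete purchase" } : el
);

// Hypothetical helper: pick buttons the way a text-matching agent might.
function findButtonByText(page, text) {
  return page.filter((el) => el.role === "button" && el.text === text);
}

// Ambiguity: two buttons share one label, so the agent has to guess.
const continueMatches = findButtonByText(checkoutPageV1, "Continue");

// Breakage: the label the agent relied on is gone after the redesign.
const orderMatchesV1 = findButtonByText(checkoutPageV1, "Place order");
const orderMatchesV2 = findButtonByText(checkoutPageV2, "Place order");

console.log(continueMatches.length, orderMatchesV1.length, orderMatchesV2.length);
// prints: 2 1 0
```

Two matches means a guess. Zero matches means a broken flow. Neither outcome says anything about what the user actually wanted.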

What WebMCP is trying to do

Chrome describes WebMCP as a way for websites to expose structured tools so AI agents can interact with them with more speed, reliability, and precision.

That sentence is doing a lot of work, so I like to translate it like this:

WebMCP lets a website tell agents: “Here are the actions I support, here is the data I need, and here is how to perform the task correctly.”

Instead of an agent trying to reverse-engineer your UI, your site can provide a more direct contract.

For example, a travel website might expose a structured action for searching flights. A support product might expose an action for creating a support ticket. An ecommerce site might expose actions for finding products, configuring options, and moving through checkout.

One important detail from Chrome’s newer explanation: WebMCP is for the frontend. Its tools exist only while the user has your website open. If the tab is closed or the user navigates away, those WebMCP tools go away too.

That makes it different from an MCP server. MCP can sit behind the scenes and work across tools, platforms, and data sources. WebMCP is tied to the live page and the browser agent interacting with it.
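
A tiny model of that lifecycle, with loud caveats: in a real browser the tool lifetime would be managed by the browser, not by code like this. `registerPageTool` and `pageTools` are hypothetical, and a bare `EventTarget` stands in for `window` so the sketch also runs outside a browser.

```javascript
// Stand-in for the page's lifetime; in a browser this would be `window`.
const page = new EventTarget();

// Hypothetical in-page registry of agent-facing tools.
const pageTools = new Map();

function registerPageTool(name, run) {
  pageTools.set(name, run);
  // The tool lives only as long as the page does.
  page.addEventListener("pagehide", () => pageTools.delete(name), { once: true });
}

registerPageTool("searchFlights", (input) => ({ query: input, results: [] }));

const availableWhileOpen = pageTools.has("searchFlights");

// Simulate the user closing the tab or navigating away.
page.dispatchEvent(new Event("pagehide"));

const availableAfterLeaving = pageTools.has("searchFlights");
```

An MCP server has no equivalent of that `pagehide` moment, which is exactly the difference.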

The important part is not that agents can click faster.

The important part is that websites can participate in the interaction.

That is a subtle but big shift. The website is no longer just a surface to be scraped or clicked. It becomes an active part of the agent workflow.

Two APIs: declarative and imperative

The Chrome article mentions two proposed APIs.

Declarative API

The declarative API is for standard actions that can be defined directly in HTML forms.

That makes sense as a starting point. Forms are already the web’s native way of saying:

  • this is the action
  • these are the fields
  • these are the constraints
  • this is how the user submits intent

A form already has structure. WebMCP seems to build on that idea by making the action clearer for agents.

A simplified mental model might look like this:

<form action="/support/tickets" method="post">
  <label>
    Issue summary
    <input name="summary" required />
  </label>

  <label>
    Priority
    <select name="priority">
      <option value="low">Low</option>
      <option value="high">High</option>
    </select>
  </label>

  <button type="submit">Create ticket</button>
</form>

Even without WebMCP, this is already more agent-friendly than a pile of unlabelled divs and click handlers.

That is one practical takeaway for frontend developers: if the web becomes more agentic, semantic HTML becomes even more important, not less.

A well-structured form gives both humans and machines a better chance of understanding the task.
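
One way to picture how much a form already says: the sketch below takes a plain-object description of the ticket form above and derives a JSON-Schema-like input description from it. This is my own illustration, not how Chrome's declarative API is specified. `describeForm` and the field-description shape are assumptions.

```javascript
// Plain-object description of the support ticket form's fields.
const ticketForm = {
  action: "/support/tickets",
  method: "post",
  fields: [
    { name: "summary", type: "text", required: true },
    { name: "priority", type: "select", options: ["low", "high"] },
  ],
};

// Hypothetical helper: derive a JSON-Schema-like description from the fields.
function describeForm(form) {
  const properties = {};
  const required = [];
  for (const field of form.fields) {
    // A select becomes an enum of its option values; everything else is a string.
    properties[field.name] = field.options
      ? { type: "string", enum: [...field.options] }
      : { type: "string" };
    if (field.required) required.push(field.name);
  }
  return {
    target: `${form.method.toUpperCase()} ${form.action}`,
    inputSchema: { type: "object", properties, required },
  };
}

const ticketTool = describeForm(ticketForm);
console.log(JSON.stringify(ticketTool, null, 2));
```

Everything in the result came from attributes the form already had: `name`, `required`, the `option` values, `action`, and `method`.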

Imperative API

The imperative API is for more complex interactions that require JavaScript.

Some flows cannot be described as a simple form submission. Think of a product configurator, a flight search with dynamic filters, or a workflow where one step depends on runtime state.

In those cases, the website may need to expose a tool-like action from JavaScript.

I am intentionally keeping this conceptual because WebMCP is still in early preview and the public Chrome post does not include the full API surface. But the shape is easy to imagine:

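// Illustrative sketch, not the actual WebMCP API.
// registerAgentAction and searchFlights are hypothetical stand-ins defined here.
const agentActions = new Map();
const registerAgentAction = (action) => agentActions.set(action.name, action);

// Placeholder for the site's real flight search logic.
async function searchFlights({ from, to, date, passengers }) {
  return { from, to, date, passengers, results: [] };
}

registerAgentAction({
  name: "searchFlights",
  description: "Search for flights between two cities",
  // Made-up input declaration: field name mapped to a typeof string.
  input: {
    from: "string",
    to: "string",
    date: "string",
    passengers: "number"
  },
  async run(input) {
    return searchFlights(input);
  }
});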

The syntax will matter later. The idea matters now.

The site exposes intent directly. The agent does not have to infer everything from the current pixels on the screen.
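
Declaring the input also gives the site something to check. Here is a minimal sketch that validates an agent's arguments against the same made-up `input` shape used in the example above (field name mapped to a `typeof` string). `validateInput` is hypothetical, and a real implementation would more likely use a proper schema format.

```javascript
// Same made-up declaration as the example: field name -> typeof string.
const searchFlightsInput = {
  from: "string",
  to: "string",
  date: "string",
  passengers: "number",
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical helper: collect every problem instead of stopping at the first.
function validateInput(declared, input) {
  const problems = [];
  for (const [field, expectedType] of Object.entries(declared)) {
    if (!has(input, field)) {
      problems.push(`missing field: ${field}`);
    } else if (typeof input[field] !== expectedType) {
      problems.push(`${field} should be ${expectedType}, got ${typeof input[field]}`);
    }
  }
  for (const field of Object.keys(input)) {
    if (!has(declared, field)) problems.push(`unexpected field: ${field}`);
  }
  return problems;
}

const goodCall = validateInput(searchFlightsInput, {
  from: "BLR",
  to: "SFO",
  date: "2026-03-01",
  passengers: 2,
});

const badCall = validateInput(searchFlightsInput, {
  from: "BLR",
  to: "SFO",
  passengers: "two",
  cabin: "economy",
});

console.log(goodCall); // prints: []
console.log(badCall);
```

The second call comes back with three problems: a missing `date`, a `passengers` value of the wrong type, and a `cabin` field the site never declared. A UI-driving agent would have found out about each of those one failed click at a time.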

Why this matters for frontend developers

Frontend work has always had this awkward gap between code and behavior.

The code might look correct. The component might compile. The tests might pass. But the real question is still: what happens in the browser?

Does the button work?

Does the dialog trap focus?

Did the console throw an error?

Is the page slow after hydration?

Did that innocent change make the layout shift?

Agents add one more question:

Can an agent understand what this page allows the user to do?

That question is different from “can an agent click the right button?”

Clicking is the low-level action. Understanding intent is the product-level problem.

WebMCP is interesting because it gives frontend developers a possible interface for that intent.

WebMCP versus browser automation MCPs

It is useful to separate WebMCP from tools like Playwright MCP or Chrome DevTools MCP.

Playwright MCP and Chrome DevTools MCP help an agent operate and inspect the browser from outside the page.

For example, an agent can:

Open my local app, go to the signup page, fill the form, and tell me where the flow breaks.

Or:

Run through the checkout flow and check if there are console errors.

That is still very useful. I think these tools are one of the most practical ways frontend engineers can use agents today.

WebMCP is different. It is about the website exposing structured actions from inside the web experience.

A rough comparison:

  • Browser automation MCP: “Agent, here is a browser. Try to use the site.”
  • WebMCP: “Site, tell the agent what actions are available and how to call them.”

Both can exist together.

In fact, the best workflows may use both. The agent can use browser automation to observe and verify the UI, while WebMCP gives it a more reliable path for supported actions.

That feels much healthier than asking agents to guess their way through every interface.

WebMCP does not replace MCP

This is the part Chrome’s second post makes explicit: WebMCP and MCP are not competing ideas.

They solve different problems.

MCP is useful when an agent needs access to capabilities outside a web page. It can connect to databases, internal APIs, GitHub, documentation, local files, or business workflows. The user does not need to have your website open for an MCP server to be useful.

WebMCP is useful when the user is on your website and a browser agent needs to understand what the site can do right now.

The ownership model is different too.

With a normal MCP app, your product may be represented inside the agent’s interface. The agent owns the surrounding UI and your tool is one capability inside it.

With WebMCP, the agent is a guest inside your website. Your UI still exists. Your product still controls the page, the state, the validations, and the safest path for completing the task.

That is why I like thinking about them like this:

MCP: connect the agent to external capabilities.
WebMCP: make the current website understandable and actionable for a browser agent.

In a real product, you might use both.

A travel company could expose an MCP server for account data, bookings, loyalty points, and backend workflows. The website could then use WebMCP for the live booking page, where the agent helps the user search, compare, fill details, and prepare the next action for approval.

One handles durable capabilities. The other handles contextual interaction.

That distinction matters because it prevents us from turning every agent problem into a browser automation problem. Sometimes the agent needs a backend tool. Sometimes it needs the live page. Often, it needs both.

A support-ticket example

Imagine a user asks an agent:

Create a support ticket for the payment issue I just had. Include the browser version and console error if available.

Without structured support from the site, the agent may need to:

  1. Find the help page.
  2. Guess which button opens support.
  3. Fill a form based on labels.
  4. Decide which fields are required.
  5. Submit and hope the flow worked.

With WebMCP-style structured actions, the site could expose something closer to:

createSupportTicket({
  summary,
  description,
  category,
  priority,
  diagnostics
})

The UI can still exist for humans. But the agent gets a clearer path.

This is better for the user, but it is also better for the product team. You are no longer relying on agents reverse-engineering your UI. You define the action boundary yourself.

That boundary is where a lot of product judgment lives.

What fields are allowed? Which actions require confirmation? Which flows should never be automated? Which account state should be visible?

These are not only technical questions. They are product and trust questions.
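
To show what "defining the action boundary" could look like in code, here is a sketch of `createSupportTicket` with an explicit field allowlist. The allowlist, the priority values (borrowed from the form example earlier), and the return shape are all assumptions of mine, not part of any WebMCP API.

```javascript
// Hypothetical action boundary for the support ticket example.
const ALLOWED_FIELDS = ["summary", "description", "category", "priority", "diagnostics"];
const ALLOWED_PRIORITIES = ["low", "high"];

function createSupportTicket(input) {
  // Which fields are allowed? Anything outside the list is rejected.
  const unknown = Object.keys(input).filter((key) => !ALLOWED_FIELDS.includes(key));
  if (unknown.length > 0) {
    return { ok: false, error: `Fields not allowed: ${unknown.join(", ")}` };
  }
  if (typeof input.summary !== "string" || input.summary.trim() === "") {
    return { ok: false, error: "summary is required" };
  }
  if (input.priority !== undefined && !ALLOWED_PRIORITIES.includes(input.priority)) {
    return { ok: false, error: `priority must be one of: ${ALLOWED_PRIORITIES.join(", ")}` };
  }
  // Which actions require confirmation? Here the agent only ever gets a draft.
  return {
    ok: true,
    ticket: { priority: "low", ...input, status: "draft" },
    requiresUserConfirmation: true,
  };
}

const draft = createSupportTicket({
  summary: "Payment failed at checkout",
  diagnostics: { browser: "Chrome", consoleError: "TypeError in pay.js" },
});

const rejected = createSupportTicket({
  summary: "Payment failed at checkout",
  assignee: "admin",
});
```

The agent can fill in the ticket, but it cannot set fields the product team never exposed, and it cannot skip the confirmation step.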

The permission question

This is the part that deserves more attention.

When a website exposes actions to agents, it is not just adding an API. It is creating a new interaction surface.

That surface needs constraints.

For example:

  • Creating a draft support ticket may be safe.
  • Submitting the ticket may require user confirmation.
  • Searching products may be safe.
  • Completing checkout should almost certainly require confirmation.
  • Reading account metadata may be useful.
  • Changing account settings should be treated carefully.

This is where I think frontend and product engineers need to be involved early.

We have spent years designing UI states, disabled buttons, confirmation dialogs, validation messages, and permission boundaries for humans.

Agent-facing actions need the same level of design.

Maybe more.

Because an agent can move faster than a human, and it can misunderstand with confidence.
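
One simple way to encode those constraints is a policy table. The sketch below is only an illustration: the action names, the `safe` and `confirm` levels, and the `decide` helper are hypothetical, and where each action lands is a product decision, not something I can settle in a blog post.

```javascript
// Hypothetical policy table: each agent-facing action declares its risk level.
const ACTION_POLICY = {
  searchProducts: "safe",
  createTicketDraft: "safe",
  readAccountMetadata: "safe", // A judgment call; some products will want "confirm".
  submitTicket: "confirm",
  completeCheckout: "confirm",
  changeAccountSettings: "confirm",
};

// Decide what the site should do when an agent asks to run an action.
function decide(actionName, userConfirmed = false) {
  const known = Object.prototype.hasOwnProperty.call(ACTION_POLICY, actionName);
  if (!known) return "reject"; // Anything not listed is denied by default.
  if (ACTION_POLICY[actionName] === "safe") return "run";
  return userConfirmed ? "run" : "ask-user";
}

console.log(decide("searchProducts")); // prints: run
console.log(decide("completeCheckout")); // prints: ask-user
console.log(decide("completeCheckout", true)); // prints: run
console.log(decide("deleteAccount")); // prints: reject
```

The deny-by-default branch matters most. An action nobody thought about should not be available just because an agent asked for it.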

A beginner checklist for making pages more agent-ready

WebMCP is still in early preview, so I would not redesign a production app around it tomorrow.

But there are things we can already do that make websites better for both humans and agents.

Use semantic HTML

Prefer real forms, buttons, labels, fieldsets, and native controls where possible.

<label for="email">Email</label>
<input id="email" name="email" type="email" autocomplete="email" required />

<button type="submit">Send reset link</button>

This helps browsers, accessibility tools, tests, and agents.

Make intent visible in the structure

A button that says “Continue” might be visually clear inside a layout, but structurally vague.

Sometimes the fix is simple:

<button type="submit" aria-label="Continue to payment">
  Continue
</button>

Agents are another reason to care about accessible names and clear labels.

Keep actions narrow

If you eventually expose agent actions, make them small and specific.

searchProducts is easier to reason about than handleUserIntent.

createTicketDraft is safer than submitTicketAndNotifySupport.

Good agent interfaces should feel boring. Boring is good when permissions are involved.

Separate draft from commit

This pattern will matter a lot.

Let the agent prepare an action. Let the user approve it before the final commit.

For example:

Agent prepares checkout details -> user reviews -> site completes order

That is a better default than letting the agent jump straight from intent to irreversible action.
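
A minimal sketch of that split, assuming an in-memory store of drafts. `prepareCheckout`, `approveDraft`, and `completeOrder` are hypothetical names, and in a real product the approval step would be triggered by the user in the site's own UI, not by the agent.

```javascript
// Hypothetical draft/commit split for a checkout flow.
const drafts = new Map();
let nextDraftId = 1;

// Step 1: the agent may prepare a draft. Nothing irreversible happens here.
function prepareCheckout(details) {
  const id = `draft-${nextDraftId++}`;
  drafts.set(id, { details, approved: false });
  return id;
}

// Step 2: only the site's own UI should call this, after the user reviews.
function approveDraft(id) {
  const draft = drafts.get(id);
  if (!draft) throw new Error(`Unknown draft: ${id}`);
  draft.approved = true;
}

// Step 3: the commit refuses to run without an approved draft.
function completeOrder(id) {
  const draft = drafts.get(id);
  if (!draft) throw new Error(`Unknown draft: ${id}`);
  if (!draft.approved) throw new Error("User approval required before commit");
  drafts.delete(id); // A draft can be committed only once.
  return { ...draft.details, status: "ordered" };
}

const draftId = prepareCheckout({ item: "keyboard", quantity: 1 });

let blockedMessage = null;
try {
  completeOrder(draftId); // The agent tries to skip the review step.
} catch (error) {
  blockedMessage = error.message;
}

approveDraft(draftId); // The user reviews and approves.
const order = completeOrder(draftId);
```

The useful property is that the shortcut does not exist. There is no function that goes from intent to a completed order in one call.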

What I would try first

If I were experimenting with WebMCP concepts today, I would start with a low-risk flow.

Something like:

  • search a catalog
  • prepare a support ticket draft
  • generate a quote
  • fill a contact form draft
  • filter a list of results
  • collect diagnostics for a bug report

I would avoid starting with payments, account deletion, permission changes, or anything irreversible.

I would also ask one question before adding any agent-facing action:

Does this need to work when the user is not on the website?

If yes, it probably belongs closer to MCP or a backend API. If no, and the value comes from the live page context, WebMCP is the more interesting shape.

The workflow I would want is:

observe -> prepare -> explain -> ask for approval -> act -> verify

That loop is familiar to frontend developers. We already think in terms of UI state, validation, confirmation, and feedback.

WebMCP just makes us ask how those same ideas should work when the user is represented by an agent.
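
The loop can be written down as a small runner. This is a sketch under my own assumptions: `runAgentTask` and the step names are hypothetical, and each step is a plain synchronous function so the control flow stays easy to read.

```javascript
// Hypothetical runner for:
// observe -> prepare -> explain -> ask for approval -> act -> verify
function runAgentTask(steps) {
  const log = [];

  const state = steps.observe();
  log.push("observe");

  const plan = steps.prepare(state);
  log.push("prepare");

  const explanation = steps.explain(plan);
  log.push("explain");

  const approved = steps.askForApproval(explanation);
  log.push("ask for approval");
  if (!approved) return { done: false, log }; // Stop before anything is changed.

  const result = steps.act(plan);
  log.push("act");

  const verified = steps.verify(result);
  log.push("verify");

  return { done: verified, log };
}

// Example steps for a contact form draft; all values are made up.
const contactFormSteps = {
  observe: () => ({ formFields: ["name", "message"] }),
  prepare: (state) => ({ fields: state.formFields, values: { name: "Sam", message: "Hi" } }),
  explain: (plan) => `I will fill ${plan.fields.join(" and ")} and save a draft.`,
  askForApproval: () => true,
  act: (plan) => ({ saved: true, values: plan.values }),
  verify: (result) => result.saved === true,
};

const approvedRun = runAgentTask(contactFormSteps);
const declinedRun = runAgentTask({ ...contactFormSteps, askForApproval: () => false });
```

When the user declines, the run ends after "ask for approval" and `act` is never called. That early return is the whole design.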

What WebMCP says about the web

The web has always been more than pixels.

HTML carries meaning. Forms carry intent. Links describe navigation. ARIA fills gaps when native semantics are not enough. The browser turns all of that into something users, assistive technologies, search engines, automation tools, and now agents can work with.

WebMCP feels like another step in that direction.

It is not saying the UI does not matter. The UI still matters a lot.

But it does suggest that agent-readable intent may become part of frontend architecture.

That is a healthy direction if we do it carefully.

The worst version of the agentic web is agents scraping, clicking, and guessing their way through interfaces designed only for humans.

The better version is websites exposing small, explicit, permission-aware actions that agents can use on behalf of users.

For frontend developers, that means our job may expand a little.

We will still design screens for humans.

But we may also design action surfaces for agents.

And the same old web fundamentals suddenly matter again: semantics, constraints, names, forms, state, permissions, and trust.

Final thought

I liked the WebMCP announcement because it does not treat agents as magic.

It treats them as another kind of client that needs a contract.

That is a very webby idea.

The browser already sits between users and websites, translating intent into requests, rendering responses, enforcing permissions, and protecting boundaries.

If agents are going to become part of that relationship, I would rather have websites expose clear, narrow, structured actions than leave every agent to poke around the DOM and hope for the best.

For beginners, that is probably the simplest way to understand WebMCP:

MCP connects AI tools to external capabilities. WebMCP explores how websites themselves can expose their own actions to browser agents.

And for frontend engineers, the practical question is not “will agents replace the UI?”

The better question is:

What should my product make explicit when the user is acting through an agent?

That is where this gets interesting.