WebMCP's open question, answered from accessibility

On June 11, 2026, WebKit formally opposed WebMCP, the proposal that lets a web page expose its actions to AI agents as callable tools. The positions here are moving week to week, so treat all of this as a snapshot from mid-June. The opposition is worth reading in full, but the more interesting thing is what happened next. Instead of retreating, the proposal's author at Google asked WebKit two questions: if WebMCP consisted only of its declarative API, would that satisfy you? And short of ripping the imperative API out, what changes would make it acceptable?

Those questions are still open. Mozilla, for its part, isn't lined up with WebKit: a reviewer there has proposed marking WebMCP neutral, willing to revisit once there's evidence of how sites actually use it. So the picture isn't a wall of opposition. It's one engine opposed, one waiting for evidence, and the two vendors that wrote the proposal shipping it. Here's an answer from the accessibility side, because accessibility is where the answer actually lives.

First, what's already settled, so I'm not pretending to discover it. WebKit's central worry is that a parallel, agent-facing tool layer won't even deliver the reliability it promises, since the agent still picks a tool from an ambiguous natural-language description, and the spec itself concedes there's no guarantee a tool's declared intent matches its actual behavior. That concession is the whole problem in one line, and it's in the spec, not something I'm bringing to it. WebKit's other point is architectural and I agree with it: an agent acting for a user is in effect assistive technology, and the site shouldn't hand it a richer interface than it hands the human. Start from there.

Before answering, here is the reason any of this is worth getting right. For some people, an interface you address by intent, rather than by operating its controls one at a time, isn't a convenience; it's access the control-by-control web never delivered. Léonie Watson, who chairs the W3C Board and uses a screen reader, has described an agentic assistant that lets her search, ask about a product, narrow by feature, and add to her basket by voice, which is close to the only way independent clothes shopping has been open to her. That benefit is real, and it's why the answer can't be "just oppose it." It has to be a way to keep that benefit without letting the human interface rot underneath it.

Now the first question: would declarative-only be enough?

The declarative API annotates an existing HTML form, and the browser builds the tool's schema from the form's own controls. It can't drift from the page, because it is the page. That property is exactly what WebKit wants, and it's why declarative-only is close to right. But it isn't quite the line, and the reason is worth stating, because it's the part the shared-layers argument tends to skip.
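To make the "it can't drift" property concrete, here is a minimal sketch of a schema computed from a form's controls. It is not the spec's algorithm: FormControl, ToolSchema, deriveSchema, and the flight form are all hypothetical names, and the form is modeled as plain data rather than real HTML. The only point is that the schema has no state of its own.

```typescript
// Hypothetical model of a form control; not a WebMCP or DOM type.
interface FormControl {
  name: string;                      // the control's submitted name
  type: "text" | "date" | "number";  // a small subset of input types
  label: string;                     // the visible, accessible label
  required: boolean;
}

// Hypothetical shape of a derived tool schema (loosely JSON-Schema-like).
interface ToolSchema {
  properties: Record<string, { type: "string" | "number"; description: string }>;
  required: string[];
}

// The schema is a pure function of the controls, so it has no independent
// state that could fall out of step with the form.
function deriveSchema(controls: FormControl[]): ToolSchema {
  const schema: ToolSchema = { properties: {}, required: [] };
  for (const c of controls) {
    schema.properties[c.name] = {
      type: c.type === "number" ? "number" : "string",
      description: c.label,
    };
    if (c.required) schema.required.push(c.name);
  }
  return schema;
}

// Assumed example form, modeled as data.
const flightForm: FormControl[] = [
  { name: "origin", type: "text", label: "From", required: true },
  { name: "depart", type: "date", label: "Departure date", required: true },
  { name: "adults", type: "number", label: "Adults", required: false },
];

const schema = deriveSchema(flightForm);
```

Relabel a control or drop one, and the derived schema changes with it on the next call. There is nothing separate to keep in sync.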

A well-built semantic page lets an agent infer a great deal. Proper roles, names, states, and relationships give it real structure to reason over, and the better the semantics, the more it can infer. But there's a ceiling. The accessibility tree represents controls, not tasks. It can tell an agent there's an origin field, a date picker, and a passenger stepper; it can't represent "book Berlin to Amsterdam in August for one adult and two infants" as a single intent, because that intent never lives in the page. Semantics get the agent most of the way up the curve. The task-level residue at the top is the part no amount of control markup expresses, and it's the part a tool layer can legitimately carry. So declarative-only is the right instinct but the wrong cutoff. The question isn't declarative versus imperative. It's whether the tool, whichever kind, stays tied to a real human interface.
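A rough sketch of that gap, with every name an assumption: AxControl stands in for what an accessibility tree exposes, BookingIntent for what the user means, and toControlSteps for the translation that has to happen somewhere. The destination and infant controls are added for illustration. None of this is a real accessibility API.

```typescript
// Hypothetical shapes for illustration; not a real accessibility API.
interface AxControl {
  role: string;
  name: string;
  value: string;
}

// What the page can expose: individual controls, each with a role and a name.
const tree: AxControl[] = [
  { role: "combobox", name: "Origin", value: "" },
  { role: "combobox", name: "Destination", value: "" },
  { role: "textbox", name: "Departure date", value: "" },
  { role: "spinbutton", name: "Adults", value: "0" },
  { role: "spinbutton", name: "Infants", value: "0" },
];

// What the user means: one intent. No node in the tree holds this.
interface BookingIntent {
  from: string;
  to: string;
  month: string;
  adults: number;
  infants: number;
}

interface ControlStep {
  name: string;   // accessible name of the control to operate
  value: string;  // value to set on it
}

// Something has to translate the intent into per-control steps.
function toControlSteps(intent: BookingIntent): ControlStep[] {
  return [
    { name: "Origin", value: intent.from },
    { name: "Destination", value: intent.to },
    // The intent names only a month; choosing a day is left unresolved here.
    { name: "Departure date", value: intent.month },
    { name: "Adults", value: String(intent.adults) },
    { name: "Infants", value: String(intent.infants) },
  ];
}

const intent: BookingIntent = {
  from: "Berlin",
  to: "Amsterdam",
  month: "August",
  adults: 1,
  infants: 2,
};
const steps = toControlSteps(intent);

// Every step lands on a control the tree exposes.
const allResolve = steps.every((s) => tree.some((c) => c.name === s.name));
```

The tree supplies every target the steps need, but the BookingIntent value itself appears nowhere in it. That separate structure is the task-level residue.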

Which answers the second question: what would make the imperative API acceptable, short of removing it?

Google's own framing is the answer, if the spec would commit to it. In that same thread, the argument for keeping imperative tools is that they'll be thin wrappers over functionality the site already implements, making existing behaviour reachable by an agent. If that were guaranteed, I'd have little to argue with. But it isn't guaranteed, and the spec's own canonical example breaks it. The documented addToCart tool calls a backend endpoint directly and never touches the page. It works perfectly for the agent. And the person who operates that site with a keyboard or a screen reader is left exactly where they were, because if the add-to-cart button was inaccessible before, it still is. The tool didn't fix it; it routed around it, and removed the signal that anything was broken, since the task now succeeds for the agent while failing for the human.
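Here is that failure in miniature. This is not the spec's example code, only a sketch of the shape described above: PageControl, postAddToCart, addToCartTool, and humanCanOperate are hypothetical stand-ins, and the "button" is assumed to be an unnamed, unfocusable element.

```typescript
// Hypothetical stand-ins: none of these are WebMCP, DOM, or real backend APIs.
interface PageControl {
  role: string;
  accessibleName: string;  // empty string means no accessible name
  focusable: boolean;
}

const cart: string[] = [];

// Stand-in for the site's backend endpoint.
function postAddToCart(sku: string): { ok: boolean } {
  cart.push(sku);
  return { ok: true };
}

// The on-page control as a keyboard or screen reader user meets it:
// a clickable element with no button role, no name, and no focus.
const addToCartButton: PageControl = {
  role: "generic",
  accessibleName: "",
  focusable: false,
};

// A bypass-style tool: goes straight to the backend, never touches the page.
function addToCartTool(sku: string): { ok: boolean } {
  return postAddToCart(sku);
}

// A crude check of whether a person could operate the control.
function humanCanOperate(c: PageControl): boolean {
  return c.role === "button" && c.accessibleName.length > 0 && c.focusable;
}

const agentResult = addToCartTool("sku-123");
const humanResult = humanCanOperate(addToCartButton);
```

The agent call reports success and the cart fills, while the check on the page control still fails. Nothing in the tool's result reflects the state of the button.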

That's the failure mode worth designing against, and the dangerous version isn't the missing button, it's the surviving one. Once the agent path carries the traffic, nobody has a reason to keep the human surface accessible, so its labels drift and its semantics rot while a passing tool hides the decay. We've watched this before, with separate "accessible versions" that withered the moment they stopped being the main thing.

Mozilla's reviewer left one question explicitly open: whether this divergence between tool and page is made worse by the API or is just inherent to language models driving a browser. I think the API decides it, because the API sets the price. Right now the easiest, best-documented way to be agent-ready is the tool that bypasses the UI, which prices good semantics as optional. So the change I'd ask for isn't a prohibition but a repricing: make the cheap path the accessible one. An imperative tool should resolve to real, accessibility-exposed controls, so that the easiest way to write one is also the way that improves the human interface, and a tool with no human counterpart becomes the awkward thing to write rather than the obvious one. Read that way, the bypass tool is usually a symptom: the tool floats free because the underlying interface was too poor to attach to. Fix the semantics, and the agent infers more on its own and the remaining tools can be declarative. This isn't foreign to the spec, which already lets a tool declare its relationship to side effects through readOnly and toolautosubmit. Declaring its relationship to the human interface is the same kind of statement.
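As a sketch of what such a declaration could look like, assume a tool lists the controls it drives and something checks that list against what the page exposes. The controlIds field, BoundTool, and unresolvedControls are hypothetical. Nothing like them exists in WebMCP today.

```typescript
// Hypothetical types; the binding field and the check are not part of WebMCP.
interface ExposedControl {
  id: string;
  role: string;
  accessibleName: string;
}

interface BoundTool {
  name: string;
  controlIds: string[];  // the on-page controls this tool claims to drive
}

// Returns the ids a tool names that are missing from the exposed controls.
// An empty controlIds list is reported as unbound rather than silently passing.
function unresolvedControls(tool: BoundTool, exposed: ExposedControl[]): string[] {
  if (tool.controlIds.length === 0) return ["<no controls declared>"];
  const known = new Set(exposed.map((c) => c.id));
  return tool.controlIds.filter((id) => !known.has(id));
}

// Assumed page: two exposed controls.
const exposed: ExposedControl[] = [
  { id: "qty", role: "spinbutton", accessibleName: "Quantity" },
  { id: "add", role: "button", accessibleName: "Add to basket" },
];

// A thin wrapper names the controls it operates; a bypass tool names none.
const wrapper: BoundTool = { name: "addToCart", controlIds: ["qty", "add"] };
const bypass: BoundTool = { name: "addToCart", controlIds: [] };

const wrapperMissing = unresolvedControls(wrapper, exposed);
const bypassMissing = unresolvedControls(bypass, exposed);
```

Returning a list instead of throwing keeps this a signal rather than a prohibition: the wrapper comes back clean, and the bypass tool is flagged as the odd one out.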

This also isn't an outside demand. W3C's Priority of Constituencies puts user needs before authors, authors before implementors. Agents aren't on that list, but wherever they'd sit, it's below users. An API whose cheapest path serves the agent while the human surface decays has the order backwards.

I'll be honest about the limit. A rule that a tool resolve to an exposed control can check that the control is present, not that it's good; presence is enforceable, quality isn't. So this is a floor, and floors get gamed. The thing that actually holds quality over time isn't the rule, it's the incentive underneath it: when the agent capability rides on the interface being good, the interface gets maintained, because it's load-bearing for something the author wants. None of it makes the agent path itself accessible, or helps the person who can't or won't use an agent. It keeps the human interface required. That's the floor worth standing on, and it's the one the present design removes by default.

The conversation between Google and WebKit is happening right now, and the question on the table is a good one. The answer, from where I sit, is that the line isn't declarative versus imperative. It's whether the tool the agent calls is built on the interface a person uses, or instead of it.