Just in case you were thinking of wasting time reading it, they put a helpful summary at the top:
> How a simple YAML configuration built for Claude Code and Playwright MCP transformed our testing workflow and made automation accessible to everyone on the team
Side note, in what order did it happen? Did Medium go from “one of the nicest publishing platforms on the web” to “pop up infested search-engine-spamming garbage” before or after all the garbage blog spammers started using it?
Playwright tests are fine, but you need to think about the design or you end up with a mess. Using a steps file is one way to do it, but just employing coding discipline is another. Don’t expect to be able to slap 1000 lines of scripting code together and ignore everything you’ve already learned about structuring code.
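For what it's worth, that discipline usually ends up looking like a thin page-object layer over the raw Playwright calls. A minimal sketch (the route, labels, and class names here are made up for illustration, and it assumes `baseURL` is set in `playwright.config.ts`):

```ts
import { test, expect, type Page } from '@playwright/test';

// One place that knows how the login screen works.
class LoginPage {
  constructor(private readonly page: Page) {}

  async goto() {
    await this.page.goto('/login'); // resolved against baseURL
  }

  async signIn(email: string, password: string) {
    await this.page.getByLabel('Email').fill(email);
    await this.page.getByLabel('Password').fill(password);
    await this.page.getByRole('button', { name: 'Sign in' }).click();
  }
}

// The test body reads as intent; selectors don't leak into it.
test('user can sign in', async ({ page }) => {
  const login = new LoginPage(page);
  await login.goto();
  await login.signIn('user@example.com', 'hunter2');
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```

When the login form changes, you fix one class instead of grepping through 1000 lines of script.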
Replace your flaky UI tests with flaky LLM-based tests; at least when one inevitably fails, you can spend 45 minutes hunting for just the right prompt so the LLM doesn't also try to click something unrelated!
Most of the existing tools (are plain awful|only work in browsers|do magic behind the scenes, making them non-repeatable|force best-effort matching, hiding any validation). These tests are barely better than running them by hand; the one upside is that nobody is burning their mind on a 250-item test-case list for half a day.
Your primary UI testing tool should be accessibility. If your accessibility elements/descriptions aren't enough to test things, _then you aren't accessible enough_.
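Concretely, that means driving the whole test through role/name queries and nothing else; a rough sketch (the app and field names are hypothetical):

```ts
import { test, expect } from '@playwright/test';

// No CSS classes, no XPath, no data-testid: every element is
// found the same way a screen reader would find it. If any of
// these queries fails, the accessibility tree itself is lacking.
test('search works via the accessibility tree alone', async ({ page }) => {
  await page.goto('https://example.com');
  await page.getByRole('searchbox', { name: 'Search' }).fill('playwright');
  await page.getByRole('button', { name: 'Search' }).click();
  await expect(page.getByRole('heading', { name: /results/i })).toBeVisible();
});
```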
(Although I do agree that pure code-based tests mooost likely should go away, whether that's Playwright, Espresso, or any other tool. Maestro strikes the right balance between expressive YAML and openness to scripting when needed.)
I get where you're coming from: a lot of LLM-based UI testing tools today do feel flaky or unpredictable. But Playwright MCP works quite differently from what you're describing. It doesn't rely on the AI guessing or on fragile selectors.
When the page loads, Playwright MCP assigns a ref_id to every element in its accessibility snapshot of the page, and the AI uses those IDs to interact with the UI. This makes execution extremely stable and repeatable; there's no need to 'prompt engineer' your way past random click errors.
In fact, with a properly set up environment, test steps written in natural language can be executed directly and reliably without writing or debugging traditional code.
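If you want to see the raw material it works from, Playwright itself can dump the same accessibility tree that the MCP server annotates with refs; a small sketch using the public `ariaSnapshot()` API (the exact ref format MCP attaches varies by version, so treat the comments as approximate):

```ts
import { test } from '@playwright/test';

test('inspect the accessibility snapshot', async ({ page }) => {
  await page.goto('https://example.com');

  // ariaSnapshot() returns the accessibility tree as YAML, roughly:
  //   - heading "Example Domain" [level=1]
  //   - link "More information..."
  // Playwright MCP serves a similar tree with a ref attached to each
  // node; the model passes that ref back in its click/type tool calls
  // instead of inventing CSS selectors.
  console.log(await page.locator('body').ariaSnapshot());
});
```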
Fair concern, but I'd argue it's not really 'vibe-coding' the tests. With Playwright MCP, the AI works from structural page data and ref_ids captured at runtime, which leads to highly stable, reproducible interactions. It isn't guessing; it's anchored in what the browser actually sees.
In practice, the tests it generates are actually easier to reason about than a lot of hand-written Playwright code I’ve seen in the wild. And for scenarios like acceptance testing or rapid iteration, this approach speeds things up without sacrificing much in terms of clarity or stability.