the Claude --chrome command has a few limitations:
1. it exposes low-level tools which make your agent interact directly with the browser which is extremely slow, VERY expensive, and less effective as the agent ends up dealing with UI mechanics instead of thinking about the higher-level goal/intents
2. it makes Claude operate the browser via screenshots and coordinates-based interaction, which does not work for tasks like data extraction where it needs to be able to attend to the whole page - the agent needs to repeatedly scroll and read one little screenshot at the time and it often misses critical context outside of the viewport. It also makes the task more difficult as the model has to figure out both what to do and how to do it, which means that you need to use larger models to make this paradigm actually work
3. because it uses your local browser, it also means that it has full access to your authenticated accounts by default which might not be ideal in a world where prompt-injections are only getting started
if you actively use the --chrome command we'd love to hear your experience!
I am sure they measured the difference but i am wondering why reading screenshots + coordinates is more efficient than selecting aria labels? https://github.com/Mic92/mics-skills/blob/main/skills/browse.... the JavaScript snippets should at least more reusable if you want semi-automate websites with memory files
At least to a level that gets you way past HTTP Bearer Token Authentication where the humans are upvoting and shilling crypto with no AI in sight (like on Moltbook at the moment).
Claude generated the statements to run against Supabase and the person getting the statements from Claude sent it to the person who vibe-coded Moltbook.
I wish I was kidding but not really - they posted about it on X.
The tools or the models? It's getting absurdly confusing.
"Claude Code" is an interface to Claude, Cursor is an IDE (I think?! VS Code fork?), GitHub Copilot is a CLI or VS Code plugin to use with ... Claude, or GPT models, or ...
If they are using "Claude Code" that means they are using Anthropic's models - which is interesting given their huge investment in OpenAI.
But this is getting silly. People think "CoPilot" is "Microsoft's AI" which it isn't. They have OpenAI on Azure. Does Microsoft even have a fine-tuned GPT model or are they just prompting an OpenAI model for their Windows-builtins?
When you say you use CoPilot with Claude Opus people get confused. But this is what I do everyday at work.
When I work with AI on large, tricky code bases I try to do a collaboration where it hands off things to me that may result in large number of tokens (excess tool calls, unprecise searches, verbose output, reading large files without a range specified, etc.).
This will help narrow down exactly which to still handle manually to best keep within token budgets.
Note: "yourusername" in install git clone instructions should be replaced.
I've been trying to get token usage down by instructing Claude to stop being so verbose (saying what it's going to do beforehand, saying what it just did, spitting out pointless file trees) but it ignores my instructions. It could be that the model is just hard to steer away from doing that... or Anthropic want it to waste tokens so you burn through your usage quickly.
Hahahah just fixed it, thank you so much!!!! Think of extending this to a prompt admin, Im sure there is a lot of trash that the system sends on every query, I think we can improve this.
I'm always surprised that Python doesn't have as good TUI libraries as Javascript or Rust. With the amount of CLI tooling written in Python, you'd think it had better libraries than any other language.
One reason for the lack of python might be the timing of the TUI renaissance, which I think happened (is happening?) alongside the rise of languages like Go and Rust.
They also probably mean TUIs, as CLIs don't do the whole "Draw every X" thing (and usually aren't interactive), that's basically what sets them apart from CLIs.
It’s surprising how quickly the bottleneck starts to become python itself in any nontrivial application, unless you’re very careful to write a thin layer that mostly shells out to C modules.
Textual is A++. Feels a bit less snappy than Ink, but it makes up in all things with its immense feature-set. Seriously fun building apps of all kinds with this lib.
They started with Ink but have since switched to their own renderer:
> We originally built Claude Code on Ink, a React renderer for the terminal. [...] Over the past few months, we've rewritten our rendering system from scratch (while still using React).
React is just an abstraction of a State -> View function.
While not universally applicable, it's very convenient during development to focus on State without thinking about View, or focus on View without thinking about State.
The concept itself has nothing to do with the actual renderer: HTML, TUI, or whatever. You can render your state to a text file if you want to.
So the flickering is caused either by a faulty renderer, or by using a render target (terminal) that is incompatible with the UI behavior (frequent partial re-renders, outputting a lot of text etc.)
Thats the problem. Some developers want to avoid learning another programming language and use one for everything (including their technologies.)
Using TS, React here doesn’t make sense for stability in the long term. As you can see, even when they replaced Ink and built their own, the problem still exists.
There are other alternatives that are better than whatever Anthropic did such as Bubbletea (Go) or Ratatui (Rust) which both are better suited for this.
Maybe they were thinking more about job security with TypeScript over technical correctness and a robust implementation architecture and this shows the lack of it.
I’m a fan of Bubbletea, but it is significantly less ergonomic than React. Although I’d argue that if that starts to matter significantly, your TUI is probably too cluttered anyway and you should pare it down.
FWIW, Ink is working on an incremental rendering system: they have a flag to enable it. It's currently pretty buggy though unfortunately. Definitely wish Anthropic would commit some resources back to the project they're built on to help fix it...
> Claude is definitely not taking screenshots of that desktop & organizing, it's using normal file management cli tools
Are you sure about that?
Try "claude --chrome" with the CLI tool and watch what it does in the web browser.
It takes screenshots all the time to feed back into the multimodal vision and help it navigate.
It can look at the HTML or the JavaScript but Claude seems to find it "easier" to take a screenshot to find out what exactly is on the screen. Not parse the DOM.
So I don't know how Cowork does this, but there is no reason it couldn't be doing the same thing.
I wonder if there's something to be said about screenshots preventing context poisoning vs parsing. Or in other words, the "poison" would have to be visible and obvious on the page where as it could be easily hidden in the DOM.
And I do know there are ways to hide data like watermarks in images but I do not know if that would be able to poison an AI.
Considering that very subtle not-human-visible tweaks can make vision models misclassify inputs, it seems very plausible that you can include non-human-visible content the model consumes.
I've been a software engineer professionally for over two decades and I use AI heavily both for personal projects and at work.
At work the projects are huge (200+ large projects in various languages, C#, TypeScript front-end libs, Python, Redis, AWS, Azure, SQL, all sorts of things).
AI can go into huge codebases perfectly fine and get a root cause + fix in minutes - you just need to know how to use it properly.
Personally I do "recon" before I send it off into the field by creating a markdown document explaining the issue, the files involved, and any "gotchas" it may encounter.
It's exactly the same as I would do with another senior software engineer. They need that information to figure out what is going on.
And with that? They will hand you back a markdown document with a Root Cause Analysis, identify potential fixes, and explain why.
It works amazingly well if you work with it as a peer.
reply