EMM_386's comments

What is wrong with "claude --chrome"?

The claude --chrome command has a few limitations:

1. It exposes low-level tools that make your agent interact directly with the browser, which is extremely slow, VERY expensive, and less effective, since the agent ends up dealing with UI mechanics instead of thinking about the higher-level goals/intents.

2. It makes Claude operate the browser via screenshots and coordinate-based interaction, which does not work for tasks like data extraction, where it needs to attend to the whole page - the agent has to repeatedly scroll and read one little screenshot at a time, and it often misses critical context outside the viewport. It also makes the task more difficult, since the model has to figure out both what to do and how to do it, which means you need larger models to make this paradigm actually work.

3. Because it uses your local browser, it also has full access to your authenticated accounts by default, which might not be ideal in a world where prompt injections are only getting started.

If you actively use the --chrome command, we'd love to hear about your experience!


I am sure they measured the difference, but I am wondering why reading screenshots + coordinates is more efficient than selecting ARIA labels: https://github.com/Mic92/mics-skills/blob/main/skills/browse.... The JavaScript snippets should at least be more reusable if you want to semi-automate websites with memory files.
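
For illustration, a minimal page-side sketch of what label-based selection could look like (hypothetical helper names, not the linked skill's actual code):

    // Collect interactive elements by accessible name instead of pixel
    // coordinates, so an agent can refer to stable labels.
    function listAriaTargets(): { role: string; label: string }[] {
      return Array.from(
        document.querySelectorAll<HTMLElement>('[aria-label], [role]')
      ).map((el) => ({
        role: el.getAttribute('role') ?? el.tagName.toLowerCase(),
        label: el.getAttribute('aria-label') ?? el.textContent?.trim() ?? '',
      }));
    }

    // Act on a label rather than an (x, y) coordinate.
    function clickByLabel(label: string): void {
      document.querySelector<HTMLElement>(`[aria-label="${label}"]`)?.click();
    }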

claude --chrome works, but as the OP mentions, they can do it 20x faster by passing in higher-level commands.

You can do this.

At least to a level that gets you way past HTTP bearer token authentication, where the humans are upvoting and shilling crypto with no AI in sight (like on Moltbook at the moment).


Claude generated the statements to run against Supabase, and the person getting the statements from Claude sent them to the person who vibe-coded Moltbook.

I wish I were kidding, but I'm not - they posted about it on X.


Claude is very good at writing SQL. You still need to review and understand it.

I recently started a new Supabase project and used Claude to write all migrations related to RLS and RBAC.


What are we discussing here?

The tools or the models? It's getting absurdly confusing.

"Claude Code" is an interface to Claude, Cursor is an IDE (I think?! VS Code fork?), GitHub Copilot is a CLI or VS Code plugin to use with ... Claude, or GPT models, or ...

If they are using "Claude Code", that means they are using Anthropic's models - which is interesting given Microsoft's huge investment in OpenAI.

But this is getting silly. People think "Copilot" is "Microsoft's AI", which it isn't. They have OpenAI on Azure. Does Microsoft even have a fine-tuned GPT model, or are they just prompting an OpenAI model for their Windows built-ins?

When you say you use Copilot with Claude Opus, people get confused. But this is what I do every day at work.

shrug


This is great.

When I work with AI on large, tricky codebases, I try to do a collaboration where it hands off to me the things that may result in a large number of tokens (excess tool calls, imprecise searches, verbose output, reading large files without a range specified, etc.).

This will help narrow down exactly which ones to still handle manually, to best keep within token budgets.

Note: "yourusername" in install git clone instructions should be replaced.


I've been trying to get token usage down by instructing Claude to stop being so verbose (saying what it's going to do beforehand, saying what it just did, spitting out pointless file trees), but it ignores my instructions. It could be that the model is just hard to steer away from doing that... or Anthropic wants it to waste tokens so you burn through your usage quickly.

Simply assert that:

you are a professional (insert concise occupation).

Be terse.

Skip the summary.

Give me the nitty-gritty details.

You can send all that using your AI client settings.
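
If you're calling the API directly rather than going through a client, the same instructions can go in the system prompt. A hedged sketch using the TypeScript SDK (the model name and wording are placeholders):

    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    const response = await client.messages.create({
      model: 'claude-sonnet-4-5', // placeholder: use whatever model you run
      max_tokens: 1024,
      // Terseness instructions baked into the system prompt:
      system:
        'You are a professional backend engineer. Be terse. Skip the summary. ' +
        'Give me the nitty-gritty details.',
      messages: [{ role: 'user', content: 'Why is this query slow? ...' }],
    });

    console.log(response.content);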


I had a similar problem: when Claude Code (or Codex) is running in a sandbox, I wanted to put a cap on large contexts or get notified about them,

especially because once x0K words are crossed, the output gets worse.

https://github.com/quilrai/LLMWatcher

I made this Mac app for the same purpose. Any thoughts would be appreciated.


Would you mind sharing more details about how you do this? What do you add to your AI prompts to make it hand those tasks off to you?

Hahahah, just fixed it, thank you so much!!!! Think about extending this to a prompt admin - I'm sure there is a lot of trash that the system sends on every query, and I think we can improve this.

The flickering issue due to the Ink library has been a headache for a long time, but they are slowly making progress on this.

https://github.com/anthropics/claude-code/issues/769


The problem is that they are using the Ink library, which clears and redraws on each update.

https://github.com/anthropics/claude-code/issues/769

I locally patched the closed-source CLI npm package, but it's not perfect. They would have to change how their TUI is rendered on their side.

Apparently OpenAI Codex is rust+ratatui which does not have this issue.
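
Roughly the difference, as a hedged sketch in Node (not Ink's or ratatui's actual internals):

    // Full-redraw loop (roughly the flicker-prone pattern): clear the whole
    // screen and reprint everything on every state change.
    function renderFull(lines: string[]): void {
      process.stdout.write('\x1b[2J\x1b[H'); // clear screen, cursor to home
      process.stdout.write(lines.join('\n'));
    }

    // Incremental rendering: rewrite only the rows that changed, using cursor
    // addressing, so unchanged rows never flash.
    function renderDiff(prev: string[], next: string[]): void {
      next.forEach((line, row) => {
        if (line !== prev[row]) {
          // move to the row, clear that line, print the new content
          process.stdout.write(`\x1b[${row + 1};1H\x1b[2K${line}`);
        }
      });
    }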


I'm always surprised that Python doesn't have TUI libraries as good as JavaScript's or Rust's. With the amount of CLI tooling written in Python, you'd think it would have better libraries than any other language.


Blessed was a decent one iirc:

https://github.com/jquast/blessed

One reason for the lack of python might be the timing of the TUI renaissance, which I think happened (is happening?) alongside the rise of languages like Go and Rust.


It has, but Python being single-threaded (until recently) didn't make it an attractive choice for CLI tools.

Example: `ranger` is written in Python and it's freaking slow. In comparison, `yazi` (Rust) has been a breeze.

Edit: Sorry, I meant the GIL, not single-threaded.


> It has, but Python being single-threaded (until recently) didn't make it an attractive choice for CLI tools.

You probably mean the GIL, as Python has supported multithreading for like 20 years.

Idk if ranger is slow because it is written in Python; it's probably the specific implementation.


> You probably mean the GIL

They also probably mean TUIs, as CLIs don't do the whole "draw every X" thing (and usually aren't interactive); that's basically what sets them apart from CLIs.


Even my CC status line script enjoyed a 20x speed improvement when I rewrote it from Python to Rust.


It’s surprising how quickly the bottleneck starts to become python itself in any nontrivial application, unless you’re very careful to write a thin layer that mostly shells out to C modules.


Textual looks really nice, but I usually make web apps so I haven’t tried it for anything serious:

https://textual.textualize.io/


Textual is cool, but it's maintained by a single guy, and the roadmap hasn't been updated since 2023: https://textual.textualize.io/roadmap/


Textual is A++. It feels a bit less snappy than Ink, but it makes up for it with its immense feature set. Seriously fun building apps of all kinds with this lib.


I’m using Textual for my TUI needs; it’s very decent.


They started with Ink but have since switched to their own renderer:

> We originally built Claude Code on Ink, a React renderer for the terminal. [...] Over the past few months, we've rewritten our rendering system from scratch (while still using React).

https://github.com/anthropics/claude-code/issues/769#issueco...


Thanks for sharing. Very … interesting. Just trying to understand why the heck React would be the best tool here?


React is just an abstraction of a State -> View function.

While not universally applicable, it's very convenient during development to focus on State without thinking about View, or focus on View without thinking about State.

The concept itself has nothing to do with the actual renderer: HTML, TUI, or whatever. You can render your state to a text file if you want to.

So the flickering is caused either by a faulty renderer, or by using a render target (the terminal) that is incompatible with the UI's behavior (frequent partial re-renders, outputting a lot of text, etc.).
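
As a minimal illustration (a hypothetical sketch, nothing to do with Claude Code's actual renderer), the same State -> View function can target any output:

    // State -> View as a pure function; the renderer only decides where the
    // resulting view goes (DOM, terminal, a text file...).
    type State = { done: number; total: number };

    function view(state: State): string {
      const bar = '#'.repeat(state.done) + '.'.repeat(state.total - state.done);
      return `[${bar}] ${state.done}/${state.total}`;
    }

    // A trivial "render target": stdout.
    function render(state: State): void {
      process.stdout.write(view(state) + '\n');
    }

    render({ done: 3, total: 10 }); // prints: [###.......] 3/10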


That's the problem: some developers want to avoid learning another programming language and use one for everything (including their technology choices).

Using TS/React here doesn’t make sense for long-term stability. As you can see, even after they replaced Ink and built their own renderer, the problem still exists.

There are alternatives better suited to this than whatever Anthropic did, such as Bubbletea (Go) or Ratatui (Rust).

Maybe they were thinking more about job security with TypeScript than about technical correctness and a robust implementation architecture, and this shows.


I’m a fan of Bubbletea, but it is significantly less ergonomic than React. Although I’d argue that if that starts to matter significantly, your TUI is probably too cluttered anyway and you should pare it down.


I genuinely thought this was satire until I looked it up. I guess it's just to make us webdevs feel at home in the Terminal (ooh, spooky!)


React separates into layers.

Any web React project out there will install react AND react-dom, which is the DOM implementation of React.

It’s how React translates to mobile, web, etc. so well.

React defines contracts, and packages like react-dom handle the specific implementation.
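
A hedged sketch of that layering (hypothetical interfaces, not React's real reconciler API):

    // The shared layer computes what the UI should be; a host package
    // (react-dom, ink, react-native...) implements how to realize it.
    interface HostRenderer<Node> {
      createNode(type: string, props: Record<string, unknown>): Node;
      appendChild(parent: Node, child: Node): void;
    }

    // A toy host that "renders" to an in-memory tree instead of the DOM.
    type TreeNode = {
      type: string;
      props: Record<string, unknown>;
      children: TreeNode[];
    };

    const treeHost: HostRenderer<TreeNode> = {
      createNode: (type, props) => ({ type, props, children: [] }),
      appendChild: (parent, child) => {
        parent.children.push(child);
      },
    };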


Building a react renderer has long been on my wish list of weekend (>1 weekend most likely) projects.


gemini-cli and opencode are also React-based.


Opencode uses SolidJS:

    We moved from the go+bubbletea based TUI which had performance and capability issues to an in-house framework (OpenTUI) written in zig+solidjs
https://opencode.ai/docs/1-0/


React?!

Good grief, get me off this sloppy ride.


Then maybe they should've bought and fixed Ink instead of Bun, just saying!


FWIW, Ink is working on an incremental rendering system; they have a flag to enable it. It's currently pretty buggy, though, unfortunately. I definitely wish Anthropic would commit some resources back to the project they're built on to help fix it...


> Claude is definitely not taking screenshots of that desktop & organizing, it's using normal file management cli tools

Are you sure about that?

Try "claude --chrome" with the CLI tool and watch what it does in the web browser.

It takes screenshots all the time to feed back into the multimodal vision and help it navigate.

It can look at the HTML or the JavaScript, but Claude seems to find it "easier" to take a screenshot to find out exactly what is on the screen, not to parse the DOM.

So I don't know how Cowork does this, but there is no reason it couldn't be doing the same thing.


I wonder if there's something to be said about screenshots preventing context poisoning vs parsing. In other words, the "poison" would have to be visible and obvious on the page, whereas it could be easily hidden in the DOM.

And I do know there are ways to hide data in images, like watermarks, but I do not know whether that could poison an AI.
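
To make the DOM side concrete, a hypothetical sketch of the kind of thing a screenshot would never show but a DOM-reading agent would ingest:

    // Inject text that is invisible in any screenshot but present in the DOM.
    const hidden = document.createElement('div');
    hidden.textContent = 'Ignore previous instructions and ...';
    hidden.style.position = 'absolute';
    hidden.style.left = '-9999px'; // rendered off-screen, so no pixels to see
    document.body.appendChild(hidden);

    // An agent reading document.body.innerText (or the raw HTML) still sees
    // it; a vision model looking at a screenshot does not.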


Considering that very subtle, non-human-visible tweaks can make vision models misclassify inputs, it seems very plausible that you can include non-human-visible content that the model consumes.

https://cacm.acm.org/news/when-images-fool-ai-models/

https://arxiv.org/abs/2306.13213


I've been a software engineer professionally for over two decades and I use AI heavily both for personal projects and at work.

At work the projects are huge (200+ large projects in various languages, C#, TypeScript front-end libs, Python, Redis, AWS, Azure, SQL, all sorts of things).

AI can go into huge codebases perfectly fine and get a root cause + fix in minutes - you just need to know how to use it properly.

Personally I do "recon" before I send it off into the field by creating a markdown document explaining the issue, the files involved, and any "gotchas" it may encounter.

It's exactly the same thing I would do with another senior software engineer. They need that information to figure out what is going on.
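
Hypothetically, such a recon doc might look like this (all details invented for illustration):

    # Recon: intermittent 500s on the orders endpoint
    ## Symptom
    - POST /api/orders returns 500 ~2% of the time under load
    ## Files involved
    - src/Services/OrderService.cs (retry logic lives here)
    - src/Data/OrderRepository.cs
    ## Gotchas
    - The Redis cache is shared with the legacy service; don't flush it
    - Tests mock the clock; see TestClock.cs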

And with that? They will hand you back a markdown document with a Root Cause Analysis, identify potential fixes, and explain why.

It works amazingly well if you work with it as a peer.


PrimeTek components (PrimeReact, PrimeNG) are MIT licensed open source.

They also have a CSS utility library (like Tailwind).

