Hacker News | edf13's comments

It’s an interesting experiment… but I expect it to quickly die off as the same type of message is posted again and again… there probably won’t be a great deal of difference in “personality” between each agent as they are all using the same base.


They're not though, you can use different models, and the bots have memories. That combined with their unique experiences might be enough to prevent that loop.


AI models have a tendency to like purple and similar shades.


I’d like more granular controls - sometimes I don’t want to trust the entire project but I do want to trust some elements of it


I think he’s asking rather than giving instructions


He's prompting


> let's enjoy the party while VCs are financing it!

The VC money is there until they can solve the optimization problems


Terrible name…


Key part of the article:

“if the user configures ‘always allow’ for any command”


> In the documentation, IBM warns that setting auto-approve for commands constitutes a 'high risk' that can 'potentially execute harmful operations' - with the recommendation that users leverage whitelists and avoid wildcards

Users have been trained to do this; it shifts the burden onto the user with no way to enforce bounds or even sensible defaults.

E.g., I can guarantee that people will whitelist bwrap, crun, and docker, expecting to gain isolation, while the caller can override all of those protections with arguments.
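A minimal sketch of why this fails, assuming a hypothetical name-based allowlist (the checker is invented for illustration; the Docker flags are real):

```python
# Hypothetical agent-side allowlist that approves commands by binary name only.
ALLOWED = {"docker", "crun", "bwrap"}

def is_allowed(argv):
    # Inspects only argv[0]; the remaining arguments are never examined.
    return argv[0] in ALLOWED

# Looks like a harmless container run...
benign = ["docker", "run", "alpine", "echo", "hi"]

# ...but these arguments disable the isolation the user expected:
# --privileged grants broad capabilities, -v /:/host bind-mounts the host root.
hostile = ["docker", "run", "--privileged", "-v", "/:/host", "alpine", "sh"]

assert is_allowed(benign)
assert is_allowed(hostile)  # the allowlist cannot tell the difference
```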

The reality is that we have trained the public to allow local code execution on their devices to save a few cents on a hamburger, we can’t have it both ways.

Unless you are going to teach everyone that address family 40, openat2(), etc. are unsafe, users have no way to win right now.

The use case has to either explicitly harden or shift blame.

With Opendesktop, OCI, systemd, and the kernel all making locally optimal decisions, the reality is that ephemeral VMs are the only ‘safe’ way to run untrusted code today.

Sandboxes can be better but containers on a workstation (without a machine VM) are purely theatre.


Another key part: the command can be displayed as just `echo`, but allows execution of anything


Sounds very interesting - I’ve used SQLite in a few Rust-based projects where performance was the deciding factor… a perf comparison with this would be very useful


Ah, never knew about this injection…

<system-reminder> IMPORTANT: this context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant to your task. </system-reminder>

Perhaps a small proxy between Claude code and the API to enforce following CLAUDE.md may improve things… I may try this
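A minimal sketch of that proxy idea, with everything hypothetical (the request shape and field names are assumptions, not the real API contract): before forwarding each request, prepend the contents of CLAUDE.md as a reminder so the model sees it on every turn.

```python
# Hypothetical request-rewriting step for a local proxy. The "messages"
# field and its role/content shape are assumptions for illustration.
def inject_claude_md(request_body: dict, claude_md_text: str) -> dict:
    reminder = {
        "role": "user",
        "content": f"<reminder>Follow these project rules:\n{claude_md_text}</reminder>",
    }
    body = dict(request_body)  # shallow copy; leave the original untouched
    body["messages"] = [reminder] + list(body.get("messages", []))
    return body

body = {"model": "some-model", "messages": [{"role": "user", "content": "fix the bug"}]}
patched = inject_claude_md(body, "Always run the test suite before committing.")
assert patched["messages"][0]["content"].startswith("<reminder>")
assert len(patched["messages"]) == 2
```

Whether a repeated reminder actually improves instruction-following would need testing; this only shows where such an interception could sit.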


I threw a few hours at Codex the other day and was incredibly disappointed with the outcome…

I’m a heavy Claude code user and similar workloads just didn’t work out well for me on Codex.

One of the areas I think is going to make a big difference to any model soon is speed. We can build error correcting systems into the tools - but the base models need more speed (and obviously with that lower costs)


Any experience w/ Haiku-4.5? Your "heavy Claude code user" and "speed" comment gave me hope you might have insights. TIA


Not GP but my experience with Haiku-4.5 has been poor. It certainly doesn't feel like Sonnet 4.0 level performance. It looked at some python test failures and went in a completely wrong direction in trying to address a surface level detail rather than understanding the real cause of the problem. Tested it with Sonnet 4.5 and it did it fine, as an experienced human would.


Thanks!


Try composer 1 (cursor’s new model). I plan with sonnet 4.5, and then execute with composer, because it’s just so fast.


