More

edf13 · 2026-01-30T08:06:17 1769760377

It’s an interesting experiment… but I expect it to quickly die off as the same type message is posted again and again… their probably won’t be a great deal of difference in “personality” between each agent as they are all using the same base.

swalsh · 2026-01-30T15:54:39 1769788479

They're not though, you can use different models, and the bots have memories. That combined with their unique experiences might be enough to prevent that loop.

edf13 · 2026-01-30T07:39:35 1769758775

AI models have a tendency to like purple and similar shades.

edf13 · 2026-01-22T19:03:06 1769108586

I’d like more granular controls - sometimes I don’t want to trust the entire project but I do want to trust my elements of it

edf13 · 2026-01-21T08:30:49 1768984249

I think he’s asking rather than giving instructions

pelagicAustral · 2026-01-21T09:00:06 1768986006

He's prompting

edf13 · 2026-01-14T09:46:35 1768383995

> let's enjoy the party while VCs are financing it!

The VC money is there until they can solve the optimization problems

edf13 · 2026-01-11T10:20:31 1768126831

Terrible name…

edf13 · 2026-01-08T19:04:41 1767899081

Key part of the article../

“if the user configures ‘always allow’ for any command”

nyrikki · 2026-01-08T20:42:18 1767904938

> In the documentation, IBM warns that setting auto-approve for commands constitutes a 'high risk' that can 'potentially execute harmful operations' - with the recommendation that users leverage whitelists and avoid wildcards

Users have been trained to do this, as shifting the burden to the user with no way to enforce bounds or even sensible defaults.

E.G. I can guarantee that people will whitelist bwrap, crun, docker, expecting to gain advantage from isolation, while the caller can override all of those protections with arguments.

The reality is that we have trained the public to allow local code execution on their devices to save a few cents on a hamburger, we can’t have it both ways.

Unless you are going to teach everyone that they need to make sure address family 40, openat2(), etc.. are unsafe, users have no way to win right now.

The use case has to either explicitly harden or shift blame.

With Opendesktop, OCI, systemd, and kernel all making locally optimal decisions, the reality is that ephemeral VMs is the only ‘safe’ way to run untrusted code today.

Sandboxes can be better but containers on a workstation (without a machine VM) are purely theatre.

promiseofbeans · 2026-01-08T19:13:27 1767899607

Another key part: the command can be displayed as just `echo`, but allows execution of anything

edf13 · 2025-12-12T07:54:59 1765526099

Sounds very interesting - I’ve used SQLite in a few Rust based projects where performance was the deciding factor… a perf comparison with this would be very useful

edf13 · 2025-12-01T06:40:15 1764571215

Ah, never knew about this injection…

<system-reminder> IMPORTANT: this context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant to your task. </system-reminder>

Perhaps a small proxy between Claude code and the API to enforce following CLAUDE.md may improve things… I may try this

edf13 · 2025-11-24T19:49:30 1764013770

I’m threw a few hours at Codex the other day and was incredibly disappointed with the outcome…

I’m a heavy Claude code user and similar workloads just didn’t work out well for me on Codex.

One of the areas I think is going to make a big difference to any model soon is speed. We can build error correcting systems into the tools - but the base models need more speed (and obviously with that lower costs)

chrisweekly · 2025-11-24T20:15:26 1764015326

Any experience w/ Haiku-4.5? Your "heavy Claude code user" and "speed" comment gave me hope you might have insights. TIA

pertymcpert · 2025-11-24T21:38:31 1764020311

Not GP but my experience with Haiku-4.5 has been poor. It certainly doesn't feel like Sonnet 4.0 level performance. It looked at some python test failures and went in a completely wrong direction in trying to address a surface level detail rather than understanding the real cause of the problem. Tested it with Sonnet 4.5 and it did it fine, as an experienced human would.

chrisweekly · 2025-11-25T03:38:44 1764041924

Thanks!

senordevnyc · 2025-11-25T01:26:20 1764033980

Try composer 1 (cursor’s new model). I plan with sonnet 4.5, and then execute with composer, because it’s just so fast.