afro88's comments | Hacker News

I would have agreed with you a year ago

This is what I'm interested in knowing. Just how much can it modify if it's a static binary? Can it modify its own agents.md etc?

It can modify everything. While it's a single binary (which makes installation easy), there are things stored outside that binary: the memories, the skills, the config, etc. But you can do everything from the UI and you don't need to bother; it will all be automatic.

Aside from security and efficiency, is there anything openclaw can do that moltis can't? Like, for example, does moltis have the "heartbeat" thing, short and long term memory, can it update a soul.md etc?

I'm so keen to try openclaw in a locked down environment but the onboarding docs are a mess and I can see references to the old name in markdowns and stuff like that. Seems like a lot of work just to get up and running.


Moltis can do all that, yes, unless I missed something. And it's way easier to set up.

Man, you have to read the article, not just the headline

That would definitely be helpful, but the headline hit a painful spot for me and I went in! You’re right tho! I was in my feelins. I still am. lol

The blur isn't correct though. Like the amount of blur is wrong for the distance, zoom amount etc. So the depth of field is really wrong even if it conforms to "subject crisp, background blurred"

Exactly.

My personal mechanistic understanding of diffusion models is that, "under the hood", the core thing they're doing, at every step and in every layer, is a kind of apophenia — i.e. they recognize patterns/textures they "know" within noise, and then they nudge the noise (least-recognizable pixels) in the image toward the closest of those learned patterns/textures, "snapping" those pixels into high-activation parts of their trained-in texture-space (with any text-prompt input just adding a probabilistic bias toward recognizing/interpreting the noise in certain parts of the image as belonging to certain patterns/textures.)

I like to think of these patterns/textures that diffusion models learn as "brush presets", in the Photoshop sense of the term: a "brush" (i.e. a specific texture or pattern), but locked into a specific size, roughness, intensity, rotation angle, etc.

Due to the way training backpropagation works (and presuming a large-enough training dataset), each of these "brush presets" that a diffusion model learns, will always end up learned as a kind of "archetype" of that brush preset. Out of a collection of examples in the training data where uses of that "brush preset" appear with varying degrees of slightly-wrong-size, slightly-wrong-intensity, slightly-out-of-focus-ness, etc, the model is inevitably going to learn most from the "central examples" in that example cluster, and distill away any parts of the example cluster that are less shared. So whenever a diffusion model recognizes a given one of its known brush presets in an image and snaps pixels toward it, the direction it's moving those pixels will always be toward that archetypal distilled version of that brush preset: the resultant texture in perfect focus, and at a very specific size, intensity, etc.

This also means that diffusion models learn brushes at distinctively-different scales / rotation angles / etc as entirely distinct brush presets. Diffusion models have no way to recognize/repair toward "a size-resampled copy of" one of their learned brush presets. And due to this, diffusion models will never learn to render in details small enough that the high-frequency components of their recognizable textural detail would be lost below the Nyquist floor (which is why they suck so much at drawing crowds, tiny letters on signs, etc.) And they will also never learn to recognize or reproduce visual distortions like moire or ringing that occur when things get rescaled to the point that beat-frequencies appear in their high-frequency components.
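
To make the "snapping" idea concrete, here's a toy sketch of one such step in this mental model. It's purely illustrative: no real diffusion model is literally a nearest-neighbour lookup over a preset bank, and every name and number below is made up.

    import numpy as np

    def denoise_step(patches, brush_presets, prompt_bias, step=0.1):
        # One toy "apophenia" step: recognize the closest learned texture
        # archetype ("brush preset") for each noisy patch, then nudge the
        # patch toward it. The text prompt only re-weights which preset
        # gets recognized.
        #   patches:       (N, D) flattened noisy image patches
        #   brush_presets: (K, D) learned texture archetypes
        #   prompt_bias:   (K,)   prior over presets implied by the prompt
        dist = np.linalg.norm(patches[:, None, :] - brush_presets[None, :, :], axis=-1)
        score = -dist + np.log(prompt_bias)[None, :]
        best = score.argmax(axis=1)
        return patches + step * (brush_presets[best] - patches)

    # toy usage: 16 patches of 64 pixels, 3 learned presets, prompt favouring preset 0
    rng = np.random.default_rng(0)
    presets = rng.normal(size=(3, 64))
    noisy = rng.normal(size=(16, 64))
    out = denoise_step(noisy, presets, prompt_bias=np.array([0.8, 0.1, 0.1]))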

Which means that:

- When you instruct a diffusion model that an image should have "low depth-of-field", what you're really telling it is that it should use a "smooth-blur brush preset" to paint in the background details.

- And even if you ask for depth-of-field, everything in what a diffusion model thinks of as the "foreground" of an image will always have this surreal perfect focus, where all the textures are perfectly evident.

- ...and that'll be true, even when it doesn't make sense for the textures to be evident at all, because in real life, at the distance the subject is from the "camera" in the image, the presumed textures would actually be so small as to be lost below the Nyquist floor at anything other than a macro-zoom scale.

These last two problems combine to create an effect that's totally unlike real photography, but is actually (unintentionally) quite similar to how digital artists tend to texture video-game characters for "tactile legibility." Just like how you can clearly see the crisp texture of e.g. denim on Mario's overalls (because the artist wanted to make it feel like you're looking at denim, even though you shouldn't be able to see those kinds of details at the scaling and distance Mario is from the camera), diffusion models will paint anything described as "jeans" or "denim" as having a crisply-evident denim texture, despite that being the totally wrong scale.

It's effectively a "doll clothes" effect — i.e. what you get when you take materials used to make full-scale clothing, cut tiny scraps of those materials to make a much smaller version of that clothing, put them on a doll, and then take pictures far closer to the doll, such that the clothing's material textural detail is visibly far larger relative to the "model" than it should be. Except, instead of just applying to the clothing, it applies to every texture in the scene. You can see the pores on a person's face, and the individual hairs on their head, despite the person standing five feet away from the camera. Nothing is ever aliased down into a visual aggregate texture — until a subject gets distant enough that the recognition maybe snaps over to using entirely different "brush preset" learned specifically on visual aggregate textures.


Right, prompting for depth of field will never work (with current models) because it treats depth of field as a style rather than knowing, on some level, how light and lenses behave. The model needs to know this, and then we can prompt it with the lens and zoom and it will naturally do the rest. Like how you prompt newer video models without saying "make the ball roll down the hill".
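
For what it's worth, the "correct" amount of blur isn't mysterious; under the standard thin-lens approximation it falls out of focal length, f-stop and the two distances. A back-of-envelope sketch (the lens and distances are arbitrary example numbers):

    def blur_diameter_mm(f_mm, n_stop, focus_m, subject_m):
        # Thin-lens approximation of the blur-circle diameter on the sensor
        # for an object at subject_m when the lens is focused at focus_m.
        s = focus_m * 1000.0     # focus distance, mm
        d = subject_m * 1000.0   # object distance, mm
        aperture = f_mm / n_stop # physical aperture diameter, mm
        return aperture * (f_mm / (s - f_mm)) * abs(d - s) / d

    # 85mm f/1.8 focused at 2m:
    print(blur_diameter_mm(85, 1.8, 2.0, 10.0))  # background at 10m: ~1.7mm blur circle
    print(blur_diameter_mm(85, 1.8, 2.0, 2.2))   # detail 20cm behind focus: ~0.2mm, roughly 10x less

A model that "knew" this relationship could infer the blur from the scene geometry, instead of painting a generic smooth-blur preset over whatever it decides is background.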

Which LLM am I talking to here, GPT-5.2?

I spent more than an hour writing the above comment, with my own two human hands, spending real thinking time on inventing some (AFAIK) entirely-novel educational metaphors to contribute something unique to the discussion. And you're going to ignore it out-of-hand because, what, you think "long writing" is now something only AIs do?

Kindly look at my commenting history on HN (or on Reddit, same username), where I've been writing with exactly this long and rambling overly-detailed "should have been a blog post" style going on 15+ years now.

Then, once you're convinced that I'm human, maybe you'll take this advice:

A much more useful heuristic for noticing textual AI slop than "it's long and wordy" (or "it contains em-dashes"), is that, no matter how you prompt them, LLMs are constitutionally incapable of writing run-on sentences (like this one!)

Basically every LLM base model at this point has been RLHFed by feedback from a (not necessarily native-English-speaking, not necessarily highly literate) userbase. And that has pushed the models toward a specific kind of "writing for readability" that aims for a very low lowest-common-denominator writing style... but in terms of grammar/syntax, rather than vocabulary. These base models (or anything fine-tuned from them) will consistently spew out these endless little atomic top-level sentences — one thought per sentence, or sometimes even several itty-bitty sentences per thought (i.e. the "Not x. Not y. Just z." thing) — that can each be digested individually, with no mental state-keeping required.
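
If you wanted to turn that into a crude check, the idea is just "mostly short, comma-free sentences". Something like this (thresholds entirely made up; an illustration, not a real detector):

    import re

    def looks_rlhf_flat(text, short_len=12, flat_ratio=0.7):
        # Crude heuristic: flag prose that is mostly short, atomic sentences
        # with no internal clauses (the "Not x. Not y. Just z." cadence).
        sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
        if not sentences:
            return False
        flat = sum(1 for s in sentences
                   if len(s.split()) <= short_len and ',' not in s and ';' not in s)
        return flat / len(sentences) >= flat_ratio

    print(looks_rlhf_flat("Not hype. Not magic. Just engineering."))  # True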

It's a very inhuman style of writing. No real human being writes like LLMs do, because it doesn't match the way human beings speak or think. (You can edit prose after-the-fact to look like an LLM wrote it, but I dare you to try writing that way on your first draft. It's next to impossible.)

Note how the way LLMs write, is exactly the opposite of the way I write. My writing requires a high level of fluency with English-language grammar and syntax to understand! Which makes it actually rather shitty as general-purpose prose. Luckily I'm writing here on HN for an audience that skews somewhat older and more educated than the general public. But it's still not a style I would subject anyone to if I bothered to spend any time editing what I write after I write it. My writing epitomizes the aphorism "I wrote you a long letter because I didn't have the time to write you a short one." (It's why these are just HN comments in the first place; if I had the time to clean them up, then I'd make them into blog posts!)


Apologies, I did jump the gun here. There have been more and more lazy LLM replies on HN lately, and yours raised a flag in my mind because I can't remember someone commenting that deeply while also agreeing with me (normally if it's a lengthy response it's because they are arguing against my point).

There are some enlightening points here about LLM writing style for me. Trying to write like an LLM being impossible (at least for a non-trivial length of text) is such a good point. Run-on sentences as another hint that it's not an LLM is also useful. Thanks!


Once you start arguing, it's time to start a new prompt with new instructions

Or, as I prefer, go back in the conversation and edit / add more context so that it wouldn’t go off the wrong track in the first place.

I also like asking the agent how we can update the AGENTS.md to avoid similar mistakes going forward, before starting again.

But gcc is part of its training data so of course it spit out an autocomplete of a working compiler

/s

This is actually a nice case study in why agentic LLMs do kind of think. It's by no means the same code or compiler. It had to figure out lots and lots of problems along the way to get to the point of tests passing.


> But gcc is part of its training data so of course it spit out an autocomplete of a working compiler /s

Why the sarcasm tag? It is almost certainly trained on several compiler codebases, plus probably dozens of small "toy" C compilers created as hobby / school projects.

It's an interesting benchmark not because the LLM did something novel, but because it evidently stayed focused and maintained consistency long enough for a project of this complexity.


I think you're being very generous. There's almost 0 chance they had this actually working consistently enough for general use in 2024. Security is also a reason, but there's no security to worry about if it doesn't really work yet anyway


Same. Immediately I thought why not have clawdbot ask you for the 2FA? That way you at least kind of know what security-protected action it's trying to take and can approve it


The problem is baked in: he gives it access to iMessage, which is where all the SMS-based 2FA codes end up. There is no way to prevent it reading 2FA codes if you want to give it full text message access.


Would love to know this too. When he talks about letting clawdbot catch promises and appointments in his texts, how many of those get missed? How many get created incorrectly? Absolutely not none. But maybe the numbers work compared to how bad he was at it manually?

