
Where were you doing this? Were you ever successful? How did you do it, like what were your tactics? So many questions!

I’ve never heard of modern people doing serious persistence hunting, except for a stunt that I read about years ago. I think it was organized by Outside or some other running publication; they got pro marathoners to try it, and they failed because they didn’t know anything about hunting.


Right? Where’s the well-written blog post on this? I want it.

Third. Tell the story!!

I genuinely did not expect to see a robot handling clothing like this within the next ten years at least. Insanely impressive

I do find it interesting that they state that each task is done with a fine-tuned model. I wonder if that’s a limitation of the current dataset their foundation model is trained on (which is what I think they’re suggesting in the post) or if it reflects something more fundamental about robotics tasks. It does remind me of a few years ago in LLMs when fine-tuning was more prevalent. I don’t follow LLM training methodology closely, but my impression was that the bulk of recent improvements have come from better RL post-training and inference-time reasoning.

Obviously they’re pursuing RL and I’m not sure spending more tokens at inference would even help for fine manipulation like this, notwithstanding the latency problems with that.

So, maybe the need for fine-tuning goes away with a better foundation model, like they’re suggesting? I hope this doesn’t point towards more fundamental limitations on robotics learning with the current VLA foundation model architectures.


There are a lot of indications that robotics AI is in a data-starved regime - which means that future models are likely to attain better zero-shot performance, solve more issues in-context, generalize better, require less task-specific training, and be more robust.

But it seems like a degree of "RL in real life" is nigh-inevitable - imitation learning only gets you so far. Kind of like RLVR is nigh-inevitable for high LLM performance on agentic tasks, and for many of the same reasons.


Looks like we may actually have robot maids picking stuff up before too long!

Re. not expecting it for ten years at least, current progress is pretty much in line with Moravec's predictions from 35 years ago. (https://jetpress.org/volume1/moravec.htm)

I wonder if he still follows this stuff?


> robot maids

What fascinates me is that we could probably make self-folding clothes. We also already have wrinkle-free clothes that need minimal folding. I wager we could go a lot further if we invested a tad more into the matter.

But the first image people seem to have of super advanced multi-thousand dollar robots is still folding the laundry.


I think it's just one of the most obvious things that Rosey from the Jetsons could do but current robots can't.

Video of Moravec talking about intelligent robots for 2030: https://youtu.be/4eVv01xOoSo?t=65

To be clear, the video at the top of the article is played at 4x speed, and the clothes-folding section is full of cuts.

There are other videos of the laundry tasks within the article, and they do not seem to feature cuts if I'm not mistaken.

This is a very tiring criticism. Yes, this is true. But it's an implementation detail (tokenization) that has very little bearing on the practical utility of these tools. How often are you relying on LLMs to count letters in words?


The implementation detail is that we keep finding them! After this, it couldn't locate a seahorse emoji without freaking out. At some point we need to have a test: there are two drinks before you. One is water, the other is whatever the LLM thought you might like to drink after it completed refactoring the codebase. Choose wisely.


It's an example that shows that if these models aren't trained on a specific problem, they may have a hard time solving it for you.


An analogy is asking someone who is colorblind how many colors are on a sheet of paper. What you are probing isn't reasoning, it's perception. If you can't see the input, you can't reason about the input.


> What you are probing isn't reasoning, it's perception.

It's both. A colorblind person will admit their shortcomings and, if compelled to be helpful like an LLM is, will reason their way to a solution that works around their limitations.

But as LLMs lack a way to reason, you get nonsense instead.


What tools does the LLM have access to that would reveal sub-token characters to it?

This assumes the colorblind person both believes that they are colorblind (in a world where that can be verified) and possesses tools to overcome these limitations.

You have to be much more clever to 'see' an atom before the invention of the microscope; if the tool doesn't exist, most of the time you are SOL.


No, it’s an example that shows that LLMs still use a tokenizer, which is not an impediment for almost any task (even many where you would expect it to be, like searching a codebase for variants of a variable name in different cases).
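
If you want to see what the model actually "sees", you can inspect a tokenizer directly. A minimal sketch in Python using the tiktoken library and its cl100k_base encoding (my choice for illustration; the exact split depends on which model's tokenizer you load):

    import tiktoken

    # Load an OpenAI byte-pair encoding (cl100k_base is used by several GPT-4-era models).
    enc = tiktoken.get_encoding("cl100k_base")

    word = "strawberry"
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]

    # The model receives token ids, not characters, so a question like
    # "how many r's are in this word" has to be answered without ever seeing the letters.
    print(token_ids)
    print(pieces)

If the word comes back as one or two opaque chunks rather than letters, character-level questions are exactly the kind of thing the architecture makes awkward, while ordinary text tasks are unaffected.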


The question remains: is the tokenizer going to be a fundamental limit for my task? How do I know ahead of time?


Would it limit a person getting your instructions in Chinese? Tokenisation pretty much means that the LLM is reading symbols instead of phonemes.

This makes me wonder if LLMs work better in Chinese.


No, it is the issue with the tokenizer.


The criticism would stop if the implementation issue was fixed.

It's an example of a simple task. How often are you relying on LLMs to complete simple tasks?


At this point, if I were OpenAI, I wouldn’t bother fixing this, just to give pedants something to get excited about.


Unless they fixed this in 25 minutes (possible?), it correctly counts 1 `r`.

https://chatgpt.com/share/6941df90-789c-8005-8783-6e1c76cdfc...


He Jiankui is better known for performing the first germ-line (i.e. inheritable by children) genome editing of humans.


> evolution doesn't care about us beyond reproduction age

This isn’t totally true; group and kin selection are important.


What?


Please don’t post chatgpt output


Yudkowsky seems to believe in fast takeoff, so much so that he suggested bombing data centers. To address your point more directly, I think it’s almost certain that increasing intelligence has diminishing returns and that the recursive self-improvement loop will be slow. The reason for this is that collecting data is absolutely necessary, and many natural processes are both slow and chaotic, meaning that learning from observing and manipulating them will take years at least. Also lots of resources.

Regarding LLMs, I think METR is a decent metric. However, you have to consider the cost of achieving each additional hour or day of task horizon. I’m open to correction here, but I would bet that the cost curves are more exponential than the improvement curves. That would be fundamentally unsustainable and would point to a limitation of LLM training/architecture for reasoning and world modeling.

Basically, I think the focus on recursive self-improvement is not really important in the real world. The actual question is how long and how expensive the learning process will be. I think the answer is that it will be long and expensive, just like in our current world. No doubt having many more intelligent agents will help speed up parts of the loop, but there are physical constraints you can’t get past no matter how smart you are.


How do you reconcile e.g. AlphaGo with the idea that data is a bottleneck?

At some point learning can occur with "self-play", and I believe this is already happening with LLMs to some extent. Then you're not limited by imitating human-made data.

For something like software development or mathematical proofs, it is easier to verify whether a solution is correct than to come up with it in the first place, and many domains are like this. Anything like that is amenable to learning on synthetic data or self-play, as AlphaGo did.
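
As a toy sketch of that asymmetry (my own example, nothing from AlphaGo itself): checking a proposed factorization of a number is trivial, while finding the factors is the hard direction, so a cheap verifier can turn random proposals into verified training pairs with no human labels involved.

    import random

    def verify(n, factors):
        # Verification is cheap: multiply and compare.
        a, b = factors
        return a > 1 and b > 1 and a * b == n

    def propose(n):
        # Stand-in for a "model": guess a divisor at random (the hard direction).
        a = random.randint(2, n - 1)
        return a, n // a

    dataset = []
    for n in [91, 221, 323]:  # small semiprimes
        for _ in range(2000):
            guess = propose(n)
            if verify(n, guess):
                dataset.append((n, guess))
                break

    print(dataset)  # verified pairs, produced without any human-labelled answers

Scale the same idea up to "does the code pass the tests" or "does the proof check", and you have a training signal that doesn't depend on imitating human-made data.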

I can understand that people who think of LLMs as human-imitation machines, limited to training on human-made data, would think they'd be capped at human-level intelligence. However I don't think that's the case, and we have at least one example of superhuman AI in one domain (Go) showing this.

Regarding cost, I'd have to look into it, but I'm under the impression that costs have gone up and down over time as models have grown, though there have also been efficiency improvements.

I'd hazard a guess that end-user costs have not grown exponentially like time-horizon capabilities have, even though investment in training probably has. Though that's tricky to reason about, because training costs are amortised and it's not obvious whether end-user costs are priced at a loss or what the profit margin is for any given model.

On fast vs. slow takeoff: Yud does seem to believe in a fast takeoff, yes, but it's also one of the oldest disagreements in rationality circles, one on which he disagreed with his main co-blogger on the original rationalist blog, Overcoming Bias. There's some discussion of this and more recent disagreements here [1].

[1] https://www.astralcodexten.com/p/yudkowsky-contra-christiano...


AlphaGo showed that RL + search + self-play works really well if you have an easy-to-verify reward and millions of iterations. Math partially falls into this category via automated proof checkers like Lean. So that’s where I would put the highest likelihood of things getting weird really quickly. It’s worth noting that this hasn’t happened yet, and I’m not sure why. It seems like this recipe should already be yielding results in terms of new mathematics, but it isn’t.
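
For concreteness, this is the shape of the signal a proof checker gives you: a toy Lean 4 example of my own, not tied to any particular RL system or result.

    -- A trivial statement; Lean either accepts the proof term or rejects it.
    -- That binary pass/fail judgment is the machine-checkable reward the RL recipe needs.
    theorem add_comm_toy (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b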

That said, nearly every other task in the world is not easily verified, including things we really care about. How do you know if an AI is superhuman at designing fusion reactors? The most important step there is building a fusion reactor.

I think a better reference point than AlphaGo is AlphaFold. DeepMind found some really clever algorithmic improvements, but they didn’t know whether they actually worked until the CASP competition. CASP evaluated their model on new X-ray crystal structures of proteins. Needless to say, getting X-ray protein structures is a difficult and complex process. Also, they trained AlphaFold on thousands of existing structures that were accumulated over decades and required millennia of graduate-student hours to find. It’s worth noting that we have very good theories for all the basic physics underlying protein folding, but none of the physics-based methods work. We had to rely on painstakingly collected data to learn the emergent phenomena that govern folding. I suspect that this will be the case for many other tasks.


> How do you reconcile e.g. AlphaGo with the idea that data is a bottleneck?

Go is entirely unlike reality in that the rules are fully known and it can be perfectly simulated by a computer. AlphaGo worked because it could run millions of trials in a short time frame, because it is all simulated. It doesn't seem to answer, at all, the question of how an AI improves its general intelligence without real-world interaction and data gathering. If anything, it points to the importance of doing many experiments and gathering data - and this becomes a bottleneck when you can't simply make the experiment run faster, because the experiment is limited by physics.


Here's one: Yudkowsky has been confidently asserting (for years) that AI will drive humanity extinct because it will learn how to make nanomachines using "strong" covalent bonds rather than the "weak" van der Waals forces used by biological systems like proteins. I'm certain that knowledgeable biologists and physicists have tried to explain to him why this belief is basically nonsense, but he just keeps repeating it. Heck, there's even a LessWrong post that lays it out quite well [1]. This points to a general disregard for detailed knowledge of existing things and a preference for "first principles" beliefs, no matter how wrong they are.

[1] https://www.lesswrong.com/posts/8viKzSrYhb6EFk6wg/why-yudkow...


Dear god. The linked article is a good takedown of this "idea," but I would like to pile on: biological systems are in fact extremely good at covalent chemistry, usually via extraordinarily powerful nanomachines called "enzymes." No, they are (usually) not building totally rigid condensed-matter structures, but... why would they? Why would that be better?

I'm reminded of a silly social science article I read, quite a long time ago. It suggested that physicists only like to study condensed matter crystals because physics is a male-dominated field, and crystals are hard rocks, and, um ... men like to think about their rock-hard penises, I guess. Now, this hypothesis obviously does not survive cursory inspection - if we're gendering natural phenomena studied by physicists, are waves male? Are fluid dynamics male?

However, Mr. Yudkowsky's weird hangups here around rigidity and hardness have me adjusting my priors.


The article makes very clear that costs are rising for "pet day care" just as quickly as for real day care for children. This cannot be explained by regulation, as pet day care is far, far less regulated than day care for children.

