
Where were you doing this? Were you ever successful? How did you do it, like what were your tactics? So many questions!

I’ve never heard of modern people doing serious persistence hunting, except for a stunt that I read about years ago. I think it was organized by Outside or some other running publication; they got pro marathoners to try it, and they failed because they didn’t know anything about hunting.


Right? Where’s the well-written blog post on this? I want it.

Third. Tell the story!!

I genuinely did not expect to see a robot handling clothing like this within the next ten years at least. Insanely impressive

I do find it interesting that they state that each task is done with a fine-tuned model. I wonder if that’s a limitation of the current dataset their foundation model is trained on (which is what I think they’re suggesting in the post) or if it reflects something more fundamental about robotics tasks. It does remind me of a few years ago in LLMs when fine-tuning was more prevalent. I don’t follow LLM training methodology closely, but my impression was that the bulk of recent improvements have come from better RL post-training and inference-time reasoning.

Obviously they’re pursuing RL and I’m not sure spending more tokens at inference would even help for fine manipulation like this, notwithstanding the latency problems with that.

So, maybe the need for fine-tuning goes away with a better foundation model, like they’re suggesting? I hope this doesn’t point towards more fundamental limitations on robotics learning with the current VLA foundation model architectures.


There are a lot of indications that robotics AI is in a data-starved regime - which means that future models are likely to attain better zero-shot performance, solve more issues in-context, generalize better, require less task-specific training, and be more robust.

But it seems like a degree of "RL in real life" is nigh-inevitable - imitation learning only gets you so far. Kind of like RLVR is nigh-inevitable for high LLM performance on agentic tasks, and for many of the same reasons.


Looks like we may actually have robot maids picking stuff up before too long!

Re. not expecting it for ten years at least, current progress is pretty much in line with Moravec's predictions from 35 years ago. (https://jetpress.org/volume1/moravec.htm)

I wonder if he still follows this stuff?


> robot maids

What fascinates me is that we could probably make self-folding clothes. We also already have wrinkle-free clothes that need minimal folding. I wager we could go a lot further if we invested a tad more into the matter.

But the first image people seem to have of super advanced multi-thousand dollar robots is still folding the laundry.


I think it's just one of the most obvious things that Rosey from the Jetsons could do but current robots can't.

Video of Moravec talking about intelligent robots for 2030: https://youtu.be/4eVv01xOoSo?t=65

To be clear, the video at the top of the article is played at 4x speed, and the clothes-folding section is full of cuts.

There are other videos of the laundry tasks within the article, and they do not seem to feature cuts if I'm not mistaken.

This is a very tiring criticism. Yes, this is true. But it's an implementation detail (tokenization) that has very little bearing on the practical utility of these tools. How often are you relying on LLMs to count letters in words?


The implementation detail is that we keep finding them! After this, it couldn't locate a seahorse emoji without freaking out. At some point we need to have a test: there are two drinks before you. One is water, the other is whatever the LLM thought you might like to drink after it completed refactoring the codebase. Choose wisely.


It's an example that shows that if these models aren't trained on a specific problem, they may have a hard time solving it for you.


An analogy is asking someone who is colorblind how many colors are on a sheet of paper. What you are probing isn't reasoning, it's perception. If you can't see the input, you can't reason about the input.


> What you are probing isn't reasoning, it's perception.

It's both. A colorblind person will admit their shortcomings and, if compelled to be helpful like an LLM is, will reason their way to a solution that works around their limitations.

But as LLMs lack a way to reason, you get nonsense instead.


What tools does the LLM have access to that would reveal sub-token characters to it?

This assumes the colorblind person both believes that they are colorblind (in a world where that can be verified) and possesses tools to overcome these limitations.

You have to be much more clever to 'see' an atom before the invention of the microscope; if the tool doesn't exist, most of the time you are SOL.


No, it’s an example that shows that LLMs still use a tokenizer, which is not an impediment for almost any task (even many where you would expect it to be, like searching a codebase for variants of a variable name in different cases).
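
If you want to see what the model actually "sees", you can inspect a tokenizer directly. A minimal sketch in Python using the tiktoken library and its cl100k_base encoding (my choice for illustration; the exact split depends on which model's tokenizer you load):

    import tiktoken

    # Load an OpenAI byte-pair encoding (cl100k_base is used by several GPT-4-era models).
    enc = tiktoken.get_encoding("cl100k_base")

    word = "strawberry"
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]

    # The model receives token ids, not characters, so a question like
    # "how many r's are in this word" has to be answered without ever seeing the letters.
    print(token_ids)
    print(pieces)

If the word comes back as one or two opaque chunks rather than letters, character-level questions are exactly the kind of thing the architecture makes awkward, while ordinary text tasks are unaffected.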


The question remains: is the tokenizer going to be a fundamental limit for my task? How do I know ahead of time?


Would it limit a person getting your instructions in Chinese? Tokenisation pretty much means that the LLM is reading symbols instead of phonemes.

This makes me wonder if LLMs work better in Chinese.


No, it is the issue with the tokenizer.


The criticism would stop if the implementation issue was fixed.

It's an example of a simple task. How often are you relying on LLMs to complete simple tasks?


At this point, if I were OpenAI, I wouldn’t bother fixing this, just to give pedants something to get excited about.


Unless they fixed this in 25 minutes (possible?), it correctly counts 1 `r`.

https://chatgpt.com/share/6941df90-789c-8005-8783-6e1c76cdfc...


He Jiankui is better known for performing the first germ-line (i.e. inheritable by children) genome editing of humans.


> evolution doesn't care about us beyond reproduction age

This isn’t totally true; group and kin selection are important.


What?


Please don’t post chatgpt output


Yudkowsky seems to believe in fast takeoff, so much so that he suggested bombing data centers. To address your point more directly, I think it’s almost certain that increasing intelligence has diminishing returns and that the recursive self-improvement loop will be slow. The reason for this is that collecting data is absolutely necessary, and many natural processes are both slow and chaotic, meaning that learning from observing and manipulating them will take years at least. Also lots of resources.

Regarding LLMs, I think METR is a decent metric. However, you have to consider the cost of achieving each additional hour or day of task horizon. I’m open to correction here, but I would bet that the cost curves are more exponential than the improvement curves. That would be fundamentally unsustainable and would point to a limitation of LLM training/architecture for reasoning and world modeling.

Basically, I think the focus on recursive self-improvement is not really important in the real world. The actual question is how long and how expensive the learning process will be. I think the answer is that it will be long and expensive, just like in our current world. No doubt having many more intelligent agents will help speed up parts of the loop, but there are physical constraints you can’t get past no matter how smart you are.


How do you reconcile e.g. AlphaGo with the idea that data is a bottleneck?

At some point learning can occur with "self-play", and I believe this is already happening with LLMs to some extent. Then you're not limited by imitating human-made data.

For something like software development or mathematical proofs, it is easier to verify whether a solution is correct than to come up with it in the first place, and many domains are like this. Anything like that is amenable to learning on synthetic data or self-play, as AlphaGo did.
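
As a toy sketch of that asymmetry (my own example, nothing from AlphaGo itself): checking a proposed factorization of a number is trivial, while finding the factors is the hard direction, so a cheap verifier can turn random proposals into verified training pairs with no human labels involved.

    import random

    def verify(n, factors):
        # Verification is cheap: multiply and compare.
        a, b = factors
        return a > 1 and b > 1 and a * b == n

    def propose(n):
        # Stand-in for a "model": guess a divisor at random (the hard direction).
        a = random.randint(2, n - 1)
        return a, n // a

    dataset = []
    for n in [91, 221, 323]:  # small semiprimes
        for _ in range(2000):
            guess = propose(n)
            if verify(n, guess):
                dataset.append((n, guess))
                break

    print(dataset)  # verified pairs, produced without any human-labelled answers

Scale the same idea up to "does the code pass the tests" or "does the proof check", and you have a training signal that doesn't depend on imitating human-made data.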

I can understand that people who think of LLMs as human-imitation machines, limited to training on human-made data, would think they'd be capped at human-level intelligence. However I don't think that's the case, and we have at least one example of superhuman AI in one domain (Go) showing this.

Regarding cost, I'd have to look into it, but I'm under the impression that costs have gone up and down over time as models have grown, though there have also been efficiency improvements.

I'd hazard a guess that end-user costs have not grown exponentially like time-horizon capabilities have, even though investment in training probably has. Though that's tricky to reason about, because training costs are amortised and it's not obvious whether end-user costs are priced at a loss or what the profit margin is for any given model.

On fast vs. slow takeoff: Yud does seem to believe in a fast takeoff, yes, but it's also one of the oldest disagreements in rationality circles, one on which he disagreed with his main co-blogger on the original rationalist blog, Overcoming Bias. There's some discussion of this and more recent disagreements here [1].

[1] https://www.astralcodexten.com/p/yudkowsky-contra-christiano...


AlphaGo showed that RL + search + self-play works really well if you have an easy-to-verify reward and millions of iterations. Math partially falls into this category via automated proof checkers like Lean. So that’s where I would put the highest likelihood of things getting weird really quickly. It’s worth noting that this hasn’t happened yet, and I’m not sure why. It seems like this recipe should already be yielding results in terms of new mathematics, but it isn’t.
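
For concreteness, this is the shape of the signal a proof checker gives you: a toy Lean 4 example of my own, not tied to any particular RL system or result.

    -- A trivial statement; Lean either accepts the proof term or rejects it.
    -- That binary pass/fail judgment is the machine-checkable reward the RL recipe needs.
    theorem add_comm_toy (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b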

That said, nearly every other task in the world is not easily verified, including things we really care about. How do you know if an AI is superhuman at designing fusion reactors? The most important step there is building a fusion reactor.

I think a better reference point than AlphaGo is AlphaFold. DeepMind found some really clever algorithmic improvements, but they didn’t know whether they actually worked until the CASP competition. CASP evaluated their model on new X-ray crystal structures of proteins. Needless to say, getting X-ray protein structures is a difficult and complex process. Also, they trained AlphaFold on thousands of existing structures that were accumulated over decades and required millennia of graduate-student hours to find. It’s worth noting that we have very good theories for all the basic physics underlying protein folding, but none of the physics-based methods work. We had to rely on painstakingly collected data to learn the emergent phenomena that govern folding. I suspect that this will be the case for many other tasks.


> How do you reconcile e.g. AlphaGo with the idea that data is a bottleneck?

Go is entirely unlike reality in that the rules are fully known and it can be perfectly simulated by a computer. AlphaGo worked because it could run millions of trials in a short time frame, because it is all simulated. It doesn't seem to answer, at all, the question of how an AI improves its general intelligence without real-world interaction and data gathering. If anything, it points to the importance of doing many experiments and gathering data - and this becomes a bottleneck when you can't simply make the experiment run faster, because the experiment is limited by physics.


Here's one: Yudkowsky has been confidently asserting (for years) that AI will drive humanity extinct because it will learn how to make nanomachines using "strong" covalent bonds rather than the "weak" van der Waals forces used by biological systems like proteins. I'm certain that knowledgeable biologists and physicists have tried to explain to him why this belief is basically nonsense, but he just keeps repeating it. Heck, there's even a LessWrong post that lays it out quite well [1]. This points to a general disregard for detailed knowledge of existing things and a preference for "first principles" beliefs, no matter how wrong they are.

[1] https://www.lesswrong.com/posts/8viKzSrYhb6EFk6wg/why-yudkow...


Dear god. The linked article is a good takedown of this "idea," but I would like to pile on: biological systems are in fact extremely good at covalent chemistry, usually via extraordinarily powerful nanomachines called "enzymes." No, they are (usually) not building totally rigid condensed-matter structures, but... why would they? Why would that be better?

I'm reminded of a silly social science article I read, quite a long time ago. It suggested that physicists only like to study condensed matter crystals because physics is a male-dominated field, and crystals are hard rocks, and, um ... men like to think about their rock-hard penises, I guess. Now, this hypothesis obviously does not survive cursory inspection - if we're gendering natural phenomena studied by physicists, are waves male? Are fluid dynamics male?

However, Mr. Yudkowsky's weird hangups here around rigidity and hardness have me adjusting my priors.


The article makes very clear that costs are rising for "pet day care" just as quickly as for real day care for children. This cannot be explained by regulation, as pet day care is far, far less regulated than day care for children.

