I’m not sure we can accept the premise that LLMs haven’t made any breakthroughs. What if people aren’t giving the LLM credit when they get a breakthrough from it?
First time I got good code out of a model, I told my friends and coworkers about it. Not anymore. The way I see it, the model is a service I (or my employer) pays for. Everyone knows it’s a tool that I can use, and nobody expects me to apportion credit for whether specific ideas came from the model or me. I tell people I code with LLMs, but I don’t commit a comment saying “wow, this clever bit came from the model!”
If people are getting actual bombshell breakthroughs from LLMs, maybe they are rationally deciding to use those ideas without mentioning the LLM came up with it first.
Anyway, I still think Gwern’s suggestion of a generic idea-lab trying to churn out insights is neat. Given the resources needed to fund such an effort, I could imagine that a trading shop would be a possible place to develop such a system. Instead of looking for insights generally, you’d be looking for profitable trades. Also, I think you’d do a lot better if you have relevant experts to evaluate the promising ideas, which means that more focused efforts would be more manageable. Not comparing everything to everything, but comparing everything to stuff in the expert’s domain.
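To make that concrete, here's a rough sketch of the shape I have in mind; the llm_score call is a hypothetical stand-in for whatever model API you'd actually use, not a real one:

    # Rough sketch of a domain-focused idea lab: pair each new idea with items
    # from one expert's domain instead of comparing everything to everything.
    import itertools

    def promising_pairs(new_ideas, domain_corpus, llm_score, threshold=0.8):
        # llm_score is a hypothetical callable that rates how promising a
        # combination looks, returning a float in [0, 1].
        candidates = []
        for idea, known in itertools.product(new_ideas, domain_corpus):
            score = llm_score(idea, known)
            if score >= threshold:
                candidates.append((score, idea, known))
        # Highest-scoring pairings first, for the human expert to triage.
        return sorted(candidates, key=lambda c: c[0], reverse=True)

The point is just the scaling: pairing everything against one expert's corpus stays tractable, and the expensive human evaluation only ever sees the top of the list.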
If a system like that already exists at Jane Street or something, I doubt they are going to tell us about it.
It is hard to accept as a premise because the premise is questionable from the beginning.
Google has already reported several breakthroughs as a direct result of AI, using processes that almost certainly include LLMs, including a new solution in math, improved chip designs, etc. DeepMind has AI that predicted millions of protein folds, which are already being used in drug development, among many other things they do, though yes, not an LLM per se. There is certainly the possibility that companies won’t announce things, given that direct LLM output isn’t copyrightable/patentable, so a human-in-the-loop solves the issue by claiming the human made said breakthrough with AI/LLM assistance. There isn’t much benefit to announcing how much AI helped with a breakthrough unless you’re basically in the business of selling AI.
As for “why aren’t LLMs creating breakthroughs by themselves regularly”, the answer is pretty obvious… they just don’t really have that capacity in a meaningful way, based on how they work. The closest example, Google’s algorithmic breakthrough, absolutely was created by a coding LLM; it was effectively achieved through brute force in a well-established domain, but that doesn’t mean it wasn’t a breakthrough. That alone casts doubt on the underlying premise of the post.
The same is true of humanity in aggregate. We attribute discoveries to an individual or group of researchers but to claim humans are efficient at novel research is a form of survivorship bias. We ignore the numerous researchers who failed to achieve the same discoveries.
The fact some people don't succeed doesn't show that humans operate by brute force. To claim humans reason and invent by brute force is patently absurd.
Does “brute force” allow for heuristics and direction?
If it doesn’t (“brute” as opposite of “smart”, just dumb iteration to exhaustion) then you’re right, of course.
But if it does, then I’m not sure it’s patently absurd - novel ideas could be merely a matter of chance of having all the precursors together at the right time, a stochastic process. And it scales well, bearing at least some resemblance to brute force approaches - although the term is not entirely great (something around “stochastic”, “trial-and-error”, and “heuristic” is probably a better term).
It’s an absurd statement because you are human and are aware of how research works on an individual level.
Take yourself outside of that, and imagine you invented Earth, added an ecosystem, and some humans. Wheels were invented ~6k years ago, and “humans” have existed for ~40-300k years. We can do the same for other technologies. As a group, we are incredibly inefficient, and an outside observer would see our efforts at building societies, and our failures, as “brute force”.
I consider humans an "intelligent" species in the sense that a critical mass of us can organize to sustainably learn.
As individuals, without mentors, we would each die off very quickly. Even if we were fed and whatever until we were physically able to take care of ourselves, we wouldn't be able to keep ourselves out of trouble if we had to learn everything ourselves.
Contrast this with the octopus, which develops from an egg without any mentorship, and within a year or so has a fantastically knowledgeable and creative mind over its respective environment. And they thrive in every oceanic environment in the wet salty world, from coastlines, to under permanent Arctic ice, to the deep sea.
To whatever degree they are "intelligent", it's an amazingly accelerated, fully independent, self-taught intelligence. Our species just can't compare on that dimension.
Fortunately, octopuses only live a couple of years, and in an environment where technology is difficult (very hard to isolate and control conditions of all kinds in the ocean). Otherwise, the land octopus would have eaten all of us long ago.
You don’t consider thousands of scientists developing competing, and often incorrect, solutions for a single domain as a “brute force” attempt by humanity, but do when the same occurs with disparate solutions from parallel LLM attempts? That’s certainly an…opinion.
My least favorite type of argument on this site is when someone takes a word with a specific meaning and warps it well beyond the reasonable interpretation just so they can claim they’re making a good analogy. It seems to happen every day here with AI.
Brute force and typical scientific research are such dramatically different things that I have to wonder if bots are getting into HN; I almost can't believe someone would try to argue that it's weird to see the difference.
“Oh you’re using your limbs to move through water? How are you not a dolphin?” lol
Sorites paradox, but for bits of evidence in the Bayesian prior.
Just as a heap of sand stops being a heap when it's small enough, the difference between "science" (not just modern but everything from Newton and Galileo onwards) and "brute force" is the available evidence before whatever hypothesis we're testing.
Scientific research these days requires a lot of prior information, things humanity collectively has learned, as a foundation. We have a lot of weight on our Bayesian priors for whatever hypothesis we're testing.
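To put that in symbols (nothing deeper than textbook Bayes' rule), the weight a hypothesis H gets after evidence E is

    P(H | E) = P(E | H) * P(H) / P(E)

and the accumulated background knowledge is what makes the prior P(H) sharply concentrated on a few candidate hypotheses, rather than the near-uniform prior a pure brute-force search would start from.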
Sorites even applies to your own attempt to mock it, as the difference between humans and dolphins is "just" a series of genetic changes. Absolutely they're different, and it's obvious why you chose the example, but even then it's a series of distinct small changes, each so small that it's easy to blur them together and treat them as a continuum, like we do with water even though that's also discrete molecules.
Humanity massively predates the modern scientific method: it took millennia of mistakes to go from the Greeks being wrong about four elements to finding a bit less than the 91 natural elements, and from there to finding the nucleus (1911) and that it was made of protons and neutrons; and only then did we get to logical positivism (late 1920s), and it was only around WW2 (just before, Karl Popper 1934) that we switched to falsifiability.
Each grain adds to the heap. We know the fields of work, we know the space of possibilities within the paradigm, and the shape of the research can be to constrain that space without finding the answer directly — a divide-and-conquer approach to reducing the space that then needs to be brute-forced.
These days we can even automate much of the more obviously brute-force parts, which is why e.g. CERN throws away so much data from the detectors before it even reaches their "real" data processing system. And why SETI automatically processes out any signals that seem to be from in-system before the rest of the work.
Re: the Sorites paradox, I have a definition of a heap that I think is workable.
It's not a specific number; it's "a collection of items becomes a heap when some indeterminate number of items are obscured by other items on top of them, thus making the total number of items uncountable without disturbing the heap".
Therefore it depends on factors beyond just the number of grains of sand; if you have 1000 grains spread out on a surface so they can all be distinctly counted, that's not a heap. But if you have 1000 grains gathered together, some on top of each other, then it becomes a heap.
Yes, that would be a ridiculous comparison. However, you’re suggesting that calling both a dolphin and a fish “aquatic” is a false comparison because one is a mammal. Most normal people would call something that someone makes up and that is eventually proven false a failed “guess”. Or at least they do when they aren’t busy trying to protect egos. The difference is, one wastes millions of dollars trying to prove every guess right.
But sure, well done, you really got me! Beep boop! I must be a bot because you don’t agree. ”lol”!
I would argue that the ratio of work to breakthroughs is not a form of inefficiency, but something inevitable about the nature of breakthroughs.
In my opinion, a breakthrough is not the production of new knowledge, it is rather its adoption by the public (beginning with industry).
As such, the rate at which breakthroughs can emerge is bounded by factors external to the producers of breakthroughs. And these outside factors are possibly already limiting.
Another point I would make is that what constitutes a breakthrough is not conditioned by how significant it is, only by whether it is adopted as a change of processes or mental model. As such, more powerful tools can lead to larger leaps between breakthroughs, but not so much to a higher rate of breakthroughs.
As tools become powerful enough to produce a year's worth of yesterday's breakthroughs in a month, the general public and industry will still wait a year before adopting new technology, only they will see larger progress from the previous iteration. This is in fact the case with LLMs. Even on a forum as avant-garde as HN, a very common opinion is "I'm waiting out stagnation before I adopt".
As an oversimplification, consider as breakthroughs only those that come to have widespread commercial application. If we had an oracle for breakthroughs that could produce arbitrarily many of today's breakthroughs as fast as desired, we'd still be limited by our ability to put them into practice. Work must be allocated and carried out over time, and each new breakthrough requires changing processes and having the people involved learn new things, which takes time and energy.
I think this human resistance to change is fundamentally what determines the achievable rate of breakthroughs. As the name implies, a breakthrough is a rupture. It is highly inefficient to be upending one's methods every month. It can even be outright impossible to keep up with all the theoretical advancements before they have crystallized and been digested into accessible popularizations, if that is not one's profession (i.e. all of one's time devoted to it).
In my applied sciences field, industry is lagging behind by some 20 years. And we ourselves are perhaps a century late to some theoretical advances (I can think of one off the top of my head). At the lowest level, there is resistance to change in that ideas take much longer to carry to a working prototype than they take to have. Hence, someone who constantly hops to new ideas is guaranteed not to make any progress; by necessity, some stubbornness is selected for. Once things are fleshed out (a multi-year endeavour), you still have to convince the broader community (same subfield, but not direct collaborators) that your idea has merits surpassing theirs, which is a problem best solved one retirement, and one past-mentee hire, at a time. And ultimately you must convince industrial actors to dump millions into industrializing these novel methods, when none of their competitors have been doing it (hence it is urgent to wait), when the viability (robustness, scalability) of the idea remains to be seen, and when the benefits must be weighed against the risk that their practitioner user base won't understand the full scope of the progress or see the need to invest time in learning new things and devising new processes (all of which takes time and money, and makes you dependent on this pioneering supplier). And, lastly, there are three other approaches claiming to be better alternatives.
I don't see a way around this pipeline, and more powerful tools can indeed accelerate some of the stages, but there will remain incompressible delays. Ideas need time to be diffused and understood, all the more if they were advancing at a rapid pace enabled by powerful AIs.
I would say the real breakthrough was training NNs as a way to create practical approximators for very complex functions over some kind of many-valued logic. Why they work so well in practice we still don't fully understand theoretically (in the sense that we don't know what kind of underlying logic best models what we want from these systems). LLMs (and the application to natural language) are just a consequence of that.
You are contradicting yourself. Either LLM programs can do breakthrough on their own, or they don't have that capacity in a meaningful way based on how they work.
Almost certainly an LLM has, in response to a prompt and through sheer luck, spat out the kernel of an idea that a super-human centaur of the year 2125 would see as groundbreaking, but that hasn't been recognized as such.
We have a thin conception of genius that can be challenged by Edison's "1% inspiration, 99% perspiration", or by the process of getting a PhD, where you might spend 7 years getting to the point where you can start adding new knowledge and then take another 7 years to really hit your stride.
I have a friend who is 50-something and disabled with some mental illness; he thinks he has ADHD. We had a conversation recently where he repeatedly expressed his fantasy that he could show up somewhere with his unique perspective, sprinkle some pixie dust on their problems, and be rewarded for it. I found it exhausting. When I would hear his ideas, or when I hear any idea, I immediately think "how would we turn this into a product and sell it?" or "write a paper about it?" or "convince people of it?" He would have no part of it, thinking that operationalizing or advocating was uninteresting and that somebody else would do all that work. My answer is: they might, but not without the advocacy.
And it comes down to that.
If an LLM were to come up with a groundbreaking idea and be recognized as having a groundbreaking idea, it would have to do a sustained amount of work, say the equivalent of at least 2 person-years, to win people over. And they aren't anywhere near equipped to do that, nobody is going to pay the power bill for it, and if you were paying the power bill you'd probably have to pay it for a million of them going off in the wrong direction.
Broadly agree (I see lots of "ideas people" who have no interest in doing); the only thing I would say is that the occasional results from the big AI groups suggest it takes fewer than 1e6 machines — but probably more than 100 even for the low-hanging fruit, which is already too much for a lot of people to stomach, so the point is still valid.
> but I don’t commit a comment saying “wow, this clever bit came from the model!”
The other day, Claude Code started adding a small signature to the commit messages it was preparing for me. It said something like “This commit was co-written with Claude Code” and a little robot emoji
I wonder if that just happened by accident or if Anthropic is trying to do something like Apple with the “sent from my iPhone”
Aider does the same thing (and has a similar setting). I tend to squash the AI commits and remove the signature that way, though I suppose a flag indicating the degree of AI authorship could be useful.
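If it helps anyone, the squash itself is just a soft reset plus a fresh commit; here's a rough helper (the function name is mine, and it assumes the AI commits haven't been pushed yet):

    # Sketch: collapse the last n commits (e.g. a run of model-authored ones)
    # into a single commit with a message you write yourself, which also drops
    # any co-author signature the tool appended.
    import subprocess

    def squash_last(n, message):
        # Move the branch pointer back n commits while keeping the changes
        # staged, then record one commit with the human-written message.
        subprocess.run(["git", "reset", "--soft", f"HEAD~{n}"], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)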
It’s great. It’s become my preferred workflow for vibe coding because it writes great commit messages, gives you a record of authorship, and rollbacks use far fewer tokens. You don’t have to (and probably shouldn’t) let it push to the remote branch.
I'm glad that workflow works for you, I suppose. I let it edit files but not commit because I should have a full understanding and accounting of what changed, why, how, and what to commit.
Most interesting novel ideas originate at the intersection of multiple disciplines. Profitable trades could be found in the biomedicine sector when the knowledge of biomedicine and finance are combined. That's where I see LLMs shining because they span disciplines way more than any human can. Once we figure out a way to have them combine ideas (similar to how Gwern is suggesting), there will be, I suspect, a flood of novel and interesting ideas, inconceivable with humans.
This is bordering on conspiracy theory. Thousands of people are getting novel breakthroughs generated purely by LLMs and not a single person discloses such a result? Not even one of the countless LLM-corporation engineers who depend on the billion-dollar IV injections from deluded bankers just to continue surviving, and not one has bragged about an LLM pulling off that revolution? Hard to believe.
Countless people are increasing their productivity and talking about it here ad nauseam. Even researchers are leaning on language models; e.g., https://mathstodon.xyz/@tao/114139125505827565
We haven't successfully resolved famous unsolved research problems through language models yet, but one can imagine that they will solve increasingly challenging problems over time. And if it happens in the hands of a researcher rather than in the model's lab, one can also imagine that the researcher will take the credit, so you will still have the same question.
My general sense is that for research-level mathematical tasks at least, current models fluctuate between "genuinely useful with only broad guidance from user" and "only useful after substantial detailed user guidance", with the most powerful models having a greater proportion of answers in the former category. They seem to work particularly well for questions that are so standard that their answers can basically be found in existing sources such as Wikipedia or StackOverflow; but as one moves into increasingly obscure types of questions, the success rate tapers off (though in a somewhat gradual fashion), and the more user guidance (or higher compute resources) one needs to get the LLM output to a usable form. (2/2)
There is a LOT of money on this message board trying to convince us of the utility of these machines, and yes, people talk about it ad nauseam, in vague terms that are unlike anything I see in the real world, with few examples.
I wonder if it's not the LLM making the breakthrough, but rather that the person using the system just needed the available information presented in a clear and orderly fashion to make the breakthrough themselves.
After all, the LLM currently has no cognizance; it is unable to understand what it is saying in a meaningful way. At its best it is a p-zombie (philosophical zombie) machine, right?
In my opinion anything amazing that comes from an LLM only becomes amazing when someone who was capable of recognizing the amazingness perceives it, like a rewrite of a zen koan, "If an LLM generates a new work of William Shakespeare, and nobody ever reads it, was anything of value lost?"