This strikes me as a very solid methodology for improving the results of all AI coding tools. I hope Anthropic, etc., take this up.
Rather than converging on optimal code (Occam's Razor for both maintainability and performance) they are just spewing code all over the scene. I've noticed that myself, of course, but this technique helps to magnify and highlight the problem areas.
It makes you wonder how much training material was/is available for code optimization relative to training material for coding that merely meets functional requirements — and, therefore, what relative weight code optimization carries in what's baked into the LLMs.
"According to the study, AI still struggles with several crucial facets of coding: sweeping scopes involving huge codebases, the extended context lengths of millions of lines of code, higher levels of logical complexity, and long-horizon or long-term planning about the structure and design of code to maintain code quality."
uhhh, not sure even the best people or teams are very good at this either. Condemning AI for not being capable of something we're not capable of, ok...
“If it takes longer to explain to the system all the things you want to do and all the details of what you want to do, then all you have is just programming by another name.”
This is called the specification process, which hopefully is already occurring today.
There's so much self-serving bias in articles like this, as well as the comments on HN, Reddit, etc. It's good to critique AI, but that self-serving line is frequently crossed by many people.
I’ve seen comments here claiming that this site is either a bunch of coders coping about the limitations of AI and how it can’t take their job, or a bunch of startup dummies totally on the AI hype train.
Now, there’s a little room between the two—maybe the site is full of coders on a cope train, hoping that we’ll be empowered by nice little tools rather than totally replaced. And, ya know, multiple posters with multiple opinions, some contradictions are expected.
But I do find it pretty funny to see multiple posters here describe the site they're using as suffering from multiple, contradictory, glaringly obvious blind spots.
Let's just imagine we're critiquing cars or planes in about 1910. They're so flawed. No one can say with certainty whether or how they will improve.
Side note of interest, from Wikipedia:
"Flying Machines Which Do Not Fly" is an editorial published in the New York Times on October 9, 1903. The article incorrectly predicted it would take one to ten million years for humanity to develop an operating flying machine.
I think we have heated debates because most people don't explain what their `cost function` is. Some people, when they talk about AI, reason in binary: if something is not perfect today, then it will never be perfect or will never improve. Others simply see something that is useful today, know it will get better next year, and have no expectation of getting AGI.
Your reply is the equivalent of someone expecting AGI in the next decade. The same goes for when people debate whether AI will take software dev jobs. Some just see all the flaws in AI and conclude their job is secure. Others see that they are 2x as productive and that a teammate is potentially no longer needed. If AI can eliminate 50% of IT jobs in ~10-20 years, that's still job replacement. When we replaced horses with cars, that didn't mean there were no horses left, or that nobody rides horses today.
If we had funded that as much as we're currently funding AI I think it would have been a plausible goal. Keep in mind we cut more than half of NASA's budget after we first landed.
>It was only six years to go from the first multi-person spacecraft and first spacewalk to the first space station.
Yeah, that's my entire point: technological progress doesn't have a constant rate of acceleration. Some advances come quickly, one after another, while others lag and take a very long time.
How long do you think it would have taken to get a permanent moon presence if we had kept up Apollo-level funding indefinitely with that as the main goal? And since I only said "plausible", let's go with the 80th-90th percentile best-case scenario.
Even if technological progress stopped we could have launched enough parts to assemble a colony structure.
This is splitting hairs, but I was envisaging something more like the ISS but on the moon. A far cry from the 50s/60s dream of a grill and white picket fence on the moon.
It's a good idea to think of a converse situation, but this is a bad example. The constraint was not about technology but about budget, perceived benefits and political will.
What technologies existing in 1969 might have been used to create an environment with a breathable atmosphere and a self-sustaining ecosystem on the Moon?
What technologies existing today might be used for this purpose, assuming no financial or political limitations?
This subject always reminds me of something someone said, possibly Professor Alain April, "software is the only system where maintenance tends to degrade it."
But why is the most important information, the title, in such a light, hard-to-read font? The title should stand out, not the comment count, etc. See... Hacker News! :-)
I don't understand the glee so many people have over this. I love being able to use Generative AI tools. How is it different than if I asked a person to draw these pictures for me? I know someone will gleefully clobber this question with a legal answer, but God, let's move forward, hunh?
A bunch of rich people are raiding a little bit of work, each, from a whole bunch of people, then walling it off so they can get richer.
I’d not have a problem with this, personally, if their models were as available as the stuff they took from others. Instead it’s take, take, take… now wait a minute, that pile of loot I stole is mine!