That may be true, but you can’t compare average GenAI with the best humans because there are many reasons the human output is low quality: budget, timelines, oversights, not having the best artists, etc. Very few games use the best human artists for everything.
Same with programming. The best humans write better code than Codex, but the awful government portals and enterprise apps you’re using today were also written by humans.
Model capability improvements are very uneven. Changes between one model and the next tend to benefit certain areas substantially without moving the needle on others. You see this across all frontier labs’ model releases. Also the version numbering is BS (remember GPT-4.5 followed by GPT-4.1?).
Note that this is not relevant for reasoning models, since they will think about the problem in whatever order they want before outputting the answer. Since they can “refer” back to their thinking when outputting the final answer, the output order is less relevant to correctness. That relative robustness is likely why OpenAI is trying to force reasoning onto everyone.
This is misleading, if not wrong. A thinking model doesn’t fundamentally work any differently from a non-thinking model. It is still next-token prediction, with the same position independence, and it still suffers from the same context-poisoning issues. It’s just that the “thinking” step injects an instruction to take a moment and consider the situation before acting, as a core system behavior.
But specialized instructions to weigh alternatives still work better, as the model ends up thinking about how to think, then thinking, then making a choice.
I think you are being misleading as well. Thinking models do recursively generate the final “best” prompt to get the most accurate output. Unless you are genuinely giving new, useful information in the prompt, it is mostly pointless to structure it one way or another, because reasoning models can generate the intermediate steps that produce the best output on their own. The evidence on this is clear: benchmarks show that thinking models are far more performant.
You're both kind of right.
The order is less important for reasoning models, but if you carefully read thinking traces you'll find that the final answer is sometimes not the same as the last intermediate result. On slightly more challenging problems LLMs flip-flop quite a bit, and ordering the output cleverly can uplift the result. That might stop being true for newer or future models, but I iterated quite a bit on this for Sonnet 4.
This article spent a lot of words to say very little. Specifically, it doesn’t really say why working towards AGI won’t bring advancements to “practical” applications, or why the gazillion AI startups out there won’t either. Instead, we need Trump to step up?
More and more I feel like these policy articles about AI are an endless stream of slop written by people who aren’t familiar with and have never worked on current AI.
That’s an interesting point. It’s not hard to imagine that LLMs are much more intelligent in areas where humans hit architectural limitations. Processing tokens seems to be a struggle for humans (look at how few animals do it overall, too), but since so much of the human brain is dedicated to movement planning, it makes sense that we still have an edge there.
The past few years I’ve been hearing crazy stories of workarounds and scripts to deal with all these new features in Windows. Isn’t that what was preventing people from using Linux? Replacing utilman.exe with cmd.exe is not something a normal user would ever do.
I was thinking the same thing. Never thought I'd see a world where Arch has an installer (and, jokes aside, many Linux distros have very straightforward GUI installers) while people have to... "hit Shift+F10 to get a terminal, then enter start ms-cxh:localonly" to install Windows with a local account. Jeez.
> Does "career development" just mean "more money"?
Big companies mean more opportunities to lead bigger projects. At a big company, it’s not uncommon to in-house what would’ve been an entire startup’s product. And depending on the environment, you may work on several of those projects over the course of a few years. Or if you want to try your hand at leading bigger teams, that’s usually easier to find in a big company.
> Is it still satisfying if that software is bad, or harms many of those people?
There’s nothing inherently good about startups and small companies. The good or bad is case-by-case.
My experience at big companies has been that you only get the opportunity to do something big if you are willing to waste years "proving yourself" on a lot of tedious bullshit first. The job you want is not the job you get to apply for, and I've never had the patience to stick it out. Smaller companies let me do meaningful work right away.
Politely, I disagree. It means you are in a context where risk aversion is high and everyone keeps their head down.
Done right, you can be a disruptor with what are very benign or proven changes outside of the false ecosystem you are in.
I recommend these changes be on the level of "we will allow users to configure a most-used external tool on a core object, using a URI template" (rough sketch below). The shock, awe, and destruction is everyone realizing something is a web app and you could just... if you wanted... use basic HTML to make lives better.
Your opponents are then arguing against how the web works, and you have won the framing with every employee that has ever done something basic with a browser.
You might find this level of "innovation" silly, but it's also representative of working in the last few tiers of a distribution curve - the enterprise adopters lagging behind the late adopters.
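To make "a URI template on a core object" concrete, here's a rough, hypothetical sketch - the field names and URL are invented for illustration, not from any real product:

```typescript
// Hypothetical config for a "most used external tool" link on a core object.
// Field names and the URL are made up for illustration only.
type ExternalToolLink = {
  label: string;
  // RFC 6570-style template; {id} and {name} get filled from the core object
  template: string;
};

// Expand {key} placeholders from a plain record of object fields.
function expand(template: string, fields: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match, key) =>
    encodeURIComponent(fields[key] ?? "")
  );
}

const link: ExternalToolLink = {
  label: "Open in ticketing system",
  template: "https://tickets.example.com/search?ref={id}&q={name}",
};

// For a core object { id: "42", name: "ACME order" } this yields:
// https://tickets.example.com/search?ref=42&q=ACME%20order
console.log(expand(link.template, { id: "42", name: "ACME order" }));
```

That's the whole "feature": a link built from fields the object already has. Nothing clever, which is exactly the point.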
> Big companies mean more opportunities to lead bigger projects. At a big company, it’s not uncommon to in-house what would’ve been an entire startup’s product. And depending on the environment, you may work on several of those projects over the course of a few years. Or if you want to try your hand at leading bigger teams, that’s usually easier to find in a big company.
Okay, so career development means "bigger projects"?
> There’s nothing inherently good about startups and small companies. The good or bad is case-by-case.
Well, maybe not, but I think the post illustrates some ways big companies are worse. I'd say that, all else being equal, companies tend to get bigger by becoming more doggedly focused on money, which tends to lead to doing evil things because you no longer see refraining from doing so as important compared to making money. Also, all else equal, a company that does something bad on a small scale is likely less bad than one that does something bad on a large scale.
Projects beyond a certain size in a large org imply very different things - people, networking, money, regulations, politics, business, security, etc. - all things which don’t look spectacular when you have three people, but become very important and much harder with hundreds of people.
So career development really means ‘learning a completely different skillset which is not technical’
That's a good way to put it and is something I've often thought as well, although not just in the technical realm. I think of it as "doing a different job". You used to be a teacher but now you're the principal; you used to hammer in nails but now you direct the construction crew; you used to be writing software but now you manage other people who write software; etc.
Personally I'd struggle to consider that "development" for my own life, since it often amounts to no longer doing the job I like and instead watching other people do it. I can understand how adding new skills is positive, though.
This can be mitigated by learning other technical fields (infrastructure, security, etc.) and using your technical knowledge to steer things in the right direction - but yes, you’re otherwise right and I understand your point of view.
GPT-4 is very different from the latest GPT-4o in tone. Users are not asking for the direct no-fluff GPT-4. They want the GPT-4o that praises you for being brilliant, then claims it will be “brutally honest” before stating some mundane take.