That's not the real thinking; it's a super-summarized view of it.

Musk said Grok 5 is currently being trained, and that it has 7 trillion params (Grok 4 had 3 trillion).

My understanding is that all recent gains are from post-training, and no one (publicly) knows how much scaling pretraining will still help at this point.

Happy to learn more about this if anyone has more information.


I still remember Gemini 1.5 Ultra and GPT-4.5 as extremely strong in some areas that no benchmark captures. They were probably not economical to serve at a $20 subscription, but they felt different, smarter in some ways. The benchmarks seem to be missing something, because Flash 3 was very close to 3 Pro on some benchmarks, but much, much dumber.

You gain more benefit spending compute on post-training than on pre-training.

But scaling pre-training is still worth it if you can afford it.


What a wild world, sending 50 emails costs money :)

Where were those concerned people when they lost their jobs due to production moving to China?

Seems like what you are saying already failed in the past.


I still watch the 7 hours of video every day, but the Instagram/TikTok algorithm can now find the perfect videos for me by choosing between 1000 hours of created video instead of the pre-AI 100 hours.

Generating audio is far from being an "intensive" operation these days.

It has nothing to do with cpu cycles, and everything to do with realtime safety. You must be able to guarantee that nothing will block the realtime audio thread(s), and that's hard to do in a variety of "modern" languages (because they are not designed for this).

I know you are an audio guy; I also wrote low-latency audio software. I was just saying that setting HIGH_PRIORITY on the audio thread and its feeding threads is enough, you don't need QNX. Python has the GIL problem, but that's another story.
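Roughly what I mean, as a minimal sketch (assuming Linux, where scheduling is per-thread and pid 0 means the calling thread; the DSP function is just a placeholder):

    import os
    import threading
    import time

    def fill_next_buffer():
        # placeholder for the real DSP work; in a real app nothing on this
        # thread may block: no locks, no allocation, no I/O
        time.sleep(0.001)

    def audio_feeder():
        try:
            # SCHED_FIFO at a mid-range priority; needs CAP_SYS_NICE or an
            # rtprio rlimit, hence the fallback to normal scheduling
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(70))
        except PermissionError:
            pass
        while True:
            fill_next_buffer()

    threading.Thread(target=audio_feeder, daemon=True).start()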

For a simple audio app like this synth on a modern CPU, it's kind of trivial to do in any language if the buffer is >40 ms. I'm talking about managing the buffers; running the synth/filter math in pure Python is still probably not doable.
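To put numbers on it: a 40 ms buffer at 48 kHz is 0.040 * 48000 = 1920 frames per callback. Here is a sketch using the third-party sounddevice library (my pick for illustration; any callback-based audio API looks about the same), letting numpy do the synth math so the per-callback Python work stays trivial:

    import numpy as np
    import sounddevice as sd

    SAMPLERATE = 48000
    BLOCKSIZE = int(0.040 * SAMPLERATE)  # 1920 frames ~= 40 ms per buffer

    phase = 0

    def callback(outdata, frames, time, status):
        # vectorized 440 Hz sine; a pure-Python per-sample loop here would
        # likely miss the deadline, which is the point above
        global phase
        t = (np.arange(frames) + phase) / SAMPLERATE
        outdata[:, 0] = 0.2 * np.sin(2 * np.pi * 440.0 * t)
        phase += frames

    with sd.OutputStream(samplerate=SAMPLERATE, blocksize=BLOCKSIZE,
                         channels=1, callback=callback):
        sd.sleep(2000)  # ~2 seconds of tone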


Sure, but 40 ms for a synth intended to be played is generally the kiss of death these days, unless your target audience is all pipe organ players ...

> a senior scientist at a national lab thinking ai isn't really useful because the free reasoning version couldn't generate working code

I would question whether such a scientist should be doing science; it seems they have serious cognitive biases.


My bad; I should have been more precise: "ai" in this case is "LLMs for coding".

If all one uses is the free thinking model, their conclusion about its capability is perfectly valid, because nowhere is it clearly specified that the 'free, thinking' model is not as capable as the 'paid, thinking' model. Even the model numbers are the same. And given that the highest-capability LLMs are closed source and locked behind paywalls, there is no means to arrive at a contrary, verifiable conclusion. They are a scientist, after all.

And that's a real problem. Why pay when you think you're getting the same thing for free? No one wants yet another subscription. This unclear marking is going to lead to so many things going wrong over time; what will the cumulative impact be?


> nowhere is it clearly specified that the 'free, thinking' model is not as capable as the 'paid, thinking' model

Nowhere is it clearly specified that the free model IS as capable as the paid one either. So if you have uncertainty whether it IS or IS NOT as capable, what sort of scientist assumes the answer IS?


> Nowhere is it clearly specified that the free model IS as capable as the paid one either. So if you have uncertainty whether it IS or IS NOT as capable, what sort of scientist assumes the answer IS?

Putting the same model name/number on both the free and paid versions is itself a specification that performance will be the same. If a scientist has to bring their science background to bear just to interpret and evaluate product markings, then society has a problem. Any reasonable person expects products with the same labels to perform similarly.

Perhaps this is why Divisions/Bureaus of Weights and Measures are widespread at the state and county levels. I wonder if a person who brought a complaint to one of these agencies, or to a consumer protection agency, to fix this situation wouldn't be doing society a huge service.


They don't have the same labels though. On the free ChatGPT you can't select thinking mode.

> On the free ChatGPT you can't select thinking mode.

This is true, but thinking mode shows up based on the questions asked and on some other unknown criteria. In the cases I cited, the responses were in thinking mode.


Haters gonna hate, but bro vibe-coded himself into being a billionaire and having Sam Altman and Zuck personally fight over him.

Proof you can get hired off of a portfolio where you've never even viewed a single line of code from it. Definitely feel a mix of envy and admiration.

It was never really about the code itself anyways.

To be fair, it's not like he didn't read a single line of the code that ended up being generated.

I'd tell those two off before taking a penny

money or morals, choose one


Opportunity cost.

Cloning Slack and wasting ultra-expensive engineers on that might cost more than just paying for it, and it's not your core mission.


Why do you have to waste ultra-expensive engineers on it? You have agents. And verifying that your product works as claimed should absolutely be part of your mission. How can you possibly claim that your models are revolutionising software development if you haven't even used them to revolutionise your own software development in-house? On top of that, it would be a huge marketing coup, one that would immediately lead to a flood of enterprise spending, if you could demonstrate that your agents actually do what you constantly claim they do.

PS. If you're claiming that coding an application is ultra-expensive, you are already arguing on the side of the comment you're responding to, which is a counterpoint to the article's first sentence:

> The math is simple: if it costs almost nothing to build an app, it costs almost nothing to clone an app. And if cloning is free, subscription pricing dies.


They did revolutionise software development in-house. Both Codex and Claude Code are 90% agent-written these days, and bring in billions of dollars of revenue.

Cloning Slack would bring in $0.


Billions of dollars of revenue on trillions of dollars of investment is not a revolutionary feat. I promise you I could turn trillions into billions too.

Neither of those programs is primarily responsible for the revenue, either. The underlying models are, not the trivial CLI chat interface (which, despite being trivial software, still manages to be full of bugs that go unfixed for months). I also don't even think it's true that Codex is primarily agent-written. OpenAI specifically cited using Electron in their recent Codex desktop application for "agent orchestration" to save human developer time on porting it across platforms, which does not sound like a successful exercise in eating their own dog food.


If you could turn trillions into billions, why haven't you yet?

> Both Codex and Claude Code are 90% agent-written these days, and bring in billions of dollars of revenue.

So they say.


Yeah this got a chuckle from me.

If you have tools that enable superior efficiency, shouldn't you be hiring every expensive engineer you can get your hands on and putting them to work producing massive amounts of products to outcompete everyone else in the world?

Shouldn't they be in a position to replace absolutely every other tech company? That's tens of trillions in valuation in a few short years.


You are missing that instead of you prompting Claude/Codex, you could have your OpenClaw manager prompt them.

Not saying it works perfectly, but it's where things are going.
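In its simplest form, something like this (purely hypothetical sketch: the task list is made up, and I'm assuming the claude CLI's non-interactive -p mode as the agent entry point):

    import subprocess

    # hypothetical backlog a manager agent would normally generate itself
    TASKS = [
        "Add input validation to the signup form",
        "Write unit tests for the new validator",
    ]

    for task in TASKS:
        # one non-interactive agent run per task, output captured for review
        result = subprocess.run(["claude", "-p", task],
                                capture_output=True, text=True)
        print(f"--- {task} ---\n{result.stdout}")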

