It's not that it's cheating _in the market_, but if people have an obligation to their employers, etc, to keep information confidential, then they are stealing from their employer by cashing in on it, as sure as if they had taken money from the till.

Even an o3-quality model at that speed would be incredible for a great many tasks. Not everything needs to be Claude Code. Imagine Apple fine-tuning a mid-tier reasoning model on personal assistant/macOS/iOS sorts of tasks and burning a chip onto the Mac Studio motherboard. Could you run Claude Code on it? Probably not. Would it be 1000x better than Siri? Absolutely.

Yeah, waiting for Apple to cut a die that can do excellent local AI.

100 instances of a less capable model might be better than one instance of a better model for many, many applications.

This isn't ready for phones yet, but think about something like phones, where people buy new ones every 3 years: even having a mediocre on-device model at that speed would be incredible for something like Siri.


There are a lot of people here who are completely missing the point. What is it called when you look at a point in time and judge an idea without seemingly being able to imagine 5 seconds into the future?

“static evaluation”

Think about this for solving questions in math where you need to explore a search space. You can run 100 of these for the same cost and time as one API call to OpenAI.
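To make that concrete, here's a rough sketch of what "run 100 of these" could look like: fan out independent samples to a cheap local model and keep whatever a verifier accepts. The endpoint URL, model name, and verifier are all placeholders, not any particular product's API.

    # Rough sketch only: fan out N samples to a hypothetical local,
    # OpenAI-compatible endpoint and keep the candidates a verifier accepts.
    import concurrent.futures
    import requests

    LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"  # placeholder

    def sample(prompt: str, temperature: float = 0.9) -> str:
        resp = requests.post(LOCAL_ENDPOINT, json={
            "model": "local-reasoning-model",  # placeholder name
            "messages": [{"role": "user", "content": prompt}],
            "temperature": temperature,
        })
        return resp.json()["choices"][0]["message"]["content"]

    def explore(prompt: str, verify, n: int = 100) -> list[str]:
        # n independent samples in parallel; keep the ones that check out.
        with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
            candidates = pool.map(lambda _: sample(prompt), range(n))
        return [c for c in candidates if verify(c)]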

An on-device reasoning model with that kind of speed and cost would completely change the way people use their computers. It would be closer to Star Trek than anything else we've ever had. You'd never have to type anything or use a mouse again.

I basically just _accidentally_ added a major new feature to one of my projects this week.

In the sense that, I was trying to explain what I wanted to do to a coworker and my manager, and we kept going back and forth trying to understand the shape of it and what value it would add and how much time it would be worth spending and what priority we should put on it.

And I was like -- let me just spend like an hour putting together a partially working prototype for you, and Claude got _so close_ to just completely one-shotting the entire feature in my first prompt that I ended up spending 3 hours just putting the finishing touches on it, and we shipped it before we even wrote a user story. We did all that work after it was already done. Claude even mocked up a fully interactive UI for our UI designer to work from.

It's literally easier and faster to just tell Claude to do something than to explain why you want to do it to a coworker.


I think you just have to give up ownership of the _code_ and focus on ownership of the _product_.

If the product is software, the code is the product.

I don't care about the product as much as I care about the code.

Well, that's the thing. You have to shut your programmer brain off and turn on your business brain. The code never really was that important. Delivering value to end users is the important thing, at least to the people that count.

Tim Bryce, one of the foremost experts on software methodology, hated programmers and considered them deeply sad individuals who had to compensate for their mediocre intelligence and narrow thinking by gatekeeping technology and holding the rest of the company hostage to them. And, he said upper management in corporate America agreed with him.

If you place a lot of value in being a good programmer, then to the real movers and shakers in society you are at best a tool they can use to get even richer. A tool that will soon be replaced by a machine. The time has come for programmers to level up their soft skills and social standing, and focus their intelligence on the business rather than the code. It sucks but that's the reality of the AI era.


You're a tool to them even if you are miserable instead of enjoying what you do.

Do you think the product is not the code?

From a user perspective, yes. The product is what it does, not how it does it.

But for the person building the product it very much is how it does it.

I think we (developers) need to get over that. Code was always the means to an end, which is providing a product to solve a problem, not the end itself.

The code is still the means to the end. AIs still write code, that is compiled and deployed and operated in some manner.

It isn't; no one is buying code on its own, but it's a component of the product. I dislike the phrasing above since it assumes the two are distinct things.

Honestly, it never was. Personal projects notwithstanding, if there is a product, that should always be the focus. Code is only a means to that.

I think you have it exactly backwards, and that "owning the stack" is going to be important. Yes the harness is important, yes the model is important, but developing the harness and model together is going to pay huge dividends.

https://mariozechner.at/posts/2025-11-30-pi-coding-agent/

This coding agent is minimal, and it completely changed how I use models; Claude's CLI now feels like extremely slow bloat.

I'd not be surprised if you're right that companies / management will prefer the "pay for a complete package" approach for a long while, but power users should not care for the model providers.

I have like 100 lines of code to get me a tmux-controls & semaphore_wait extension in the pi harness. That gave me a better orchestration scheme a month ago, when I adopted it, than Claude has right now.

As far as I can tell, the more you try to train your model on your harness, the worse they get. Bitter lesson #2932.


> I'd not be surprised if you're right that companies / management will prefer the "pay for a complete package" approach for a long while

I mean, I suspect for corporate usage Microsoft already has this wrapped up with Microsoft & GitHub Copilots.


OpenAI, Anthropic, Google, and Microsoft certainly desire path dependence, but the very nature of LLMs and intelligence itself might make that hard unless they can develop models that are truly differentiated from (and better than) the rest. The Chinese open-source models catching up makes me suspect that won't happen. The models will just be a commodity. There is a countdown clock for when we can get Opus 4.6+ level models, and it's measured in months.

The reason these LLM tools are good is that they can "just do stuff." Anthropic bans third-party subscription auth? I'll just have my other tool use Claude Code in tmux. If third-party agents can be banned from doing stuff (some advanced always-on spyware or whatever), then a large chunk of the promise of AI is dead.
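The tmux trick is about as low-tech as it sounds. A toy sketch, not a real driver -- the `claude` command name and the fixed sleeps are assumptions; a proper version would actually wait on and parse the output:

    # Toy sketch: start Claude Code (or any CLI agent) in a detached tmux
    # session, type into it, and scrape the pane afterwards.
    import subprocess, time

    def tmux(*args: str) -> str:
        return subprocess.run(["tmux", *args], capture_output=True, text=True).stdout

    tmux("new-session", "-d", "-s", "cc", "claude")   # assumes a `claude` CLI on PATH
    time.sleep(5)                                     # crude wait for startup
    tmux("send-keys", "-t", "cc", "summarize this repo", "Enter")
    time.sleep(30)                                    # crude wait for a reply
    print(tmux("capture-pane", "-t", "cc", "-p"))     # whatever is on screen now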

Amp just announced today they are dumping IDE integration. Models seem to run better on bare-bones software like Pi, and you can add or remove stuff on the fly because the whole thing's open source. The software writes itself. Is Microsoft just trying to cram a whole new paradigm into an old package? Kind of like a computer printer. It will be a big business, but it isn't the future.

At scale, the end provider ultimately has to serve the inference -- they need the hardware, data centers & the electricity to power those data centers. Someone like Microsoft can also provide an SLA and price it appropriately. I'll avoid a $200/month customer acquisition cost rant, but one user running a bunch of sub-agents can spend a ton of money. If you don't own a business or funding source, the way state-of-the-art LLMs are being used today is totally uneconomical (easily $200+ an hour at API prices).

36+ months out, if they overbuild the data centers and the revenue doesn't come in like OpenAI & Anthropic are forecasting, there will be a glut of hardware. If that's the case, I'd expect local model usage to scale up too, and it will get more difficult for enterprise providers.

(Nothing is certain but some things have become a bit more obvious than they were 6 months ago.)


Thinking about this a little more -> "nature of LLMs and intelligence"

Bloated apps are a material disadvantage. If I'm in a competitive industry, that slowdown alone can mean failure. The only thing Claude Code has going for it now is the loss-making $200/month subsidy. Is there any conceivable GUI overlay that Anthropic or OpenAI can add to make their software better than the current terminal apps? Sure, for certain edge cases, but then why isn't the user building those themselves? 24 months ago we could have said that's too hard, but that isn't the case in 2026.

Microsoft added all of this stuff into Windows, and it's a five-alarm fire. Stuff that used to be usable is a mess and really slow. Running Linux with Claude Code, Codex, or Pi is clearly superior to having a Windows device with none of them (if it weren't possible to run these on Windows; just a hypothetical).

From the business/enterprise perspective - there is no single most important thing, but having an environment that is reliable and predictable is high up there. Monday morning, and the Anthropic API endpoint is down: uh oh! In the longer term, businesses will really want to control both the model and the software that interfaces with it.

If the end game is just the same as talking to the Star Trek computer, and competitors are narrowing gaps rather than widening them (e.g. Anthropic and OpenAI release models minutes from each other now, and Chinese frontier models are getting closer in capability, not further), then it is really hard to see how either company achieves a vertical lockdown.

We could actually move down the stack, and then the real problem for OpenAI and Anthropic is Nvidia. 2030: the data center expansion is bust, Nvidia starts selling all of these cards to consumers directly and has a huge financial incentive to make sure performant local models exist. Everyone in the semiconductor supply chain below Nvidia only cares about keeping sales going, so it stops with them.

Maybe Nvidia is the real winner?

Also, is it just me or does it now feel like HN comments are just talking to a future LLM?


That was more true around the middle of last year, but now we have a fairly standard flow and set of core tools, as well as better general tool-calling support. The reality is that in most cases harnesses with fewer tools and smaller system prompts outperform.

The advances in the Claude Code harness have been more around workflow automation rather than capability improvements, and truthfully workflows are very user-dependent, so an opinionated harness is only ever going to be "right" for a narrow segment of users, and it's going to annoy a lot of others. This is happening now, but the subscription subsidy washes out a lot of the discontent.


If Claude Code is so much better why not make users pay to use it instead of forcing it on subscribers?

You're right, because owning the stack means better options for making tons of money. But owning the stack is demonstrably not required for good agents; there are several excellent harnesses in the wild (frankly way better than ol' Claude Code), which is in part why so many people are so annoyed at Anthropic about this move - being forced back onto their shitty CLI tool.

So here are a few things I have been thinking of:

---

It's not two-pizza teams, it's two-person teams. You no longer need 4 people on a team just working on features off a queue; you just need 2 people making technical decisions and managing agents.

---

Code used to be expensive to create. It was only economical to write code if it was doing high-value work or work that would be repeated many times over a long period of time.

Now producing code is _cheap_. You can write and run code in an automated way _on demand_. But if you do that, you have essentially traded upfront cost for run-time cost. It's really only worth it if the work is A) high-value and B) intermittent.

There is probably a formula you can write to figure out when this trade-off makes sense and when it doesn't.
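Something like this, maybe (all the numbers are made up; the point is just the shape of the break-even):

    # Back-of-the-envelope: re-deriving the work with an agent every run costs
    # tokens each time; codifying it costs dev time once plus cheap runs after.
    def break_even_runs(upfront_dev_cost, agent_cost_per_run, codified_cost_per_run):
        # upfront + n * codified < n * agent  =>  n > upfront / (agent - codified)
        return upfront_dev_cost / (agent_cost_per_run - codified_cost_per_run)

    # e.g. $500 of dev time vs $2.00 of tokens per agentic run vs $0.01 per codified run
    print(break_even_runs(500, 2.00, 0.01))  # ~251 runs before codifying pays off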

I'm working on a system where we can just chuck out autonomous agents onto our platform with a plain text description, and one thing I have been thinking about is tracking those token costs and figuring out how to turn agentic workflows into just normal code.

I've been thinking about running an agent that watches the other agents for cost and reads their logs on a schedule to see if any of what the agents are doing can be codified and turned into a normal workflow, and possibly even _writing that workflow itself_.
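The bookkeeping half of that is pretty simple. A minimal sketch, assuming the agents emit JSON-lines logs with "task" and "tokens" fields (that schema is an assumption, not anything our platform actually does yet):

    # Scan agent logs on a schedule and flag recurring, expensive tasks as
    # candidates to codify into a normal workflow.
    import json
    from collections import Counter
    from pathlib import Path

    def hot_tasks(log_dir: str, min_runs: int = 10, min_tokens: int = 1_000_000):
        runs, tokens = Counter(), Counter()
        for log in Path(log_dir).glob("*.jsonl"):
            for line in log.read_text().splitlines():
                event = json.loads(line)
                runs[event["task"]] += 1
                tokens[event["task"]] += event.get("tokens", 0)
        # Candidates: tasks that recur often and burn a lot of tokens.
        return [t for t in runs if runs[t] >= min_runs and tokens[t] >= min_tokens]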

It would be analogous to the JVM optimizing hot-path functions...

---

What I do know is that what we are doing for a living will be near unrecognizable in a year or two.


This sounds very fascinating. One of the more interesting ideas I have come across.
