
The author seems unaware of how well recent Apple laptops run LLMs. This is puzzling and calls into question the validity of anything in this article.


If Apple offered a reasonably priced laptop with more than 24GB of memory (I'm writing this on a maxed-out Air) I'd agree. I've been buying Apple laptops for a long time, and buying the maximum memory every time. I just checked, and I see that now you can get 32GB. But to get 64GB I think you have to spend $3700 for the MBP Max, and 128GB starts at $4500, almost 3x the 32GB Air's price.

And as far as I understand it, an Air with an M3 would be perfectly capable of running larger models (albeit more slowly) if it had the memory.


You’re not wrong that Apple’s memory prices are unpleasant, but also consider the competition - in this context (running LLMs locally), that means laptops with large amounts of fast memory that can be repurposed for the GPU. This limits you to Apple or one specific AMD processor at present.

An HP ZBook with an AMD 395+ and 128GB of memory apparently lists for $4049 [0]

An ASUS ROG Flow z13 with the same spec sells for $2799 [1] - so cheaper than Apple, but still a high price for a laptop.

[0] https://hothardware.com/reviews/hp-zbook-ultra-g1a-128gb-rev...

[1] https://www.hidevolution.com/asus-rog-flow-z13-gz302ea-xs99-...


Yeah, I'm by no means saying that Apple is uniquely bad here -- it's just an issue I've been frustrated by since the first M1 chip, long before local LLMs made it a serious issue. More memory is always a good idea, and too much is never enough.


You can get any low-spec laptop without soldered memory and just replace the DIMMs with the maximum supported capacity.

You don't necessarily need to go for the maxed-out SKU.


Would that be unified memory, where the GPU and CPU can share the memory? That's key for performance.


Right, no, it wouldn't, I appreciate that in this particular context my comment was entirely wrong.

Thanks for helping me see it!


No, it wouldn’t. You’d be limited to using the CPU and the lower bandwidth system memory.


The Framework Desktop will get you the 395+ and 128GB of RAM for around $2k USD.


The trick here is buying used. Especially for something like the M1 series there is tremendous value to be had on high-memory models, since the memory hasn't changed significantly over generations compared to the CPUs, and even M1s are quite competent for many workloads. I got an M1 Max with 64GB of RAM recently for, I think, $1400.


I think pricing is just one dimension of this discussion — but let's dive into it. I agree it's a lot of money. But what are you comparing this pricing to?

From what I understand, getting a non-Apple solution to the problem of running LLMs in 64GB of VRAM or more has a price tag that is at least double of what you mentioned, and likely has another digit in front if you want to get to 128GB?


It's astonishing how Apple gouges on memory and SSD upgrade prices (I'm on an M1 with 64GB/4TB).

That said, those margins give them some elasticity when it comes to the DRAM shortage.


The M-series unified memory is part of the chip package, not separate components on the board. Of course Apple is going to maintain its margins, but it's easy to see why, with this design, more memory costs more than commodity DRAM. Well, maybe not at the current market pricing, which hopefully is temporary.


They gouge you on RAM and SSD but provide a far better overall machine for the price than Windows laptops.


I think the author is aware of Apple silicon. The article mentions the fact Apple has unified memory and that this is advantageous for running LLMs.


Then I don't know why they say that most laptops are bad at running LLMs. Apple has a huge market share in laptops, even its cheapest laptops are capable in that realm, and its PC competitors are more likely to be generously specced in terms of included memory.

> However, for the average laptop that’s over a year old, the number of useful AI models you can run locally on your PC is close to zero.

This straight up isn’t true.


Apple has a 10-18% market share for laptops. That's significant but it certainly isn't "most".

Most laptops can run at best a 7-14B model, even if you buy one with a high-spec graphics chip. These are not useful models unless you're writing spam.

Most desktops have a decent amount of system memory but that can't be used for running LLMs at a useful speed, especially since the stuff you could run in 32-64GB RAM would need lots of interaction and hand holding.
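As a rough back-of-the-envelope illustration (the bandwidth figures are ballpark assumptions, and this ignores compute, caches, and MoE tricks): decode speed is roughly memory-bandwidth-bound, because each generated token has to stream approximately every weight once.

    # Rough decode-speed estimate; bandwidth numbers are assumed ballparks.
    def tokens_per_sec(bandwidth_gb_s, params_billion, bytes_per_param):
        model_gb = params_billion * bytes_per_param   # weights streamed per token
        return bandwidth_gb_s / model_gb

    # 70B model at ~4-bit quantization (~0.5 bytes/param, ~35 GB of weights)
    print(tokens_per_sec(80, 70, 0.5))    # dual-channel DDR5 desktop: ~2 tok/s
    print(tokens_per_sec(400, 70, 0.5))   # Apple M1 Max unified memory: ~11 tok/s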

And that's for the easy part, inference. Training is much more expensive.


My laptop is 4 years old and I only have 6GB of VRAM. I run mostly 4B and 8B models. They are extremely useful in a variety of situations. Just because you can't replicate what you do in ChatGPT doesn't mean they don't have their use cases. It seems to me you know very little about what these models can do, not to speak of models trained for specific use cases, or even smaller models like functiongemma or TTS/ASR models. (BTW, I've trained models using my 6GB of VRAM too.)


I’ll chime in and say I run LM Studio on my 2021 MacBook Pro M1 with no issues.

I have 16GB of RAM. I use Unsloth-quantized models like Qwen3 and gpt-oss. I have some MCP servers like Context7 and Fetch that make sure the models have up-to-date information. I use continue.dev in VSCode or OpenCode Agent with LM Studio and write C++ code against Vulkan.

It’s more than capable. Is it fast? Not necessarily. Does it get stuck? Sometimes. Does it keep getting better? With every model release on huggingface.

Total monthly cost: $0


A few examples of useful tasks would be appreciated. I do suffer from a sad lack of imagination.


I suggest taking a look at /r/localLLaMa and see all sorts of cool things people do with small models.


A Max chip can run quantized 30B models, and definitely has the RAM to fit them in memory. The regular and Pro chips will be compute/bandwidth limited. Of course, the Ultra is even better than the Max, but those don't come in laptops yet.
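A rough sizing sketch of why that fits (the quantization level and overhead are assumptions):

    # Illustrative fit check, not a benchmark.
    params_billion = 30
    bytes_per_param = 0.5                           # ~4-bit quantization
    weights_gb = params_billion * bytes_per_param   # ~15 GB of weights
    overhead_gb = 5                                 # KV cache + runtime, assumed
    print(weights_gb + overhead_gb)                 # ~20 GB: fits in a 48/64 GB Max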


So I'm hearing a lot of people running LLMs on Apple hardware. But is there actually anything useful you can run? Does it run at a usable speed? And is it worth the cost? Because the last time I checked the answer to all three questions appeared to be no.

Though maybe it depends on what you're doing? (Although if you're doing something simple like embeddings, then you don't need the Apple hardware in the first place.)


I was sitting in an airplane next to a guy on some MacBook Pro who was coding in Cursor with a local LLM. We got talking and he said there are obviously differences, but for his style of 'English coding' (he basically described what code to write and which files to change, but in English, and more loosely than code, obviously, otherwise he would just write the code) it works really well. And indeed that's what he could demo. The model (gpt-oss, I believe) did pretty well in his Next.js project, and fast too.


Thanks. I call this method Power Coding (like Power Armor), where you're still doing everything except for typing out the syntax.

I found that for this method the smaller the model, the better it works, because smaller models can generally handle it, and you benefit more from iteration speed than anything else.

I don't have hardware to run even tiny LLMs at anything approaching interactive speeds, so I use APIs. The one I ended up with was Grok 4 Fast, because it's weirdly fast.

ArtificialAnalysis has an "end to end" response time section, and it was the best there for a long time, though many other models are catching up now.


The speed is fine, the models are not.

I found only one great application of local LLMs: spam filtering. I wrote a "despammer" tool that accesses my mail server using IMAP, reads new messages, and uses an LLM to determine if they are spam or not. 95.6% correct classification rate on my (very difficult) test corpus, in practical usage it's nearly perfect. gpt-oss-20b is currently the best model for this.

I found only one great application of local LLMs: spam filtering. I wrote a "despammer" tool that accesses my mail server using IMAP, reads new messages, and uses an LLM to determine whether they are spam. It gets a 95.6% correct classification rate on my (very difficult) test corpus; in practical usage it's nearly perfect. gpt-oss-20b is currently the best model for this.

For all other purposes, models with <80B parameters are just too stupid to do anything useful for me. I write in Clojure and there is no boilerplate: the code reflects real business problems, so I need an LLM that is capable of understanding things. Claude Code, especially with Opus, does pretty well on simpler problems; all local models are just plain dumb and a waste of time compared to that, so I don't see the appeal yet.

That said, my next laptop will be a MacBook pro with M5 Max and 128GB of RAM, because the small LLMs are slowly getting better.


I've tried out gpt-oss:20b on a MacBook Air (via Ollama) with 24GB of RAM. In my experience its output is comparable to what you'd get out of older models, and the OpenAI benchmarks seem accurate: https://openai.com/index/introducing-gpt-oss/ . Definitely a usable speed. Not instant, but ~5 tokens per second of output if I had to guess.
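If you want to replace the guess with a measurement, Ollama's generate endpoint reports token counts and timings; a small sketch assuming a default local install on port 11434:

    # Measure decode speed from a local Ollama instance (default port 11434).
    # Assumes the model has already been pulled: ollama pull gpt-oss:20b
    import json, urllib.request

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": "gpt-oss:20b",
                         "prompt": "Explain unified memory in two sentences.",
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        r = json.load(resp)

    # eval_duration is reported in nanoseconds
    print(f"{r['eval_count'] / (r['eval_duration'] / 1e9):.1f} tokens/sec")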


This paper shows a use case running on Apple silicon that’s theoretically valuable:

https://pmc.ncbi.nlm.nih.gov/articles/PMC12067846/

Who cares if the result is right or wrong, etc., as it will all be different in a year … it's just interesting to see a test of desktop-class hardware go OK.


I have an MBP Max M3 with 64GB of RAM, and I can run a lot at useful speed (LLMs run fine, diffusion image models run OK although not as fast as they would on a 3090). My laptop isn't typical though, it isn't a standard MBP with a normal or pro processor.


I can definitely write code with a local model like Devstral Small, a quantized Granite, or a quantized DeepSeek on an M1 Max with 64GB of RAM.


Of course it depends what you’re doing.

Do you work offline often?

Essential.


Most laptops have 16GB of RAM or less. A little more than a year ago I think the base model Mac laptop had 8GB of RAM which really isn't fantastic for running LLMs.


By “PC”, they mean non-Apple devices.

Also, macOS only has around 10% desktop market share globally.


It's actually closer to 20% globally. Apple now outsells Lenovo:

https://www.mactech.com/2025/03/18/the-mac-now-has-14-8-of-t...


I meant market share in terms of installed base: https://gs.statcounter.com/os-market-share/desktop/worldwide...


macOS and OS X are split on this graph, and “Unknown” could be anything. This might actually put Apple's install base close to 20%.


> Apple has a huge marketshare in the laptop market

Hello, from outside of California!


Global Mac marketshare is actually higher than the US: https://www.mactech.com/2025/03/18/the-mac-now-has-14-8-of-t...


Less than 1 in 5 doesn’t feel like huge market share,

but it’s more than I have!


Apple outsells Lenovo, if that puts it in a different perspective.


But economically, it is still much better to buy a lower-spec laptop and pay a monthly subscription for AI.

However, I agree with the article that people will run big LLMs on their laptops N years down the line, especially if hardware outgrows best-in-class LLM requirements. If a phone could run a 512GB model fast, you would want it.


Are you sure the subscription will still be affordable after the venture capital flood ends and the dumping stops?


100% yes.

The amount of compute in the world is doubling over 2 years because of the ongoing investment in AI (!!)

In some scenario where new investment stops flowing and some AI companies go bankrupt, all that compute will be looking for a market.

Inference providers are already profitable, so cheaper hardware will mean even cheaper AI systems.


You should probably disclose that you're a CTO at an AI startup, I had to click your bio to see that.

> The amount of compute in the world is doubling over 2 years because of the ongoing investment in AI (!!)

All going into the hands of a small group of people that will soon need to pay the piper.

That said, VC backed tech companies almost universally pull the rug once the money stops coming in. And historically those didn't have the trillions of dollars in future obligations that the current compute hardware oligopoly has. I can't see any universe where they don't start charging more, especially now that they've begun to make computers unaffordable for normal people.

And even past the bottom dollar cost, AI provides so many fun, new, unique ways for them to rug pull users. Maybe they start forcing users to smaller/quantized models. Maybe they start giving even the paying users ads. Maybe they start inserting propaganda/ads directly into the training data to make it more subtle. Maybe they just switch out models randomly or based on instantaneous hardware demand, giving users something even more unstable than LLMs already are. Maybe they'll charge based on semantic context (I see you're asking for help with your 2015 Ford Focus. Please subscribe to our 'Mechanic+' plan for $5/month or $25 for 24 hours). Maybe they charge more for API access. Maybe they'll charge to not train on your interactions.

I'll pass, thanks.


I'm no longer CTO at an AI startup. Updated, but I don't actually see how that is relevant.

> All going into the hands of a small group of people that will soon need to pay the piper.

It's not very small! On the inference side there are many competitive providers as well as the option of hiring GPU servers yourself.

> And historically those didn't have the trillions of dollars in future obligations that the current compute hardware oligopoly has. I can't see any universe where they don't start charging more, especially now that they've begun to make computers unaffordable for normal people.

I can't say how strongly I disagree with this - it's just not how competition works, or how the current market is structured.

Take gpt-oss-120B as an example. It's not frontier-level quality, but it's not far off, and it certainly sets a floor: open-source models will never be less intelligent than this.

There is a competitive market in hosting providers, and you can see the pricing here: https://artificialanalysis.ai/models/gpt-oss-120b/providers?...

In what world do all the providers (who want revenue!) raise prices above the premium price Cerebras is charging for its very high speed inference?

There's already Google, profitably serving at the low end at around half the price of Cerebras (but then you have to deal with Google billing!).

The fact that Azure and Amazon are pricing exactly the same as 8(!) other providers, and the same as the price https://www.voltagepark.com/blog/how-to-deploy-gpt-oss-on-a-... gives for running your own server, shows how the economics work on Nvidia hardware. There's no subsidy going on there.

This is on hardware that is already deployed. That isn't suddenly going to get more expensive unless demand increases... in which case the new hardware coming online over the next 24 months is a good investment, not a bad one!


Datacenters full of GPU hosts aren't like dark fiber - they require massive ongoing expense, so the unit economics have to work really well. It is entirely possible that some overbuilt capacity will be left idle until it is obsolete.


The ongoing costs are mostly power, and aren't that massive compared to the investment.

No one is leaving an H100 cluster idle because the power costs too much - this is why remnant markets like Vast.ai exist.


They absolutely will leave them idle if the market is so saturated that no one will pay enough for tokens to cover power and other operational costs. Demand is elastic but will not stretch forever. The build out assumes new applications with ROI will be found, and I'm sure they will be, but those will just drive more investment. A massive over build is inevitable.


Of course!

But the operational costs are much lower than some people in this thread seem to think.

You can find a safe margin for the price by looking at aggregators.

https://gpus.io/gpus/h100 is showing $1.83/hour lowest price, around $2.85 average.

That easily pays the running costs - an H100 server with cooling etc. is around $0.10/hour to keep running.
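Rough numbers behind that claim (the wattage, overhead, and electricity price are assumptions):

    # Per-GPU power cost, illustrative only.
    gpu_watts = 700          # H100 SXM board power
    overhead = 1.3           # cooling / facility overhead (PUE-ish)
    usd_per_kwh = 0.08       # assumed industrial electricity rate
    print(gpu_watts / 1000 * overhead * usd_per_kwh)   # ~$0.07/hour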

And a massive overbuild pushes prices down not up!


> Inference providers are already profitable.

That surprises me, do you remember where you learned that?


Lots of sources, and you can do the math yourself.

Here's a few good ones:

https://github.com/deepseek-ai/open-infra-index/blob/main/20... (suggests Deepseek is making 80% raw margin on inference)

https://www.snellman.net/blog/archive/2025-06-02-llms-are-ch...

https://martinalderson.com/posts/are-openai-and-anthropic-re... (there's a HN discussion of this where it was pointed out this overestimates the costs)

https://www.tensoreconomics.com/p/llm-inference-economics-fr... (long, but the TL;DR is that serving Llama 3.3 70B costs around $0.28 per million input tokens and $0.95 per million output tokens at high utilization. These are close to what we see in the market: https://artificialanalysis.ai/models/llama-3-3-instruct-70b/... )
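A crude version of that math, using the rental prices discussed upthread and an assumed aggregate throughput for batched serving (all figures illustrative):

    # Back-of-the-envelope cost per million output tokens, assumptions marked.
    gpu_hour = 2.0           # $/hour per H100 (rental prices discussed above)
    gpus = 8                 # one 8x H100 node serving Llama 3.3 70B
    agg_tok_per_sec = 6000   # assumed batched throughput across all requests
    tokens_per_hour = agg_tok_per_sec * 3600
    print(gpu_hour * gpus / (tokens_per_hour / 1e6))   # ~$0.74 per million tokens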


> The amount of compute in the world is doubling over 2 years because of the ongoing investment in AI (!!)

which is funded by the dumping

when the bubble pops: these DCs are turned off and left to rot, and your capacity drops by a factor of 8192


> which is funded by the dumping

What dumping do you mean?

Are you implying NVidia is selling H200s below cost?

If not, then you might be interested to see that Deepseek has released their inference costs here: https://github.com/deepseek-ai/open-infra-index/blob/main/20...

If they are losing money it's because they have a free app they are subsidizing, not because the API is underpriced.


Doesn't matter now. GP can revisit the math and buy some hardware once the subscription prices actually grow too high.


You have to remember that companies are kind of fungible in the sense that founders can close old companies and start new ones to walk away from bankruptcies in the old companies. When there's a bust and a lot of companies close up shop, because data centers were overbuilt, there's going to be a lot of GPUs being sold at firesale prices - imagine chips sold at $300k today being sold for $3k tomorrow to recoup a penny on the dollar. There's going to be a business model for someone buying those chips at $3k, then offering subscription prices at little more than the cost of electricity to keep the dumped GPUs running somewhere.


I do wonder how usable the hardware will be once the creditors are trying to sell it - as far as I can tell, the current trend is more and more custom, no-matter-the-cost, super expensive, power-inefficient hardware.

The situation might be a lot different from people selling ex-crypto-mining GPUs to gamers. There might be a lot of effective scrap that is no longer usable once it is no longer part of some company's technological fever dream.


They will go down. Or the company will be gone.


Running an LLM locally means you never have to worry about how many tokens you've used, and also it allows for a lot of low latency interactions on smaller models that can run quickly.

I don't see why consumer hardware won't evolve to run more LLMs locally. It is a nice goal to strive for, one that consumer hardware makers have been missing for a decade now. It is definitely achievable, especially if you just care about inference.


Isn't this what all these NPUs are created for?


I haven’t seen an NPU that can compete with a GPU yet. Maybe for really small models, I’m still not sure where they are going with those.


> economically, it is still much better to buy a lower spec't laptop and to pay a monthly subscription for AI

Uber is economical, too; but folks prefer to own cars, sometimes multiple.

And just as there's a market for all kinds of vanity cars, fast sports cars, expensive supercars... I imagine PCs and laptops will have such a market too: in probably less than a decade, maybe a £20k laptop running a 671B+ LLM locally will be the norm among pros.


> Uber is economical, too

One time I took an Uber to work because my car broke down and was in the shop, and the Uber driver (somewhat pointedly) made a comment that I must be really rich to commute to work via Uber because Ubers are so expensive.


Most people don't realise the amount of money they spend per year on cars.


Paying $30-$70/day to commute is economical?


If you calculate depreciation and running costs on a new car in most places, I think it probably would be.


If Uber were cheaper than the depreciation and running costs of a car, what would be left for the driver (and Uber)?


A big part of the whole "hack" of Uber in the first place is that people are using their personal vehicles, so the depreciation and many of the running costs are already sunk. Once you've paid those, it becomes a super good deal to make money from the "free" asset you already own.


My private car provides less than one commute per day, on average.

An Uber car can provide several.


While your car is sitting in the parking lot, the Uber driver is utilizing their car throughout the day.


If you’re using Uber to and from work, presumably you would buy a car that’s worth more than the 10-year-old Prius your Uber driver has 200k miles on.


The depreciation would be amortized over more than one person. I only travel once or twice per week, so it costs me less to use an Uber than to own a car.


> Paying $30-$70/day to commute is economical?

When LLM use approaches this number, running one locally would be, yes. What you and the other commentator seem to miss is that "Uber" is a stand-in for cloud-based LLMs: someone else builds and owns those servers, runs the LLMs, pays the electricity bills... while its users find it "economical" to rent it.

(btw, taxis are considered economical in parts of the world where owning cars is a luxury)


any "it's cheaper to rent than to own" arguments can be (and must be) completely disregarded due to experience of the last decade

so stop it


You still need ridiculously high spec hardware, and at Apple’s prices, that isn’t cheap. Even if you can afford it (most won't), the local models you can run are still limited and they still underperform. It’s much cheaper to pay for a cloud solution and get significantly better result. In my opinion, the article is right. We need a better way to run LLMs locally.


> You still need ridiculously high spec hardware, and at Apple’s prices, that isn’t cheap.

You can easily run models like Mistral and Stable Diffusion in Ollama and Draw Things, and you can run newer models like Devstral (the MLX version) and Z Image Turbo with a little effort using LM Studio and Comfyui. It isn't as fast as using a good nVidia GPU or a cloud GPU but it's certainly good enough to play around with and learn more about it. I've written a bunch of apps that give me a browser UI talking to an API that's provided by an app running a model locally and it works perfectly well. I did that on an 8GB M1 for 18 months and then upgraded to a 24GB M4 Pro recently. I still have the M1 on my network for doing AI things in the background.


You can run newer models like Z Image Turbo or FLUX.2 [dev] using Draw Things with no effort too.


I bought my M1 Max w/ 64GB of RAM used. It's not that expensive.

Yes, the models it can run do not perform like ChatGPT or Claude 4.5, but they're still very useful.


I’m curious to hear more about how you get useful performance out of your local setup. How would you characterize the difference in “intelligence” of local models on your hardware vs. something like chatgpt? I imagine speed is also a factor. Curious to hear about your experiences in as much detail as you’re willing to share!


Local models won't generally have as much context window, and the quantization process does make them "dumber" for lack of a better word.

If you try to get them to compose text, you'll end up seeing a lot less variety than you would with ChatGPT, for instance. That said, ask them to analyze a CSV file that you don't want to give to ChatGPT, or ask them to write code, and they're generally competent at it. The high-end codex-gpt-5.2 type models are smarter, may find better solutions, may track down bugs more quickly -- but the local models are getting better all the time.


$749 for an M4 Air at Amazon right now.


Try running anything interesting in those 8GB of RAM.

You need 96GB or 128GB to do non-trivial things. That is not yet $749.


Fair enough, but they start at 16GB nowadays.


The M4 starts with 16GB, though that can also be tight for local LLMs. You can get one with 24GB for $1149 right now though, which is good value.


$899 at B&H, starting today (12/24).


64GB is fine.


This subthread is about the Macbook Air, which tops out at 32 GB, and can't be upgraded further.

While browsing the Apple website, it looks like the cheapest MacBook with 64GB of RAM is the MacBook Pro M4 Max with the 40-core GPU, which starts at $3,899, a.k.a. more than five times the price quoted above.


I have an M1 Max with 64GB that cost me much less than that -- you don't have to buy the latest model brand new.


If you are going for 64GB, you need at least a Max chip or you will be bandwidth/GPU limited.


I was pleasantly surprised at the speed and power of my second-hand M1 Pro 32GB running Asahi and Qwen3:32B. It does all I need, and I don't mind the reading-pace output, although I'd be tempted by an M2 Ultra if the second-hand market hadn't also exploded with the recent RAM market manipulations.

Anyway, I'm on a mission to have no subscriptions in the New Year. Plus it feels wrong to be contributing towards my own irrelevance (GAI).


Yeah, any Mac system specced with a decent amount of RAM since the M1 will run LLMs locally very well. And that’s exactly how the built-in Apple Intelligence service works: when enabled, it downloads a smallish local model. Since all Macs since the M1 have very fast memory available to the integrated GPU, they’re very good at AI.

The article kinda sucks at explaining how NPUs aren't really even needed; they just have the potential to make things more efficient in the future, rather than relying on the power consumption involved in running your GPU.


This article specifically talks about PC laptops and discusses changes in them.


Only if you want to take all the proprietary baggage and telemetry that comes with Apple platforms by default.

A Lenovo T15g with a 16GB 3080 mobile doesn’t do too badly and will run more than just Windows.


I just got a Framework Desktop with 128GB of shared RAM just before memory prices rocketed, and I can comfortably run even many of the bigger OSS models locally. You can dedicate 112GB to the GPU and it runs Linux perfectly.


The M-series chips really changed the game here


This article is to sell more laptops.




