I get that I can run local models, but all the paid-for (remote) models are superior.
So is the use case just for people who don’t want to use big tech’s models? Is this just for privacy-conscious people? Or is this just for “adult” chats, i.e. porn bots?
Not being cynical here, just wanting to understand the genuine reasons people are using it.
Yes, frontier models from the labs are a step ahead and likely will always be, but we've already crossed levels of "good enough for X" with local models. This is analogous to the fact that my iPhone 17 is technically superior to my iPhone 8, but my outcomes for text messaging are no better.
I've invested heavily in local inference. For me, it's a mixture of privacy, control, stability, and cognitive security.
Privacy - my agents can work on tax docs, personal letters, etc.
Control - I do inference steering with some projects: constraining which tokens can be generated next at any point in time (a rough sketch of the idea follows below this list). Not possible with API endpoints.
Stability - I had many bad experiences with frontier labs' inference quality shifting within the same day, likely due to quantization under system load. Worse, they retire models, update their own system prompts, etc. They're not stable.
Cognitive Security - This has become more important as I rely more on my agents for performing administrative work. This is intermixed with the Control/Stability concerns, but the focus is on whether I can trust it to do what I intended it to do, and that it's acting on my instructions, rather than the labs'.
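To make the "Control" point concrete, here's a minimal sketch of what token-level constraining can look like when you run the model yourself, assuming a Hugging Face transformers causal LM; the model name and the yes/no constraint are illustrative placeholders, not my actual setup:

```python
# Minimal sketch of "inference steering" via next-token constraints, assuming a
# Hugging Face transformers causal LM running locally. The model name and the
# yes/no constraint are illustrative placeholders.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class AllowListProcessor(LogitsProcessor):
    """Mask the logits so only an allowed set of token ids can be sampled next."""
    def __init__(self, allowed_token_ids):
        self.allowed = torch.tensor(sorted(allowed_token_ids))

    def __call__(self, input_ids, scores):
        masked = torch.full_like(scores, float("-inf"))
        masked[:, self.allowed] = scores[:, self.allowed]
        return masked

name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder: any local chat model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Constrain the very next token to " yes" / " no"
allowed = [tok(" yes", add_special_tokens=False).input_ids[-1],
           tok(" no", add_special_tokens=False).input_ids[-1]]

out = model.generate(
    **tok("Is 17 a prime number? Answer:", return_tensors="pt"),
    max_new_tokens=1,
    logits_processor=LogitsProcessorList([AllowListProcessor(allowed)]),
)
print(tok.decode(out[0]))
```

Hosted chat APIs generally only expose coarse knobs like logit_bias or structured outputs; with local weights you get the full logits at every step.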
I just "invested heavily" (relatively modest, but heavy for me) in a PC for local inference. The RAM was painful. Anyway, for my focused programming tasks the 30B models are plenty good enough.
> I get that I can run local models, but all the paid for (remote) models are superior.
If that's clearly true for your use cases, then maybe this isn’t for you.
> So is the use-case just for people who don’t want to use big tech’s models?
Most weights-available models are also “big tech’s”, or finetunes of them.
> Is this just for privacy conscious people? Or is this just for “adult” chats, ie porn bots?
Sure, those are among the use cases. And there can be very good reasons to be concerned about privacy in some applications. But they aren’t the only reasons.
There’s a diversity of weights-available models, with a variety of specialized strengths. Sure, for general use the big commercial models may be more capable, but they may not be optimal for all uses (especially when cost effectiveness is considered, given that capable weights-available models for some uses are very lightweight).
For some projects, you do not want your code or documents leaving the LAN. Many companies have explicit constraints on using external SaaS. That doesn't mean they restrict everything to 'on prem': 'self hosted' can include running an open-weights model on multiple rented B200s.
So yes, the tradeoff is security vs capability. The former always comes at a cost.
Yeah, it’s not going to compare to Codex-5.2 or Opus 4.5.
Some non-programming use cases are interesting though, e.g. text to speech or speech to text.
Run a TTS model overnight on a book, and in the morning you’ll get an audiobook. With a simple approach, you’d get something more like the old books on tape (e.g. no chapter skipping), but regardless, it’s a valid use case.
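A rough sketch of that overnight-audiobook loop, assuming the Coqui TTS package; the model name, book.txt path, and the naive paragraph split are placeholders:

```python
# Rough sketch of the overnight-audiobook idea, assuming the Coqui TTS package
# is installed; model name, input path, and paragraph split are placeholders.
from TTS.api import TTS

tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")  # any local TTS model

with open("book.txt") as f:
    chunks = [c for c in f.read().split("\n\n") if c.strip()]

for i, chunk in enumerate(chunks):
    # one WAV per paragraph; concatenate or tag into chapters afterwards
    tts.tts_to_file(text=chunk, file_path=f"audiobook_{i:04d}.wav")
```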
Reports of people getting hit by twitchy-fingered banbots on cloud LLMs are starting to show up (Gemini bans apparently kill Gmail and GDrive too). Paranoid types like me appreciate local options that won't get me banned.
There are some surprisingly useful "small" use cases for general-purpose LLMs that don't necessarily require broad knowledge – image transcription plus some light post-processing is one I use a lot.
I've gotten interested in local models recently after trying them here and there for years. We've finally hit the point where small <24GB models are capable of pretty amazing things. One use: I have a scraped forum database, and with a 20GB Devstral model I was able to get it to select batches of 5-10 random posts (up to n) related to a species of exotic plant, summarize them into an interim SQLite table, then at the end read the interim summaries and write a final document addressing 5 different topics related to users' experience growing the species.
That's what convinced me they are ready to do real work. Are they going to replace Claude Code? Not currently. But it is insane to me that such a small model can follow those explicit directions and consistently perform that workflow.
During that experimentation, even when I didn't spell out the SQL explicitly, it was able to craft the queries on its own from just a text description, and it has no issue navigating the CLI and file system doing basic day-to-day things.
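For anyone curious what that workflow roughly looks like, here's a sketch assuming a local OpenAI-compatible endpoint (as exposed by LM Studio or llama.cpp's server) and a scraped-posts table; the table/column names, species keyword, and model name are made up for illustration:

```python
# Rough sketch of the batch-summarization workflow described above; the
# endpoint, schema, species keyword, and model name are illustrative only.
import sqlite3
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
db = sqlite3.connect("forum.db")
db.execute("CREATE TABLE IF NOT EXISTS interim (batch INTEGER, summary TEXT)")

posts = db.execute(
    "SELECT body FROM posts WHERE body LIKE ? ORDER BY RANDOM() LIMIT 50",
    ("%monstera%",),   # placeholder species keyword
).fetchall()

for i in range(0, len(posts), 10):          # batches of up to 10 posts
    batch = "\n---\n".join(p[0] for p in posts[i:i + 10])
    resp = client.chat.completions.create(
        model="devstral",                   # whatever name the local server exposes
        messages=[{"role": "user",
                   "content": "Summarize growers' experiences in these posts:\n" + batch}],
    )
    db.execute("INSERT INTO interim VALUES (?, ?)",
               (i // 10, resp.choices[0].message.content))
db.commit()
# A final pass would then read the interim table and write the report.
```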
I'm sure there are a lot of people doing "adult" things, but my interest is sparked because they're finally at the level where they can be a tool in a homelab, and LLM usage limits are no longer subsidized like they used to be. Not to mention I'm really disillusioned with big tech having my data, or with exposing a tool that makes API calls to them and can then take actions on my system.
I'll still keep using Claude Code for day-to-day coding. But for small system-based tasks I plan on moving to local LLMs. Their capabilities have inspired me to write my own agentic framework to see what workflows can be put together just for managing and automating day-to-day tasks. Ideally it would be nice to just chat with an LLM and tell it to add an appointment or a call at x time, or make sure I do it that day, and have it read my schedule, remind me at a chill time of my day to make the call, and then check up that I followed through. I also plan on seeing if I can set it up to remind me of, and help me practice, mindfulness and general stress management. Sure, a simple reminder might work, but as someone with ADHD who easily forgets reminders as soon as they pop up if I can't get to them right away, being pestered by an agent that wakes up and engages with me seems like it might be an interesting workflow.
And the hacker aspect: now that they're capable, I really want to mess around with persistent knowledge in databases and making them intercommunicate and work together. I might even give them access to rewrite themselves and modify the application at runtime with a Lisp. To me, local LLMs have gotten to the point where they're fun and not annoying. I can run a model that is better than ChatGPT 3.5 for the most part; its knowledge is more distilled and narrower, but for what they do understand, their correctness is much better.
To justify investing a trillion dollars, like everything else LLM-related. The local models are pretty good. I ran a test of R1 (the smallest version) vs Perplexity Pro and shockingly got better answers running on a base-spec Mac Mini M4. It's simply not true that there is a huge difference. Mostly it's hard-coded over-optimization. In general these models aren't really getting better.
So long as the local model supports tool-use, I haven't had issues with them using web search etc in open-webui. Frontier models will just be smarter in knowing when to use tools.
> For me the main BIG deal is that cloud models have online search embedded etc, while this one doesn't.
Models do not have online search embedded; they have tool-use capabilities (possibly with specialized training for a web search tool). That's true of many open and weights-available models, and they are run with harnesses that support tools and provide a web search tool (LM Studio is such a harness, and can easily be supplied with a web search tool).
Also, I had several experiments where I was only interested in 5 to 10 websites with application-specific information, so it works nicely to quickly spider them, keep a local index, and then get very low search latency. Obviously this is not a general solution, but it's nice for some use cases.
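A tiny sketch of that spider-and-local-index approach, using SQLite's FTS5 for the index; the URLs and query are placeholders, and in practice this would sit behind whatever web-search tool the harness exposes to the model:

```python
# Tiny sketch of spidering a handful of known pages into a local full-text
# index (SQLite FTS5); URLs and the example query are placeholders.
import sqlite3
import requests
from bs4 import BeautifulSoup

URLS = [
    "https://example.com/docs/getting-started",
    "https://example.com/docs/api-reference",
]

db = sqlite3.connect("local_index.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url, body)")

for url in URLS:
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    db.execute("INSERT INTO pages VALUES (?, ?)", (url, text))
db.commit()

# Low-latency local search, e.g. called from the model's search tool
hits = db.execute(
    "SELECT url FROM pages WHERE pages MATCH ? LIMIT 3", ("token streaming",)
).fetchall()
print(hits)
```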
That doesn’t address the practical significance of privacy, though. The real risk isn’t that OpenAI employees will read your chats for personal amusement. The risk is that OpenAI will exploit the secrets you’ve entrusted to them, to manipulate you, or to enable others to manipulate you.
The more information an unscrupulous actor has about you, the more damage they can do.
Currently working on a personal project where part of the pipeline is recognizing lots of images. My employer lets me use Gemini for personal use, but burning a large number of tokens on Gemini 3 Pro OCR limited my work. Flash gives worse results, but there are ways to retry. Good for development, but long term, the simpler parts of a pipeline could be dedicated to a local model. I can imagine many other use cases where you want a large volume of low-difficulty tasks at close to zero cost.
I run a separate memory layer between my local and my chat.
Without a ton of hassle I cannot do that with a public model (without paying API pricing).
My responses may be slower, but I know the historical context is going to be there. As well as the model overrides.
In addition I can bolt on modules as I feel like it (voice, avatar, SillyTavern, to list a few).
I get to control my model by selecting specific ones for tasks, I can upgrade as they are released.
These are the reasons I use local.
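For what it's worth, a minimal sketch of the memory-layer idea mentioned above, assuming a local OpenAI-compatible endpoint; the SQLite schema and "last 20 turns" retrieval rule are illustrative assumptions, not necessarily how my actual setup works:

```python
# Minimal sketch of a memory layer sitting between the chat UI and a local
# OpenAI-compatible endpoint; schema and retrieval rule are assumptions.
import sqlite3
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="local")
mem = sqlite3.connect("memory.db")
mem.execute("CREATE TABLE IF NOT EXISTS turns (ts REAL, role TEXT, content TEXT)")

def chat(user_msg: str, model: str = "local-model") -> str:
    # History survives model swaps/upgrades because it lives outside the model.
    history = mem.execute(
        "SELECT role, content FROM turns ORDER BY ts DESC LIMIT 20"
    ).fetchall()[::-1]
    messages = [{"role": r, "content": c} for r, c in history]
    messages.append({"role": "user", "content": user_msg})

    reply = client.chat.completions.create(model=model, messages=messages)
    text = reply.choices[0].message.content

    for role, content in (("user", user_msg), ("assistant", text)):
        mem.execute("INSERT INTO turns VALUES (?, ?, ?)", (time.time(), role, content))
    mem.commit()
    return text
```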
I do use Claude as a coding junior so I can assign tasks and review them, purely because I do not have something that can replicate that locally on my setup (hardware-wise; and from what I have read, local coding models are not matching Claude yet).
That's more than likely a temporary issue (years, not weeks, given the expense of things and the state of open models specialising in coding).
TL;DR: The classic CIA triad: Confidentiality, Integrity, Availability; cost/price concerns; the leading open-weight models aren't nearly as bad as you might think.
You don't need LM Studio to run local models; it's just (or formerly was) a nice UI to download and manage HF models and llama.cpp updates, and to quickly and easily switch between CPU / Vulkan / ROCm / CUDA (depending on your platform).
Regarding your actual question, there are several reasons.
First off, your allusion to privacy - absolutely, yes, some people use it for adult role-play. However, consider the more productive motivations for privacy, too: a lot of businesses have trade secrets they may want to discuss or work on with local models without ever releasing that information to cloud providers, no matter how much those cloud providers pinky promise never to peek at it. Google, Microsoft, Meta, et al. have consistently demonstrated that they do not value or respect customer privacy expectations, and that they will eagerly comply with illegal, unconstitutional NSA conspiracies to facilitate bulk collection of customer information / data. There is no reason to believe Anthropic, OpenAI, Google, or xAI would act any differently today. In fact, there is already a standing court order forcing OpenAI to preserve all customer communications, in a format that can be delivered to the court (i.e. plaintext, or encryption at rest plus willingness to provide decryption keys to the court), in perpetuity (https://techstartups.com/2025/06/06/court-orders-openai-to-p...)
There are also businesses which have strict, absolute needs for 24/7 availability and low latency, which remote APIs never have offered. Even if the remote APIs were flawless, and even if the businesses have a robust multi-WAN setup with redundant UPS systems, network downtime or even routing issues are more or less an inevitable fact of life, sooner or later. Having local models means you have inference capability as long as you have electricity.
Consider, too, the integrity front: frontier labs may silently modify API-served models to be lower quality for heavy users, with little means of detection by end users (multiple labs have been suspected / accused of this; a lack of proof isn't evidence that it didn't happen), and API-served models can be modified over time to patch behaviors that may previously have been relied upon for legitimate workloads (imagine a red team that used a jailbreak to get a model to produce code for process hollowing, for instance). This second example has absolutely happened with almost every inference provider.
The open weight local models also have zero marginal cost besides electricity once the hardware is present, unlike PAYG API models, which create financial lock-in and dependency that is in direct contrast with the financial interests of the customers. You can argue about the amortized costs of hardware, but that's a decision for the customer to make using their specific and personal financial and capex / hardware information that you don't have at the end of the day.
Further, the gap between frontier open-weight models and frontier proprietary models has been rapidly shrinking and continues to shrink. See Kimi K2.5, Xiaomi MiMo v2, GLM 4.7, etc. Yes, Opus 4.5, Gemini 3 Pro, and GPT-5.2-xhigh are remarkably good models and may beat these at the margin, but most work done via LLMs does not need the absolute best model; many people will opt for a model that gets 95% of the output quality of the absolute frontier model when it can be had for 1/20th the cost (or less).
ChatGPT, please summarise this long essay by Stephen Wolfram into a couple of pithy sentences:
TLDR: AI won’t “end work” so much as endlessly move the goalposts, because the universe itself is too computationally messy to automate completely. The real risk isn’t mass unemployment—it’s that we’ll have infinite machine intelligence and still argue about what’s worth doing.
There are infinite things worth doing, and a machine's ability to actually know what's worth doing in any given scenario is likely on par with a human's. What's "worth doing" is subjective; everything comes down to situational context. Machines cannot escape the same ambiguity as humans. If context is held constant, I would assume overlapping performance on a pretty standard distribution between humans and machines.
Machines lower the marginal cost of performing a cognitive task for humans, and it can be extremely useful and high-leverage to offload certain decisions to machines. I think it's reasonable to ask a machine to decide when machine context is higher and the outcome is de-risked.
Human leverage of AGI comes down to good judgement, but that too is not uniformly applied.
For what human leverage of AGI may look like, look at the relationship between a mother and a toddler.
As you said: There's an infinite number of things a toddler may find worth doing, and they offload most of the execution to the mother. The mother doesn't escape the ambiguity, but has more experience and context.
Of course, this all assumes AGI is coming and super intelligent.
Well, because people are lazy. They already ask it for advice and it gives answers that they like. I already see teams using AI to put together development plans.
If you assume superintelligence, why wouldn't that expand? Especially when it comes to competitive decisions that have a real cost when they're suboptimal?
The end state is that agents will do almost all of the real decision making, assuming things work out as the AI proponents say.
It’s vibe coded slop that could be made by anyone with Claude Code and a spare weekend.
It didn’t require any skill; it’s all written by Claude. I’m not sure why you’re trying to hype up this guy. If he didn’t have Claude he couldn’t have made this, just like non-engineers all over the world are coding a variety of shit right now.
I’ve been following Peter and his projects 7-8 months now and you fundamentally mischaracterize him.
Peter was a successful developer prior to this and an incredibly nice guy to boot, so I feel the need to defend him from anonymous hate like this.
What is particularly impressive about Peter is his throughput of publishing *usable utility software*. Over the last year he’s released a couple dozen projects, many of which have seen moderate adoption.
I don’t use the bot, but I do use several of his tools and have also contributed to them.
There is a place in this world for both serious, well-crafted software as well as lower-stakes slop. You don’t have to love the slop, but you would do well to understand that there are people optimizing these pipelines and they will continue to get better.
Weekend - certainly not, the scope is massive. All those CLIs - Gmail, Whisper, ElevenLabs, WhatsApp/Telegram/Discord/etc., Obsidian, a generic skills marketplace, and so on - it's just so many separate APIs to build against.
But Peter just said in his TBPN interview that you can likely re-build all that in 1 month. Maybe you'd need to work 14h per day like he does, and running 10 codex sessions in parallel, using 4-6 OpenAI Pro subs.
Because no one cares about optimizing for this because it's a stupid benchmark.
It doesn't mean anything. No frontier lab is trying hard to improve the way its model produces SVG format files.
I would also add, the frontier labs are spending all their post-training time on working on the shit that is actually making them money: i.e. writing code and improving tool calling.
The Pelican on a bicycle thing is funny, yes, but it doesn't really translate into more revenue for AI labs so there's a reason it's not radically improving over time.
Why stupid? Vector images are widely used and extremely useful, both directly and for rendering raster images at different scales. It’s also highly connected with spatial and geometric reasoning and precision, which would open up a whole new class of problems these models could tackle. Sure, it’s secondary to raster image analysis and generation, but I’m curious why it would be stupid to pursue.
It shows that these are nowhere near anything resembling human intelligence. You wouldn't have to optimize for anything if it would be a general intelligence of sorts.
So you think that if we gave the model a pencil and paper it would do better?
I don't think SVG is the problem. It just shows that models are fragile (nothing new), so even if they can (probably) make a good PNG with a pelican on a bike, and they can (probably) make some good SVG, they do not "transfer" things because they do not "understand" them.
I do expect models to fail randomly on tasks that are not "average and common", so for me personally the benchmark is not very useful (and that does not mean they can't work, just that I would not bet on it). But if there are people who think "if an LLM output an SVG for my request, it means it can output an SVG for every image", there might be some value in it.
This exactly. I don't understand the argument that seems to be, if it were real intelligence, it would never have to learn anything. It's machine learning, not machine magic.
One aspect worth considering is that, given a human who knows HTML and graphics coding but who had never heard of SVG, they could be expected to perform such a task (eventually) if given a chance to train on SVG from the spec.
Current-gen LLMs might be able to do that with in-context learning, but if limited to pretraining alone, or even pretraining followed by post-training, would one book be enough to impart genuine SVG composition and interpretation skills to the model weights themselves?
My understanding is that the answer would be no, a single copy of the SVG spec would not be anywhere near enough to make the resulting base model any good at SVG authorship. Quite a few other examples and references would be needed in either pretraining, post-training or both.
So one measure of AGI -- necessary but not sufficient on its own -- might be the ability to gain knowledge and skills with no more exposure to training material than a human student would be given. We shouldn't have to feed it terabytes of highly-redundant training material, as we do now, and spend hundreds of GWh to make it stick. Of course that could change by 5 PM today, the way things are going...
I suspect there is actually quite a bit of money on the table here. For those of us running print-on-demand workflows, the current raster-to-vector pipeline is incredibly brittle and expensive to maintain. Reliable native SVG generation would solve a massive architectural headache for physical product creation.
How is he "well respected", based on what metric? Amount of vibe coded slop put out into the ecosystem?
He sounds like someone who has just vibe coded shit until something stuck to the wall. I also find it hard to respect people who create things which are 99-100% coded by an LLM, with zero technical merit or skill. Again, just creating slop until something goes viral.
As far as I can see Clawdbot is just more AI-slop. Anyone can create the same thing (and many have created similar) over a weekend. It's riddled with bugs, security holes, and it's a disaster waiting to happen basically.
Just the opposite: he has over 15 years of experience providing third-party frameworks for the iOS community, used in thousands of apps. He founded PSPDFKit, a library for working with PDFs, and managed an exit of the company worth $100 million.
He's written up hundreds of articles on different topics in the community and is very much a skilled developer, with tons of technical merit.
Now you come along with your small mind and a hard on for AI-hate and all you can comprehend is that nothing can challenge your world view so you reach out and attack what you don't understand. That just defines you as ignorant.
It did destroy memory though. I would bet any amount of money that our memories in 2026 are far, far worse than they were in 1950 or 1900.
In fact, I can feel that my memory is noticeably worse now than before ChatGPT's release, because we are doing less hard cognitive work. The less we use our brains, the dumber we get, and we are definitely using our brains less now.
It's not writing that destroys memory. It's fast/low-cost lookup of written material that destroys memory. This is why people had strong memory despite hundreds of years of widespread writing, and it suddenly fell through the floor with the introduction of widespread computers, internet, and smartphones.
We exist in a stunningly more abstract and complex society than we did even 100 years ago. Unless you are reasonably intelligent, it's incredibly difficult to even navigate the modern world.