Hacker News | jimmaswell's comments

What are the specs, and how does it compare to Copilot or GPT Codex?

You can check out https://www.reddit.com/r/LocalLLaMA/comments/1piq11p/mac_wit... for a sense of how useful people find it and the specs of the machines running it. It will be some variation of Max- or Ultra-level Apple silicon with around 64GB or more of RAM. Oh, and an HN submission from 9 months ago: https://news.ycombinator.com/item?id=43856489

Copilot comparison:

Intelligence: Qwen2.5-Coder-32B is widely considered the first open-source model to reach GPT-4o and Claude 3.5 Sonnet levels of coding proficiency. While Copilot (using GPT-4o) remains highly reliable, Qwen often produces more concise code and can outperform cloud models in specific tasks like code repair.

Latency: Local execution on an M3 Max provides near-zero network latency, resulting in faster "start-to-type" responses than Copilot, which must round-trip to the cloud.

Integration: Copilot is an all-in-one experience that integrates deeply into VS Code. Qwen requires local tooling like Ollama or MLX-LM plus an editor plugin like Continue.dev to achieve the same UX.
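
To make the setup concrete, here's a minimal sketch of querying a locally served Qwen model through the Ollama Python client. Everything here is an assumption for illustration: it presumes the ollama package is installed (pip install ollama), the Ollama daemon is running, and the model was pulled with "ollama pull qwen2.5-coder:32b".

    # Minimal sketch: chat with a locally served Qwen coder model.
    # Assumes the Ollama daemon is running and the model is pulled.
    import ollama

    response = ollama.chat(
        model="qwen2.5-coder:32b",
        messages=[
            {"role": "user", "content": "Write a function that reverses a linked list."},
        ],
    )
    # The reply text lives under message.content in the response.
    print(response["message"]["content"])

Editor plugins like Continue.dev point at this same local Ollama endpoint, which is how the Copilot-style in-editor UX gets wired up.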

GPT Codex comparison:

Intelligence & Reasoning: In recent 2025–2026 benchmarks, the Qwen3-Coder series has emerged as the strongest open-source performer, matching the "pass@5" resolution rates of flagship models like GPT-5-High. While OpenAI’s latest GPT-5.1-Codex-Max remains the overall leader in complex, project-wide autonomous engineering, Qwen is frequently cited as the better choice for local, file-specific logic.

Architecture & Efficiency: OpenAI models like GPT-OSS-20b (a Mixture-of-Experts model) are optimized for extreme speed and tool-calling. However, the M3 Max with 64GB is powerful enough to run the Qwen3-Coder-30B or 32B models at full fidelity, which provides superior logic to OpenAI's smaller "mini" or "OSS" models.
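
As a rough back-of-the-envelope sketch (weights only; KV cache, activations, and runtime overhead all add to this, so treat the figures as illustrative):

    # Approximate weight memory for a 32B-parameter model at common
    # quantization levels. Real usage is higher: this ignores the KV
    # cache, activations, and runtime overhead entirely.
    PARAMS = 32e9  # 32 billion parameters

    for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = PARAMS * bits / 8 / 1e9
        verdict = "fits" if gb < 64 else "does not fit"
        print(f"{name}: ~{gb:.0f} GB of weights -> {verdict} in 64 GB unified memory")

At fp16 the weights alone would eat the entire 64 GB, so "full fidelity" here realistically means running the full-size 30B/32B model at an 8-bit or 4-bit quant rather than falling back to a smaller distilled model.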

Context Window: Qwen models offer substantial context (up to 128K–256K tokens), which is comparable to OpenAI’s specialized Codex variants. This allows you to process entire modules locally without the high per-token cost of sending that data to OpenAI's servers.
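
A quick sanity check on "entire modules," using the common ~4-characters-per-token heuristic (a rough rule of thumb, not a real tokenizer; the 400 KB figure is just an example):

    # Rough estimate: does a concatenated module fit in a 128K window?
    CONTEXT_WINDOW = 128_000
    CHARS_PER_TOKEN = 4  # heuristic; actual counts vary by tokenizer

    module_chars = 400_000  # e.g. ~400 KB of concatenated source files
    approx_tokens = module_chars // CHARS_PER_TOKEN
    print(f"~{approx_tokens:,} tokens vs a {CONTEXT_WINDOW:,}-token window")
    # -> ~100,000 tokens: inside 128K, and well inside 256K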


Same here re: ADHD. It's been invaluable. A big project that would have been personally intractable is now easy - even if the LLM gives slightly wrong answers 20% of the time, the important thing is that it collapses the search space for what concepts or tools I need to look into and gives an overall structure to iterate on. I tend to use ChatGPT for the big planning/architectural conversations, and I find it's also very good at sample code; for code writing/editing, Copilot has been fantastic too, lately mostly using the Opus agent in my case. It's so nice being able to delegate some bullshit gruntwork to it while I either do something else or work on architecture in another window for a few minutes.

It certainly hasn't inhibited learning either. The most recent example is shaders. I started by having it just generate entire shaders based on descriptions, without really understanding the pipeline fully, and asking how to apply them in Unity. I've been generally familiar with Unity for over a decade but never really touched materials or shaders. The generated shaders were shockingly good and did what I asked, but over time I wanted to really fine tune some of the behavior and wound up with multiple passes, compute shaders, and a bunch of other cool stuff - and understanding it all on a deeper level as a result.


The law is written such that they could do all that to a small family business that forgot to delete their Apache logs, which isn't good and leaves room for abuse even if they pinkie swear it's only meant for big violations.


Only after informing you, giving you the opportunity to fix things, and many, many other steps. The harshness is directly related to the size of the company and the company's willingness to fix any issues. They want companies to comply.


Reading the words and interpreting the law in its wider legal context are two different things.


As a web dev, Safari is like the new IE6 - it does everything slightly wrong, and I have to sprinkle my code with special cases for it because too many people use it for me to ignore. This modal scrolls properly in Firefox and Chrome? Not Safari; better add a million extra CSS properties and maybe even some JS for fun to deal with it. This CSS parses exactly the same in Firefox and Chrome? Not Safari - they decided to Think Different. My workplace's frontend codebase is absolutely polluted with /* Safari fix: ... */

https://www.google.com/search?q=examples%20of%20code%20that%...


> You ask it the right niche thing and it can only pump out a few variations of the same code, that's clearly someone else's code stolen almost verbatim.

There are only so many ways to express the same idea. Even clean-room engineers sometimes write code that's incidentally identical to the source.


There was an example on here recently where an AI PR to an open source project literally had someone else's name in the code comments, and included their license.

That's the level of tell-tale sign that it's just stealing code and modifying a couple of variable names.

For me personally, the code I've seen might be written in a slightly weird style, or have strange additions that aren't applicable to the question.

They're so obviously not "clean room" code or incredibly generic; they're the opposite - incredibly specific.


For me the excitement is palpable when I've asked it to write a feature, then I go test it and it works entirely as expected. It's so cool.


I've asked ChatGPT "Could X thing in quantum mechanics actually be caused by, or an expression of, the same thing going on as Y", where it had a prime opportunity to say I'm a genius discovering something profound, but instead it just went into some very technical specifics about why they weren't really the same or related. IME GPT-5 has been a big improvement in objectivity.


> many companies changed their "strategy" to mandating AI usage internally

Are they hiring? My job is still dragging its feet on approving Copilot.


It demonstrated the capabilities of an AI to a potentially on-the-fence audience while giving the author experience using the new tools/environment. That's solid value. I also just find it really cool to see that an AI did this.


Yeah, it shows the AI is not capable of writing maintainable projects. I'm off the fence. And it's cool that you find it cool, but reducing the problem space to that of a toy project makes it so much less impressive as to be trivially ignorable.

The new LLM (pattern recognizer/matcher) is not a good tool.


> actually verify that the code I just wrote does what I intended it to

That's what the author did when they ran it.

