
Been using ggerganov’s llama vscode plugin with the smaller 2.5 models and it actually works really nicely on an M3 Max


I'm on an M1 Max with 64 GB RAM, but I've never used this VS Code plugin before. Should I try it?

Is this the one? https://github.com/ggml-org/llama.vscode It seems to be built for code completion rather than outright agent mode


It is RAG for your codebase, and provides code completion. The gain is local inference, and it's actually useful with smaller models.

The plugin itself provides chat as well, but my gut feeling is that ggerganov runs several models at the same time, given he uses a 192 GB machine.
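A completion plugin like this typically sends fill-in-the-middle (FIM) prompts to a local inference server: the code before the cursor, the code after it, and a marker asking the model to generate what goes in between. A minimal sketch of how such a prompt is assembled — the token names below follow the Qwen2.5-Coder convention, other model families use different markers, and `build_fim_prompt` is a hypothetical helper, not the plugin's actual API:

```python
# Hypothetical sketch of FIM prompt construction for a local
# code-completion model. Token names assume the Qwen2.5-Coder
# convention; check your model's documentation for its markers.

FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prompt asking the model to generate the code
    that belongs between `prefix` and `suffix`."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

# Example: the editor sends the code around the cursor,
# and the model fills in the missing middle.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
print(prompt)
```

The server then runs the model on this prompt and streams back the completion, which the editor splices in at the cursor.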

Have not tried this scenario yet, but looking at my API bill I'm probably going to try 100% local dev at some point. Besides, vibe coding with existing tools doesn't seem to work that well for enterprise-size codebases.


What languages do you work in? How much code do you keep? Do you end up using it as scaffolding and rewriting it, or leaving most of it as is?


Languages: JS/TS, C/C++, shader code, some ESP Arduino code. Not counting all the boilerplate and CSS that I don't care about too much.

It very much reminds me of tabbing through autocomplete with IntelliSense step by step, but in a more diffusion-like way.

But my toolset is a mixture of agentic and autocomplete, not 100% either way. I try to keep a clear focus on the architecture, and actually own the code by reading most of it, keeping the parts of the code the way I like.



