I'm really excited by the concept of local LLMs; we give too much of our data to the cloud. We should embrace local-first principles with all these new AI tools.
Sure, local inference is harder than in the cloud, but hardware is getting better all the time, and we are still early on the optimisation curve when it comes to LLMs.
I'm looking forward to seeing smaller, less resource-intensive models that are easier to run locally, even on mobile.
Does anyone know of any research into "trimming" or "stripping" less used parts of an LLM, so that you can take trained weights and make them smaller (obviously with some sort of loss)?
This particular example is built on top of llama.cpp. It has a few benefits:
1. It's hopefully easier to install (though still not nearly easy enough)
2. All prompts you send through it - along with their responses - are automatically logged to a SQLite database. This is fantastic for running experiments and figuring out what kinds of things work.
3. The same LLM tool works for other models as well - you can run "llm -m $MODEL $PROMPT" against OpenAI models, Anthropic models, other self-hosted models, models hosted on Replicate - all handled by plugins, which should make it really easy to add support for other models too. See the example workflow after this list.
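To make that concrete, here's roughly what the workflow looks like, going by the llm docs - the plugin and model names here are just examples, so check "llm models list" to see what's actually available on your machine:

    # install the CLI, plus an API key if you want to use hosted models
    pip install llm
    llm keys set openai

    # run a prompt against a hosted model
    llm -m gpt-4 "Ten fun names for a pet pelican"

    # install a plugin for local models, then see what it added
    llm install llm-gpt4all
    llm models list

    # every prompt and response is logged to SQLite - this prints where the database lives
    llm logs path

The nice part is that the logging and the -m switch work the same way regardless of which backend a plugin is talking to.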
My ultimate goal with LLM is that when someone releases a new model it will quickly be supported by an LLM plugin, which should make it MUCH easier to install and run these things without having to figure out a brand new way of doing it every single time.