
Generally, you train it again entirely from scratch.

It's possible to introduce new information by fine-tuning on top of the existing model, but it's debatable how effective that is - most fine-tuning success stories I've seen focus on teaching a model how to perform new kinds of tasks, as opposed to teaching it new "facts".

If you want a model to have access to updated information, the best way to do that is still via Retrieval Augmented Generation. That's the fancy name for the trick where you give the model the ability to run searches for information relevant to the user's questions and then invisibly paste that content into the prompt - effectively what Bing, Bard and ChatGPT do when they run searches against their attached search engines.
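
Roughly, the loop looks like this (a minimal Python sketch - the search_web helper and the prompt wording are placeholders I made up, not any particular product's implementation):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def search_web(question: str) -> str:
        # Hypothetical retrieval step: swap in a real search engine,
        # vector store, or internal document index here.
        return "(snippets of search results relevant to the question)"

    def answer_with_rag(question: str) -> str:
        # 1. Retrieve content relevant to the user's question.
        context = search_web(question)
        # 2. Invisibly paste it into the prompt.
        prompt = (
            "Answer the question using the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}"
        )
        # 3. The model answers from the pasted context, not from
        #    whatever was frozen into its weights at training time.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # any chat model works here
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

The key point is that the model's weights never change - the fresh information rides in through the prompt on every request.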



Most LoRAs are less effective for facts, since the changes largely shift attention (the Q and K projections, not the V layers) and only touch a tiny percentage of the weights at that. Full fine-tunes, however, are pretty effective for introducing new facts (you could probably use ReLoRA as well).
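
For a sense of scale, here's a rough sketch using Hugging Face's peft library - the model name is just an example, and the module names are the Llama-style attention projections, so adjust them for your architecture:

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Typical LoRA setup: low-rank adapters on the Q/K attention
    # projections only, a tiny fraction of the total weights.
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1%

Widening target_modules (adding the V projections and the MLP layers) trades away some efficiency but reaches more of the weights where facts seem to live.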

There are also newer techniques like ROME that can edit individual facts, and you might be able to get there when updating by doing a DPO tune that prefers the new answers over the old ones.
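
The DPO version is roughly this, sketched with the trl library (the preference pairs are invented, and trl's argument names have shifted between versions, so treat them as approximate):

    from datasets import Dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Preference pairs: the stale answer is "rejected" and the
    # updated answer is "chosen". Contents here are made up.
    pairs = Dataset.from_dict({
        "prompt":   ["What is the latest stable release of FooLib?"],
        "chosen":   ["FooLib 2.0"],  # the new answer
        "rejected": ["FooLib 1.4"],  # the old answer
    })

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="dpo-fact-update", beta=0.1),
        train_dataset=pairs,
        processing_class=tokenizer,
    )
    trainer.train()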

While I agree that RAG/tool use (with consistency checking) might be the best overall approach for facts, being able to update/tune for model drift is probably still going to be important.

I'd also disagree about training entirely from scratch - unless you're changing the architecture, building a brand new foundation model, or have an unlimited time/compute budget, that seems like the worst option (and pretty unrealistic) for most people.



