
LLM fine-tuning tends to destroy the model's capabilities if you aren't very careful. It's not as easy or effective as it is with image generation.


My very cursory understanding -- at least from Unsloth's recommendations -- is that you have to work quite deliberately to preserve reasoning/instruct capabilities [1]: for example, to keep Qwen3's reasoning intact (however that's operationalized), they suggest a fine-tuning corpus that's roughly 75% chain-of-thought data to 25% non-reasoning data. Is that a significant issue for orgs/projects that currently rely on fine-tuning?

[1] https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tun...
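
For illustration, a rough sketch of what that 75/25 mix might look like using the Hugging Face datasets library. The dataset names and the exact mixing mechanics here are placeholders of mine, not Unsloth's actual recipe:

    # Mix chain-of-thought and plain-instruction data at roughly 3:1 (~75% / 25%).
    # Dataset names are placeholders; both sets are assumed to share the same columns.
    from datasets import load_dataset, concatenate_datasets

    reasoning = load_dataset("your-org/cot-traces", split="train")
    plain = load_dataset("your-org/plain-instructions", split="train")

    # Size the non-reasoning slice at one third of the reasoning set,
    # so it lands at about 25% of the combined corpus.
    n_plain = min(len(reasoning) // 3, len(plain))
    plain = plain.shuffle(seed=42).select(range(n_plain))

    mixed = concatenate_datasets([reasoning, plain]).shuffle(seed=42)
    print(f"{len(reasoning)} reasoning + {n_plain} non-reasoning = {len(mixed)} total "
          f"({len(reasoning) / len(mixed):.0%} reasoning)")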


Do you have a suggestion for how to measure whether model capabilities are getting destroyed? How does one measure it objectively?


These are questions at the cutting edge of academic research right now. It may be computationally unknowable until you actually check.


Ask it the same series of questions after training that you posed before training started. Is the quality lower?
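
Something like the sketch below, for example. The model names, checkpoint path, and probe prompts are all placeholders:

    # Run a fixed set of probe prompts through the base and fine-tuned
    # checkpoints and compare the outputs side by side.
    from transformers import pipeline

    PROBE_PROMPTS = [
        "Explain why the sky is blue in two sentences.",
        "Write a Python function that reverses a linked list.",
        "Summarize the causes of World War I in one paragraph.",
    ]

    def sample_answers(model_name):
        gen = pipeline("text-generation", model=model_name)
        return [gen(p, max_new_tokens=256)[0]["generated_text"] for p in PROBE_PROMPTS]

    before = sample_answers("Qwen/Qwen3-4B")        # base checkpoint (placeholder)
    after = sample_answers("./qwen3-4b-finetuned")  # fine-tuned checkpoint (placeholder path)

    for prompt, b, a in zip(PROBE_PROMPTS, before, after):
        print(f"### {prompt}\n--- base ---\n{b}\n--- fine-tuned ---\n{a}\n")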


That series of questions will only measure a particular area. I am concerned about destroying model capabilities in some other area that I'm not paying attention to and have no way of knowing about.


Isn’t that a general problem with LLMs? The only way to know how good it is at something is to test it.



