
LLM fine-tuning tends to destroy the model's capabilities if you aren't very careful. It's not as easy or effective as it is with image generation.


My very cursory understanding -- at least from Unsloth's recommendations -- is that you have to work quite deliberately to preserve reasoning/instruct capabilities [1]: for example, to keep Qwen3's reasoning intact (however that's operationalized), they suggest a fine-tuning corpus that's roughly 75% chain-of-thought data to 25% non-reasoning data. Is that a significant issue for orgs/projects that currently rely on fine-tuning?

[1] https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tun...
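
For illustration, a rough sketch of what that 75/25 mix might look like using the Hugging Face datasets library. The dataset names and the exact mixing mechanics here are placeholders of mine, not Unsloth's actual recipe:

    # Mix chain-of-thought and plain-instruction data at roughly 3:1 (~75% / 25%).
    # Dataset names are placeholders; both sets are assumed to share the same columns.
    from datasets import load_dataset, concatenate_datasets

    reasoning = load_dataset("your-org/cot-traces", split="train")
    plain = load_dataset("your-org/plain-instructions", split="train")

    # Size the non-reasoning slice at one third of the reasoning set,
    # so it lands at about 25% of the combined corpus.
    n_plain = min(len(reasoning) // 3, len(plain))
    plain = plain.shuffle(seed=42).select(range(n_plain))

    mixed = concatenate_datasets([reasoning, plain]).shuffle(seed=42)
    print(f"{len(reasoning)} reasoning + {n_plain} non-reasoning = {len(mixed)} total "
          f"({len(reasoning) / len(mixed):.0%} reasoning)")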


Do you have a suggestion for how to measure whether model capabilities are getting destroyed? How does one measure it objectively?


These are questions at the cutting edge of academic research right now. It may be computationally unknowable until you actually check.


Ask it the same series of questions after training that you posed before training started. Is the quality lower?
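
Something like the sketch below, for example. The model names, checkpoint path, and probe prompts are all placeholders:

    # Run a fixed set of probe prompts through the base and fine-tuned
    # checkpoints and compare the outputs side by side.
    from transformers import pipeline

    PROBE_PROMPTS = [
        "Explain why the sky is blue in two sentences.",
        "Write a Python function that reverses a linked list.",
        "Summarize the causes of World War I in one paragraph.",
    ]

    def sample_answers(model_name):
        gen = pipeline("text-generation", model=model_name)
        return [gen(p, max_new_tokens=256)[0]["generated_text"] for p in PROBE_PROMPTS]

    before = sample_answers("Qwen/Qwen3-4B")        # base checkpoint (placeholder)
    after = sample_answers("./qwen3-4b-finetuned")  # fine-tuned checkpoint (placeholder path)

    for prompt, b, a in zip(PROBE_PROMPTS, before, after):
        print(f"### {prompt}\n--- base ---\n{b}\n--- fine-tuned ---\n{a}\n")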


That series of questions will only measure a particular area. I am concerned about destroying model capabilities in some other area that I'm not paying attention to and have no way of knowing about.


Isn’t that a general problem with LLMs? The only way to know how good it is at something is to test it.



