My very cursory understanding -- at least from Unsloth's recommendations -- is that you have to work very hard to preserve reasoning/instruct capabilities [1]: for example, to "preserve" Qwen3's reasoning capabilities (however that's operationalized), they suggest a fine-tuning corpus that's 75% chain-of-thought data to 25% non-reasoning data. Is that a significant issue for orgs/projects that currently rely on fine-tuning?
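For concreteness, here's a minimal sketch of what building that 75/25 mix might look like -- the pool names, sizes, and helper are all hypothetical, not Unsloth's actual tooling:

    import random

    def mix_corpus(cot_pool, plain_pool, n_total=10_000, cot_frac=0.75, seed=0):
        """Sample a fine-tuning corpus where cot_frac of the examples come
        from the chain-of-thought pool and the rest from the non-reasoning
        pool. Assumes each pool is at least as large as its share."""
        rng = random.Random(seed)
        n_cot = round(n_total * cot_frac)
        mixed = rng.sample(cot_pool, n_cot) + rng.sample(plain_pool, n_total - n_cot)
        rng.shuffle(mixed)  # avoid blocks of one data type during training
        return mixed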
That series of questions will measure only a particular area. I am concerned about destroying model capabilities in some other area that I do not pay attention to and have no way of knowing about.