Ran gpt-oss:20b on an RTX 3090 (24 GB VRAM) through Ollama; here's my experience:
Basic Ollama calls through a POST endpoint work fine. However, structured output doesn't work. The model is insanely fast and good at reasoning.
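For reference, this is the kind of call that works versus the one that doesn't — a minimal sketch against Ollama's default `/api/generate` endpoint, where the `format` field carries a JSON schema for structured output. The prompt and schema here are illustrative, not from my actual tests:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt, schema=None):
    """Build a request body for /api/generate; pass a JSON schema to request structured output."""
    body = {"model": "gpt-oss:20b", "prompt": prompt, "stream": False}
    if schema is not None:
        body["format"] = schema  # Ollama's structured-output field
    return body

def call_ollama(payload):
    """POST the payload and return the model's text response."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Plain call (works fine for me):
    print(call_ollama(build_payload("Why is the sky blue? One sentence.")))
    # Structured call (where I hit problems):
    schema = {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    }
    print(call_ollama(build_payload("Why is the sky blue? Answer as JSON.", schema)))
```

The plain payload goes through; adding the `format` schema is where the output breaks down for me.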
In combination with Cline it appears to be worthless. Tool calling doesn't work (they say it does), it fails to wait for feedback (or to correctly call ask_followup_question), and above 18k context it runs partially on the CPU (weird, since they claim it should run comfortably on a 16 GB VRAM RTX card).
> Unexpected API Response: The language model did not provide any assistant messages. This may indicate an issue with the API or the model's output.
Edit: It also doesn't work with the OpenAI-compatible provider in Cline; there it doesn't detect the prompt at all.