m building a wrapper that queries GPT-4, Claude, and Gemini, then executes their code in a sandbox to catch hallucinations.
Is the latency (30s) worth the certainty? Or do you prefer speed?
I'm running manual tests for people today if anyone wants to try it.