w2c2 has only 2 mentions. wasm2c is not a clear winner; it specifically loses on several of their benchmarks.
In general, using a preexisting compiler as a JIT backend is an old hack, there's nothing new there. It's just another JIT/AoT backend. For example, databases have done query compilation for probably decades by now.
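To make the query-compilation point concrete, here's a toy sketch in Python. Instead of interpreting a filter expression per row, the "database" generates source text for the whole predicate once and hands it to a preexisting compiler (here Python's own `compile`) to get a fast callable. All names are illustrative, not from any real database.

```python
def compile_predicate(column: str, op: str, value):
    # Generate source for the predicate, e.g. "lambda row: row['age'] > 30".
    src = f"lambda row: row[{column!r}] {op} {value!r}"
    # The "JIT backend" is just the host language's preexisting compiler.
    return eval(compile(src, "<query>", "eval"))

rows = [{"age": 25}, {"age": 42}, {"age": 31}]
pred = compile_predicate("age", ">", 30)
result = [r for r in rows if pred(r)]
# result == [{"age": 42}, {"age": 31}]
```

A real engine would emit C or LLVM IR instead of Python source, but the shape is the same: generate code, invoke an existing compiler, call the result.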
> Any idea on why the other end of the spectrum is this way -- thinking that it always has something to do?
Who said anything about "thinking"? Smaller models were notorious for getting stuck repeating a single word over and over, or just "eeeeeee" forever. Larger models only change probabilities, not the fundamental nature of the machine.
Claude Code is more than the TUI: it's the prompts, the agentic loop, and the tools, all made to cooperate well with the LLM powering it. If you use Claude Code over a longer period of time you'll notice Anthropic changing the tooling and prompts underneath it to make it work better. By now, the model is tuned to their prompts, tools, etc.
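The agentic loop being described can be sketched in a few lines. This is a hedged toy, not Anthropic's actual harness: `fake_model` stands in for the LLM, and the tool names are made up.

```python
def fake_model(messages):
    # A real harness would call the LLM API here. This stub requests one
    # tool call, then finishes once it has seen a tool result.
    if any(m["role"] == "tool" for m in messages):
        return {"type": "finish", "text": "done"}
    return {"type": "tool_call", "tool": "read_file", "arg": "notes.txt"}

# Hypothetical tool registry the loop can dispatch into.
TOOLS = {"read_file": lambda arg: f"<contents of {arg}>"}

def agent_loop(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        action = fake_model(messages)
        if action["type"] == "finish":
            return action["text"]
        # Execute the requested tool and feed the result back to the model.
        output = TOOLS[action["tool"]](action["arg"])
        messages.append({"role": "tool", "content": output})

result = agent_loop("summarize notes.txt")
# result == "done"
```

The point is that the loop, the tool set, and the prompts are one co-designed system; swap any piece and the model's behaviour changes.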
I don't think there is a difference. They can need or want or demand and it doesn't matter. They don't have the right to weaponize my computer against me to fulfill their goals.
In this case, you verify whether the knowledge was made up by comparing the virtual server's behaviour to the actual server's. Having a strong test suite like that is actually the ideal scenario for agentic development.
(It's still incredibly hard to pull off for real, because of complex stateful protocols and edge cases around timing and transfer sizes. Samba took 12 years to develop, so even with LLM help you'd probably still be looking at several years.)
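The verification scheme above is differential testing: drive the reference implementation and the rewrite with the same inputs and compare observable behaviour. A minimal sketch, where `smb_reference` and `smb_rewrite` are hypothetical stand-ins for the real server and the reimplementation:

```python
def smb_reference(op: str, path: str) -> dict:
    # Stand-in for the real server's observable response.
    return {"op": op, "path": path, "status": 0}

def smb_rewrite(op: str, path: str) -> dict:
    # Stand-in for the implementation under test.
    return {"op": op, "path": path, "status": 0}

def differential_check(cases) -> list:
    """Return the cases where the two implementations diverge."""
    return [c for c in cases if smb_reference(*c) != smb_rewrite(*c)]

mismatches = differential_check([("open", "/a"), ("read", "/a"), ("close", "/a")])
# An empty mismatch list means the rewrite matched the reference on these cases.
```

The hard part the comment mentions (stateful protocols, timing, transfer sizes) is exactly what this trivializes away: real checks have to replay whole sessions, not single calls.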