I just went through an eerily similar situation where the coding agent was able to muster some pretty advanced math (information geometry) to solve the problem at hand.
But while I was able to understand it enough to steer the conversation, I was utterly unable to make any meaningful change to the code or grasp what it was doing. Unfortunately, unlike in the case you described, chatting with the LLM didn’t cut it, as the domain is challenging enough. I’ve been down a rabbit hole for days now, picking up the math foundations and writing the code at a slower pace, albeit one I can keep up with.
And to be honest, it’s incredibly fun. Applied math with a smart, dedicated tutor and the ability to immediately see results and build your intuition is miles ahead of how I remember learning it back in my formative years.
Do you have benchmarks for the SGLang vs vLLM latency and throughput question? Not to challenge your point; I’d like to reproduce the results, fiddle with the configs a bit, and try different model & hardware combos.
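In case it's useful, here's a rough sketch of how I'd probe it myself: both servers expose an OpenAI-compatible endpoint, so the same script can drive either one. The model name, prompt, port, and request counts below are placeholders, not anything from your setup.

    # Rough latency/throughput probe for any OpenAI-compatible server
    # (both vLLM and SGLang expose one). Point base_url at whichever
    # server you launched; all the numbers here are placeholders.
    import time
    from concurrent.futures import ThreadPoolExecutor
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model

    def one_request(_):
        t0 = time.perf_counter()
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
            max_tokens=256,
        )
        return time.perf_counter() - t0, resp.usage.completion_tokens

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=16) as pool:  # 16 concurrent clients
        results = list(pool.map(one_request, range(128)))
    wall = time.perf_counter() - t0

    lats = sorted(r[0] for r in results)
    toks = sum(r[1] for r in results)
    print(f"p50 latency {lats[len(lats)//2]:.2f}s, p95 {lats[int(len(lats)*0.95)]:.2f}s")
    print(f"throughput {toks / wall:.1f} output tok/s")

Run it twice, once against each server with the same model and the same concurrency, and compare. Both projects also ship their own serving benchmarks, which are worth using for anything more rigorous than this.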
Except this is GLM 4.7 Flash, which has 32B total params, 3B active. At 4-bit weight quantization it should fit in about 20GB of RAM with a decent context window of 40k or so, and you can save even more by quantizing the activations and KV cache to 8-bit.
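Back-of-the-envelope, with the layer/head counts below as assumptions I haven't checked against GLM 4.7 Flash's actual config.json (plug in the real values):

    # VRAM estimate for a 32B-total-param model at 4-bit weights
    # + 8-bit KV cache. LAYERS / KV_HEADS / HEAD_DIM are ASSUMPTIONS,
    # not GLM 4.7 Flash's real config -- take them from its config.json.
    PARAMS      = 32e9    # total parameters (from the comment above)
    WEIGHT_BITS = 4       # 4-bit weight quantization
    KV_BITS     = 8       # 8-bit KV-cache quantization
    CONTEXT     = 40_000  # target context window
    LAYERS      = 48      # assumption
    KV_HEADS    = 8       # assumption (grouped-query attention)
    HEAD_DIM    = 128     # assumption

    weights_gb = PARAMS * WEIGHT_BITS / 8 / 1e9
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * bytes/elem * tokens
    kv_gb = 2 * LAYERS * KV_HEADS * HEAD_DIM * (KV_BITS / 8) * CONTEXT / 1e9
    print(f"weights ~{weights_gb:.1f} GB, KV ~{kv_gb:.1f} GB, "
          f"total ~{weights_gb + kv_gb:.1f} GB")
    # -> weights ~16.0 GB, KV ~3.9 GB, total ~19.9 GB

So the ~20GB figure checks out under those assumptions, with the weights dominating and the quantized KV cache taking the remainder.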
Yes, but the parent link was to the big GLM 4.7, which had a bunch of GGUFs; the new one didn't at the time of posting, nor does it now. I'm waiting on the Unsloth folks for the 4.7 Flash.
Also worth noting the size of these countries: mostly on the small/tiny side, besides Norway (an oil exporter) and Ireland (a corporate tax haven).
Perhaps the difficulty of making good economic decisions grows exponentially with population size, especially for “conventional” economies that don’t have another cash cow.
The reality is you get way more benefit from attracting a multinational company to shift its revenue to you for tax purposes when you're a (very) small state. That strategy simply isn't available when you're the US, which already hosts major bases for all these companies; relative to its population and government budget, it's just not that big a deal.
Do you have any evals on how good LLMs are at generating Glyphlang?
I’m curious whether you optimized for the ability to generate functioning code or just for tokenization compression rate, which LLMs you tokenized for, and what your optimization process was like.
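For the compression-rate part, here's roughly how I'd measure it myself. tiktoken's cl100k_base is a stand-in for whichever LLM's tokenizer you actually targeted, and the snippets are made-up examples, not real Glyphlang:

    # Rough tokenization-compression comparison: the same program in a
    # baseline form vs. a compact form, counted with one LLM tokenizer.
    # cl100k_base is a stand-in; swap in your target model's tokenizer.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    baseline = "def add(a, b):\n    return a + b\n"
    compact  = "add=(a,b)=>a+b"  # stand-in for a terser encoding

    n_base, n_compact = len(enc.encode(baseline)), len(enc.encode(compact))
    print(f"baseline: {n_base} tokens, compact: {n_compact} tokens, "
          f"ratio: {n_base / n_compact:.2f}x")

Averaged over a corpus of equivalent programs, that ratio is the compression side; the functioning-code side needs actual generation evals on top.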
I'd rather view it as a failure to distinguish between data and logic. The status quo is several steps short of where it needs to be before we can productively start talking about types and completeness.
Unfortunately that, er, opportunistic shortcut is an essential behavior of modern LLMs, and everybody keeps building around it hoping the root problem will be fixed by some silver bullet further down the line.