“ When we write, we share the way that we think. When we read, we get a glimpse of another mind. But when an LLM is the author, there is no mind there for a reader to glimpse.” — I dunno, I feel like reading is more a glimpse into how I think than how the author thinks…a generated story can be just as moving as one from a human, I think.
Fascinating. The article repeatedly makes the claim that “LLMs work by predicting likely next words in a string of text”. Yet there’s the seemingly contradictory implication that we don’t know how LLMs work (i.e., we don’t know their secret sauce). How does one reconcile this? They’re either fancy autocompletes or magic autocompletes (in which case the magic qualifier seems more important in understanding what they are than the autocomplete part).
This comes from ambiguous language that conflates the LLM algorithm with the training data and the derived weights.
The mysterious part involves whatever patterns might naturally exist within bazillions of human documents, and what partial/compressed patterns might end up in the weights that training produces and the LLM later uses.
Analogy: We built a probe that travels to an alien planet, mines out crystal deposits, and projects light through those fragments to show unexpected pictures of the planet's past. We know exactly how our part of the machine works, and we know the chemical composition of the crystals, but...
These systems work by taking a list of tokens (basically words) plus the "model", and sending them to a function that returns one new token.
You add that new token to the list of tokens.
Repeat with the new list of tokens.
That's how these systems work. We don't know exactly how the model works (w.r.t. the input tokens), though even that is a simplification. It's not magic, just maths that's too complex to understand trivially.
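To make that loop concrete, here's a toy sketch in Python. The dictionary-based "model" and the next_token function are stand-ins I made up (a real LLM replaces the lookup with billions of learned weights conditioning on the whole token list), but the outer generate loop is exactly the append-one-token-and-repeat process described above:

    import random

    def next_token(model, tokens):
        # Toy stand-in: pick a likely next token given only the last token.
        # A real LLM conditions on the whole token list via learned weights.
        last = tokens[-1]
        candidates, weights = zip(*model.get(last, [("<end>", 1)]))
        return random.choices(candidates, weights=weights)[0]

    def generate(model, prompt_tokens, max_new_tokens=20):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            new = next_token(model, tokens)   # function returns one new token
            if new == "<end>":
                break
            tokens.append(new)                # add it to the list of tokens...
        return tokens                         # ...repeating with the new list each time

    # Hypothetical "weights": how often each token followed another in some text.
    toy_model = {
        "the": [("cat", 3), ("dog", 1)],
        "cat": [("sat", 2), ("ran", 1)],
        "dog": [("ran", 1)],
        "sat": [("on", 1)],
        "on":  [("the", 1)],
        "ran": [("<end>", 1)],
    }

    print(" ".join(generate(toy_model, ["the"])))

The loop is the fully understood part; everything mysterious lives inside next_token, which here is a trivial lookup but in a real system is the trained model.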
For inference, we have a hand crank that rotates a lot of gears, with a final gear making one token (word) appear in a slot. For learning, we even know how to feed a bunch of text into a complicated thing that tells us which gears to connect to each other and how. We have no idea why the gear ratios and placements are what they are.
We know how they work in that we built the framework; we don't know how they work in that we cannot decode what is "grown" on that framework during training.
If we completely knew how they worked, we could go inside and explain exactly why every token was generated. Right now that is not possible, as the paths the tokens take through the layers tend to look outright nonsensical when observed.
We know how they're trained. We know the architecture in broad strokes (amounting to a few bits out of billions, albeit important bits). Some researchers are trying to understand the inner workings, and they have a very, very long way to go.
Elon's late night tweets always make me think of that scene in The Office where the new CEO (James Spader) decides to close one of the branches without telling anyone and when asked about it he goes "I got into a case of Australian Reds... and... How should I say this, Colombian whites"
Agree; the notion of "code time" is almost equivalent to that other shitty metric the industry (used to?) use to evaluate dev productivity -- lines of code. It's usually meaningless and promotes "garbage-in, garbage-out", leading to a decline in code quality. There's also the reality that engineers of differing seniority spend varying amounts of time writing code, and that the time spent thinking/reading typically adds more value (e.g., less code-review churn).
Hello! Are you hiring MDs (or PhDs in relevant fields, like Pathology) for any roles, like a Medical Science Liaison/Medical Advisor? I expect that PathAI needs experts in the field of medicine to bridge the "path" with the "AI". If so, can you talk about the expectations for these roles and/or put me in touch with anyone currently serving such roles? Thanks!