“ When we write, we share the way that we think. When we read, we get a glimpse of another mind. But when an LLM is the author, there is no mind there for a reader to glimpse.” — I dunno, I feel like reading is more a glimpse into how I think than how the author thinks…a generated story can be just as moving as one from a human, I think.
Fascinating. The article repeatedly makes the claim that “LLMs work by predicting likely next words in a string of text”. Yet there’s the seemingly contradictory implication that we don’t know how LLMs work (i.e., we don’t know their secret sauce). How does one reconcile this? They’re either fancy autocompletes or magic autocompletes (in which case the magic qualifier seems more important in understanding what they are than the autocomplete part).
This comes from ambiguous language that conflates the LLM algorithm with the training data and the derived weights.
The mysterious part involves whatever patterns might naturally exist within bazillions of human documents, and what partial/compressed patterns might end up in the weights that training produces and the LLM later uses.
Analogy: We built a probe that travels to an alien planet, mines out crystal deposits, and projects light through those fragments to show unexpected pictures of the planet's past. We know exactly how our part of the machine works, and we know the chemical composition of the crystals, but...
These systems work by taking a list of tokens (basically words) plus the "model", and sending them to a function that returns one new token.
You add that new token to the list of tokens.
Repeat with the new list of tokens.
That's how these systems work. We don't know exactly how the model works (w.r.t. the input tokens), though even that is a simplification. It's not magic, just maths that's too complex to understand trivially.
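To make that loop concrete, here's a toy sketch in Python. The dictionary-based "model" and the next_token function are stand-ins I made up (a real LLM replaces the lookup with billions of learned weights conditioning on the whole token list), but the outer generate loop is exactly the append-one-token-and-repeat process described above:

    import random

    def next_token(model, tokens):
        # Toy stand-in: pick a likely next token given only the last token.
        # A real LLM conditions on the whole token list via learned weights.
        last = tokens[-1]
        candidates, weights = zip(*model.get(last, [("<end>", 1)]))
        return random.choices(candidates, weights=weights)[0]

    def generate(model, prompt_tokens, max_new_tokens=20):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            new = next_token(model, tokens)   # function returns one new token
            if new == "<end>":
                break
            tokens.append(new)                # add it to the list of tokens...
        return tokens                         # ...repeating with the new list each time

    # Hypothetical "weights": how often each token followed another in some text.
    toy_model = {
        "the": [("cat", 3), ("dog", 1)],
        "cat": [("sat", 2), ("ran", 1)],
        "dog": [("ran", 1)],
        "sat": [("on", 1)],
        "on":  [("the", 1)],
        "ran": [("<end>", 1)],
    }

    print(" ".join(generate(toy_model, ["the"])))

The loop is the fully understood part; everything mysterious lives inside next_token, which here is a trivial lookup but in a real system is the trained model.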
For inference, we have a hand crank that rotates a lot of gears, with a final gear making one token (word) appear in a slot. For learning, we even know how to feed a bunch of text into a complicated thing that tells us which gears to connect to each other and how. We have no idea why the gear ratios and placements are what they are.
We know how they work in that we built the framework; we don't know how they work in that we cannot decode what is "grown" on that framework during training.
If we completely knew how they worked, we could go inside and explain exactly why every token was generated. Right now that is not possible, as the paths the tokens take through the layers tend to look outright nonsensical when observed.
We know how they're trained. We know the architecture in broad strokes (amounting to a few bits out of billions, albeit important bits). Some researchers are trying to understand the inner workings, and they have a very, very long way to go.
Elon's late night tweets always make me think of that scene in The Office where the new CEO (James Spader) decides to close one of the branches without telling anyone and when asked about it he goes "I got into a case of Australian Reds... and... How should I say this, Colombian whites"
Agree; the notion of "code time" is almost equivalent to that other shitty metric the industry (used to?) use to evaluate dev productivity -- lines of code. It's usually meaningless and promotes "garbage-in, garbage-out", leading to a decline in code quality. There's also the reality that engineers of differing seniority spend varying amounts of time writing code, and that the time spent thinking/reading typically adds more value (e.g., less code-review churn).
Hello! Are you hiring MDs (or PhDs in relevant fields, like Pathology) for any roles, like a Medical Science Liaison/Medical Advisor? I expect that PathAI needs experts in the field of medicine to bridge the "path" with the "AI". If so, can you talk about the expectations for these roles and/or put me in touch with anyone currently serving such roles? Thanks!