> To start with, we know what a person is, rudimentary things about how they behave, our senses, how they commonly work, and can do mental comparisons (reality checks).
How much is this a matter of fidelity? LLMs started with text, now text + vision + sound; it's still not the full package relative to what humans sport, but it captures a good chunk of information.
Now, I'm not claiming equivalence in the training process here, but let's remember that we all spend the first year or two of our lives just figuring out the intuitive basics of "what a person is, rudimentary things about how they behave, our senses, how they commonly work", and from there we spend the next couple of years learning more explicit and complex aspects of the same. We don't start with any of it hardcoded (and what little we do have was bestowed on us by millennia of a much slower gradient-descent process - evolution).
> LLM’s have one architecture that does one job which we try to get to do other things, like reasoning or ground truth.
FWIW, LLMs have one architecture in much the same sense that the brain has one architecture - brains specialize as they grow. We know that parts of a brain are happy to pick up the slack for parts with different specializations that become damaged or unavailable.
LLMs aren't uniform blobs, either. Now, their architecture is still limited - for one, unlike our brains, they don't learn on-line - they get pre-trained and remain fixed for inference. How much would a model capable of on-line learning differ structurally from current LLMs, or even from the naive approach of bestowing learning ability on LLMs (i.e. doing a little evaluation and training after every conversation)? We don't know yet.
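To make that naive approach concrete, here's a minimal sketch of what "a little evaluation and training after every conversation" could look like as a loop. Everything here (the Model class, evaluate, fine_tune) is a made-up stand-in for illustration, not any real library's API:

```python
# Purely illustrative sketch of the naive online-learning loop:
# run a conversation, score it, nudge the weights, repeat.
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical pre-trained model with a fine-tuning hook."""
    weights: dict = field(default_factory=dict)

    def generate(self, prompt: str) -> str:
        # Placeholder for actual inference.
        return f"response to: {prompt}"

    def fine_tune(self, transcript: list[tuple[str, str]], reward: float) -> None:
        # Placeholder for a small gradient step weighted by the reward.
        self.weights["updates"] = self.weights.get("updates", 0) + 1


def evaluate(transcript: list[tuple[str, str]]) -> float:
    # Placeholder reward signal: user feedback, a judge model, etc.
    return 1.0


def run_conversation(model: Model, prompts: list[str]) -> list[tuple[str, str]]:
    return [(p, model.generate(p)) for p in prompts]


model = Model()
for conversation in [["hello"], ["what is special relativity?"]]:
    transcript = run_conversation(model, conversation)
    reward = evaluate(transcript)        # a little evaluation...
    model.fine_tune(transcript, reward)  # ...and a little training, every time
```

Whether something this simple would actually work, or whether true on-line learning needs a different architecture entirely, is exactly the open question.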
I'm definitely not arguing LLMs of today are structurally or functionally equivalent to humans. But I am arguing that learning from the sum total of the Internet isn't meaningfully different from how humans learn, at least for anything we'd consider part of living in a technological society. I.e. LLMs don't get to experience throwing rocks first-hand like we do, but neither do we get to experience special relativity first-hand.
> Even they aren’t all trained in a coherent way using real-world, observations in the senses.
Neither are they, nor are we. I think if there's one insight people should've taken from the past couple of years, it's that "mostly coherent" data is fine (particularly if any given subset is internally coherent, even if there's little coherence between different subsets) - both humans and LLMs can find the larger coherence if you give them enough such data.