
> I notice that I am particularly confused by the example of "categorize" -- I am pretty sure you're not saying that ChatGPT can't categorize things, but then I have no good hypothesis for what you do mean.

That is exactly what I mean. GPT infers patterns. Some language patterns are themselves categorizations, like the words surrounding "or". It's important to recognize that in these cases it is humans who are creating the categorization.

GPT does not differentiate at all between one pattern and another. It doesn't categorize the pattern that represents words, or the pattern that represents sentences, or the one that represents an answer in response to a question, etc. These are all symbolic categorizations that humans apply to text; we call that set of categories "language". GPT, ironically, has no concept of language or grammar. It only sees patterns as patterns.



Can you give a concrete example of a way that a human is capable of categorizing, where GPT is not? Some concrete examples might be:

- Humans can take a list of things like ["goldfish", "cat", "shark", "tree", "whale", "rock", "human"] and a list of categories like ["animal", "plant", "non-living"], and say which category each thing belongs to. GPT cannot do this.

- Humans can take a list of things like the above, say which categorization scheme is most appropriate for the thing they are trying to do, and then, using that scheme, say which category each thing belongs in. GPT cannot do this.

- Humans can take a list of things like the above and describe a novel categorization scheme for them. Humans can then say which categories things that were not in the original list belong to. GPT cannot do this.

- Some other thing where humans can describe how to categorize something but GPT cannot do so.

- Humans do something internally that GPT cannot do, but that thing has no observable effects from outside the system.

Are any of these close to what you mean?
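
For concreteness, here's a minimal sketch of how I'd run the first test case against a chat model programmatically. It assumes the openai Python package (v1+) with an API key in the environment; the prompt wording is just my guess at a reasonable phrasing.

    # Sketch: run the first categorization test against a chat model.
    # Assumes the openai Python package (v1+) and OPENAI_API_KEY set.
    from openai import OpenAI

    client = OpenAI()

    things = ["goldfish", "cat", "shark", "tree", "whale", "rock", "human"]
    categories = ["animal", "plant", "non-living"]

    prompt = (
        f"Assign each item in {things} to exactly one category from "
        f"{categories}. Answer with one 'item: category' pair per line."
    )

    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(reply.choices[0].message.content)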


Portobello mushroom is certainly wrong (a mushroom is a fungus, not a plant). I also would have included color for salmon, though that one is more up for debate. That said, I think some humans would probably write the same categorizations.

[[[Me]]]

> I will provide two lists of items inside brackets, separated by commas. I want you to categorize the entities in the first list into the categories in the second, keeping in mind that an entity in the first list may fit into multiple categories: [goldfish, shark, salmon, blue, tiger, cake, cyan, portobello mushroom, tree] [animal, fish, food, color, mammal, plant]

[[[GPT4]]]

> Here's the categorization of the entities in the first list into the categories in the second list:

    goldfish: [animal, fish, food]
    shark: [animal, fish, food]
    salmon: [animal, fish, food]
    blue: [color]
    tiger: [animal, mammal]
    cake: [food]
    cyan: [color]
    portobello mushroom: [food, plant]
    tree: [plant]
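
To make those quibbles concrete, here's a quick sketch that diffs GPT-4's answer against a hand-corrected reference (mushrooms are fungi, not plants, and salmon arguably also names a color). The dictionaries just transcribe the lists above.

    # Sketch: compare GPT-4's categorization above to a hand-corrected reference.
    gpt_answer = {
        "goldfish": {"animal", "fish", "food"},
        "shark": {"animal", "fish", "food"},
        "salmon": {"animal", "fish", "food"},
        "blue": {"color"},
        "tiger": {"animal", "mammal"},
        "cake": {"food"},
        "cyan": {"color"},
        "portobello mushroom": {"food", "plant"},
        "tree": {"plant"},
    }

    reference = dict(gpt_answer)
    reference["portobello mushroom"] = {"food"}  # fungi are not plants
    reference["salmon"] = {"animal", "fish", "food", "color"}  # debatable

    for item in reference:
        if gpt_answer[item] != reference[item]:
            print(f"{item}: GPT said {sorted(gpt_answer[item])}, "
                  f"expected {sorted(reference[item])}")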


Yes, all of them. But we can go even simpler:

Humans can look at a list of words separated by commas and call it a "list". We can also call it "not a CSV file".

Humans can look at a 5, and call it five. We can hold up a hand, and say, "this many".

This behavior is the subject of semiotics: the act of using one thing to represent another.

LLMs are intentionally designed to avoid this. Instead of constructing an explicit grammar (like a parser does), an LLM takes the opposite approach: inference.

Inference makes it possible to model the ambiguous patterns that natural language is made of. Inference also makes it impossible to define symbols.

Humans use both semiotics and inference. So far, it looks like no one has quite cracked how to do that artificially.
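
To make the contrast concrete, here's a toy example of the "explicit grammar" approach: a hand-written recognizer for a bracketed, comma-separated list like the ones earlier in the thread. Every symbol has a fixed, defined role; nothing is inferred. This is only an illustrative sketch, not how any real parser or LLM is implemented.

    # Toy illustration of the "explicit grammar" approach: a hand-written
    # recognizer for bracketed, comma-separated lists. Each symbol
    # ('[', ',', ']') has a fixed, defined meaning in the grammar.
    import re

    LIST_GRAMMAR = re.compile(r"^\[\s*[^,\[\]]+(?:\s*,\s*[^,\[\]]+)*\s*\]$")

    def is_list(text: str) -> bool:
        return bool(LIST_GRAMMAR.match(text))

    print(is_list("[goldfish, shark, salmon]"))  # True: matches the defined grammar
    print(is_list("goldfish;shark;salmon"))      # False: the symbols don't fit the grammar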



