Hacker News

"The distinction between likely and grammatical, which all humans have, is entirely foreign to ChatGPT and its fellow LLMs."

Because grammar is not determined by frequency.



Sure it is. Something is grammatical because you have frequently heard sentences constructed that way in the past; something is ungrammatical because you have never heard sentences constructed that way. It's purely based on frequency.


This is wrong. "Colorless green ideas sleep furiously" is Chomsky's famous example of a clearly grammatical but meaningless sentence that was entirely new and statistically unlikely. There are many grammatically correct sentences you will hear in the future that you've never heard before.


Sorry, but what is wrong then? The example is grammatically correct because it fits an established pattern: you may never have heard that exact phrase, but you have definitely heard a multitude of phrases exhibiting the same patterns/rules.
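To make the pattern-vs-phrase distinction concrete, here's a toy sketch (the mini-corpus and hand-labeled POS dictionary are invented for illustration; real parsers and LLMs work nothing like this). The exact sentence has frequency zero, but its abstract part-of-speech pattern occurs throughout the corpus:

```python
# Toy illustration: judge a sentence by the frequency of its abstract
# part-of-speech pattern rather than by exact-phrase frequency.
# Corpus and POS labels below are made up for the example.

corpus = [
    "restless young dogs bark loudly",
    "hungry old cats meow constantly",
    "quiet small birds sing sweetly",
]

# Hand-labeled part-of-speech tags for every word we "know".
pos = {
    "restless": "ADJ", "young": "ADJ", "dogs": "NOUN", "bark": "VERB",
    "loudly": "ADV", "hungry": "ADJ", "old": "ADJ", "cats": "NOUN",
    "meow": "VERB", "constantly": "ADV", "quiet": "ADJ", "small": "ADJ",
    "birds": "NOUN", "sing": "VERB", "sweetly": "ADV",
    "colorless": "ADJ", "green": "ADJ", "ideas": "NOUN",
    "sleep": "VERB", "furiously": "ADV",
}

def pattern(sentence):
    # Map each word to its POS tag, giving the sentence's abstract shape.
    return tuple(pos[w] for w in sentence.split())

seen_sentences = set(corpus)
seen_patterns = {pattern(s) for s in corpus}

novel = "colorless green ideas sleep furiously"
print(novel in seen_sentences)          # False: exact phrase never seen
print(pattern(novel) in seen_patterns)  # True: ADJ ADJ NOUN VERB ADV is familiar
```

So "frequency" can rescue the argument only if it means frequency of abstract patterns, not of surface strings, which is exactly the point being debated.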


LLMs have been exposed to much larger datasets than any human ever has. If it were about frequency, then humans would make the grammatical errors and the LLMs would not. The LLMs are making grammatical errors, as shown in the article, so it is not about frequency.


Humans make grammatical errors (non-native speakers, deliberate expressive effect, typos, dialects, slang, etc.) => datasets contain a percentage of grammatical errors => LLMs make them too. It can still be about frequency, with infrequent errors carried along because of it. I don't see any contradiction.


That sentence fits what I said just fine. The words are arranged in an order that I have frequently seen in the past, so I accept it as grammatical (albeit meaningless).


You've seen those words in that order in the past? Then pick an example you haven't seen ("orange ideas", etc.)



