>just my two cents, to conclude i think a good analogy to the current climate is the 1700–1800s with electromagnetism: plenty of people discovered "empirical" laws but didn't really understand the phenomenon.

Sounds dead on. Do these large """language""" models actually even implement any concepts from linguistics? Or is the entire "language" part of the model merely derived from the fact that it's inherently part of the training data?

I don't fault Chomsky at all for being fed up with the hype here.

The entire field is also glossing over the fact that languages other than English exist.



>> anyway i think your statement that industry is light years away from unis is just misleading. i think the two are trying to answer different questions: 1. how can i achieve a "somewhat" decent chatbot that gets me rich, albeit without even knowing what it does [industry, in case you wondered] 2. try to understand, quantify and measure how well a model works: is it stable? does it converge if we have small datasets? and so on and so forth.

GP here is, IMO, confusing what the corporations want (1) with what corporate R&D people want (2). As long as the corps see good ROI on throwing infinite money at their AI R&D departments, those corporate researchers are better positioned and better equipped to do actual, solid science than academia ever can be. This has happened many times before, including in this industry. Research is best done by well-funded teams of smart people left to do whatever they fancy. When those conditions arise, progress happens, and it doesn't matter whether it's the government or industry that creates them.

(Conversely, the best hope for academia to become relevant again is that corporations lose interest in this research, and defund their departments. This could happen if e.g. transformers end up being a dead end, or compute suddenly becomes very expensive.)

> Do these large """language""" models actually even implement any concepts from linguistics? Or is the entire "language" part of the model merely derived from the fact that it's inherently part of the training data?

The latter. And guess what, they're not trying to solve linguistics at all. They started as tools to generate human-sounding text, but in the process of just throwing more data and compute at them, they not only got better, but started to acquire something resembling concept-level understanding.

It turns out that surprisingly many aspects of thinking seem to reduce well to proximity search in a vector space, if that space is high-dimensional enough. This result is both surprising and impactful well beyond the field of AI. It's arguably the first potential path we've identified that evolution could take to gradually random-walk itself from amoebas to human brains.
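(Purely to illustrate what "proximity search in a vector space" means in practice, here's a tiny Python sketch. The vocabulary, the random embeddings, and the nearest() helper are all made up for illustration; real embeddings come from training, not randomness.)

    # Toy sketch of proximity search over word embeddings.
    # Random vectors stand in for learned ones; not any actual model's representation.
    import numpy as np

    rng = np.random.default_rng(0)

    vocab = ["cat", "dog", "kitten", "car", "engine"]
    dim = 512
    embeddings = {w: rng.normal(size=dim) for w in vocab}

    def nearest(query_vec, k=3):
        """Return the k vocabulary items whose embeddings are closest by cosine similarity."""
        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        return sorted(vocab, key=lambda w: cos(query_vec, embeddings[w]), reverse=True)[:k]

    print(nearest(embeddings["cat"]))  # with trained embeddings, "kitten" would rank near "cat"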


i'm not saying LLMs are modeling linguistics in any way lol. i only meant that there's some kind of phenomenon related to scaling+attention that produces good enough results for most "human language stuff", which is kind of unexpected (i mean everyone knows that if you build a large enough model you can teach it any function, but cmon, it is architecture+scaling that made it possible, not scaling alone). Moreover, the architectures used, from attention layers to even LSTMs for that matter, are not completely understood; they are being used because "they work", just as in the old days of electromagnetism the empirical laws "just worked" for their usage.
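(for concreteness, a minimal sketch of the scaled dot-product attention those layers are built around; the shapes and values are toy numbers, not taken from any real model.)

    # Toy scaled dot-product attention, the building block behind "attention layers".
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        """Each output row is a weighted mix of V rows; weights come from Q·K similarity."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)     # (seq_q, seq_k) similarity matrix
        weights = softmax(scores, axis=-1)  # each row sums to 1
        return weights @ V                  # (seq_q, d_v)

    rng = np.random.default_rng(0)
    seq, d = 4, 8
    Q, K, V = (rng.normal(size=(seq, d)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (4, 8)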

btw, in other languages i guess it is decent, although it depends on which language, at least with gpt4.



