
> Accuracy is obviously going to improve in models

Well, to be clear, they can put rule-based filters and other things on top of the neural net, but the core GPT will never get more accurate, since it has no mechanism to understand what words mean.



GPT3 is far more accurate than GPT2. Seems reasonable that larger models trained on more data will continue to improve accuracy. I'd also expect larger models to be better at summarizing text, i.e. potentially fixing the Bing issues where it hallucinates numbers.

Our model sizes are a product of our scaling and hardware limitations. There's no reason to believe we are anywhere near optimal.


> Seems reasonable that larger models trained on more data will continue to improve accuracy.

It also seems reasonable to assume that they will eventually encounter diminishing returns, and that the current issues, such as hallucinations, are inherent to the approach and may never be resolved.

To be clear, I don't have a clue which statement is true (though I don't see why scaling would solve the hallucination problem).


Scaling of models is a heavily researched area, and so far the experiments show that scaling doesn't really hit diminishing returns. That was checked in the GPT-2 "era" with model sizes from very small up to GPT-2, and reconfirmed with GPT-3 and then with newer models. It's certainly possible that we eventually encounter diminishing returns, but it isn't reasonable to presume that we will any time soon (we have essentially zero evidence for that and at least some evidence to the contrary), and even if we do, there's currently no reason to assume the breaking point is somewhere around "GPT-5" rather than "GPT-15" or "GPT-55".
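For reference, those scaling-law papers fit test loss as a smooth power law of parameter count. A minimal sketch of that kind of fit is below; the constants are quoted from memory from the Kaplan et al. (2020) fits, so treat them as illustrative rather than authoritative:

    # Illustrative power-law fit in the style of the scaling-law papers:
    # loss falls smoothly as parameter count grows, with no knee anywhere
    # in the measured range. Constants are quoted from memory and should
    # be treated as order-of-magnitude illustrations only.
    N_C = 8.8e13      # fitted "critical" parameter count
    ALPHA_N = 0.076   # fitted exponent for the parameter-count term

    def predicted_loss(n_params: float) -> float:
        """Predicted test loss (nats/token) at a given parameter count."""
        return (N_C / n_params) ** ALPHA_N

    for n in (1.5e9, 1.75e11, 1e12, 1e13):  # GPT-2, GPT-3, two hypothetical sizes
        print(f"{n:.2e} params -> predicted loss {predicted_loss(n):.3f}")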


If significantly bigger models than today's got better results, we would have seen papers about it a long time ago, so that the team/company could get more funding; lots of rich actors have been working on this for years.

If it doesn't produce better results, however, then they want their competitors to waste lots of money making the same mistakes; there is really no benefit to publishing that, and lots of drawbacks.

Otherwise it seems too much of a coincidence that Google and OpenAI ended up with models of basically the same size. Google could easily have trained a model 5x-10x larger (it isn't that expensive for them), but for some reason we didn't see that, and GPT-4 just never seems to launch.


It’s not just the cost of training the model, it’s the cost of doing inference at scale. ChatGPT is borderline too expensive to operate already. It’s hard to imagine a larger model that is both economical and used by millions of people with our current hardware.
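To put rough numbers on that, here is a back-of-envelope sketch using the standard estimate of roughly 2 FLOPs per parameter per generated token for a dense transformer forward pass; the daily token count below is invented purely for illustration, not a real ChatGPT figure:

    # Back-of-envelope serving cost. Assumes ~2 FLOPs per parameter per
    # generated token for a dense transformer forward pass; the daily token
    # count is a made-up illustration, not a real traffic number.
    TOKENS_PER_DAY = 1e10  # hypothetical: 10 billion generated tokens/day

    def daily_inference_flops(n_params: float) -> float:
        return 2 * n_params * TOKENS_PER_DAY

    for n_params in (1.75e11, 1.0e12):  # GPT-3-sized vs. a hypothetical 1T model
        print(f"{n_params:.2e} params -> {daily_inference_flops(n_params):.2e} FLOPs/day")

At fixed traffic, serving cost grows roughly linearly with parameter count, so a 5x-10x larger model is roughly 5x-10x more expensive per token before you even consider memory footprint.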


But if a larger model were good enough to replace a human, for example a Google engineer, then it would still be worth it. So they have surely tried to scale up, and if that extra scale had given results, they would have published something about it.

Now, since the larger model wasn't good enough to replace a human engineer, we can rest easy: it won't replace programmers anytime soon. If GPT-4, for example, could replace engineers, OpenAI wouldn't need to monetise ChatGPT; they would just rent out artificial engineers to do coding for $10k a year.


It might turn out that for rule-based systems such as prescriptive grammars (for grammatically correct language rather than natural spoken language), there is still use for a system that explicitly represents those rules.

Then again, we are a big old bulb of wetware and we can generally learn to apply grammar rules correctly most of the time (when explicitly thinking about them, anyway).

Maybe what we need is some kind of metacognition: being able to apply and evaluate rules that the current LLMs can already correctly reproduce.
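A minimal sketch of what that could look like, purely hypothetical: generate_text() and find_rule_violations() below are made-up stand-ins for an LLM call and an explicit rule checker, not any real API.

    # Hypothetical "metacognition" wrapper: generate a draft, apply explicit
    # rules the model can already state, and ask for a revision when they are
    # violated. Both helper functions are placeholders for illustration.
    def generate_text(prompt: str) -> str:
        raise NotImplementedError("stand-in for an LLM call")

    def find_rule_violations(text: str) -> list[str]:
        raise NotImplementedError("stand-in for an explicit rule checker")

    def generate_with_checks(prompt: str, max_retries: int = 3) -> str:
        draft = generate_text(prompt)
        for _ in range(max_retries):
            violations = find_rule_violations(draft)
            if not violations:
                return draft
            # Feed the violated rules back in and request a revision.
            draft = generate_text(prompt + "\n\nFix these issues: " + "; ".join(violations))
        return draft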


The biggest problem is that scaling is non-linear. The returns might well be non-diminishing wrt model size, but if we have to throw N^2 hardware at it to make it (best-case) 2N better, we'll still hit the limit pretty quickly.
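Concretely, under the usual power-law picture (loss proportional to N^-alpha, with alpha taken here as an illustrative 0.076, in the range the scaling-law papers report), the returns never flatten, but a fixed improvement costs an enormous multiple in model size:

    # How much bigger the model must get to halve the power-law loss term,
    # assuming loss ~ N**(-alpha). Alpha is an illustrative value in the
    # range reported by scaling-law papers, not a measured constant.
    ALPHA = 0.076
    growth = 2 ** (1 / ALPHA)
    print(f"Parameter growth needed to halve the loss term: {growth:.1e}x")
    # Roughly 9e3x more parameters -- and training compute grows even faster,
    # since the bigger model also needs to be trained on more tokens.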


> GPT3 is far more accurate than GPT2

Please say more about what you mean here because I disagree.

It’s certainly more eloquent, but it still can’t multiply two 4-digit numbers…


But it did learn some basic arithmetic. If GPT-4 can multiply two 4-digit numbers, will you change your mind?


GPT saw a bunch of arithmetic and can repeat it. That’s the joke behind the Reddit usernames and /r/counting.

The four-digit-number thing is just the current lower bound of where it gets confused, because of a lack of training data.

Once you teach a patient 8-year-old the rules of multiplication, they can multiply any two numbers (that they’ve never seen before) with an arbitrary number of digits; the procedure is sketched below. An LLM cannot and will never be able to do that, because it is a specific tool that is not designed to do that (doing so would be a bad outcome for an LLM, since we have different tools that can do multiplication much more efficiently).

So yes, if an LLM learns rules-based math (which it is not intended to do), I’ll eat not only my hat, but every hat in existence.
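For contrast, here is the rule the 8-year-old learns, written out as an explicit procedure: a sketch of grade-school long multiplication (my own illustration, nothing to do with how an LLM computes), which works for inputs of any length.

    # The rule an 8-year-old learns: grade-school long multiplication.
    # A fixed procedure that works for numbers of any length, which is the
    # kind of systematic generalization the parent comment is describing.
    def long_multiply(a: str, b: str) -> str:
        digits_a = [int(d) for d in reversed(a)]
        digits_b = [int(d) for d in reversed(b)]
        result = [0] * (len(digits_a) + len(digits_b))
        for i, da in enumerate(digits_a):
            carry = 0
            for j, db in enumerate(digits_b):
                total = result[i + j] + da * db + carry
                result[i + j] = total % 10
                carry = total // 10
            result[i + len(digits_b)] += carry
        # Strip leading zeros and convert back to a string.
        while len(result) > 1 and result[-1] == 0:
            result.pop()
        return "".join(str(d) for d in reversed(result))

    print(long_multiply("4728", "9153"))  # 43275384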


They probably mean that if you test it on common NLP benchmarks it performs better.



