As someone in this space I can attest that AI teaching in most (UK) universities is generally poor on detail, overly abstract, and at least 3-5 years behind industry.
Not to mention that there is zero appetite from undergrads or postgrads to get into the nitty-gritty of it. To learn CNNs at the deep-dive level you need calculus, at least differentiation and integration. Calculus or even pre-calculus doesn't form part of the degree programme for most compsci BScs any more, because it is 'too hard'.
The way most students 'learn' AI is to use a method out of a Python library with near-zero understanding of how it works, and regurgitate it for an assessment.
Professorial research staff in most UK universities are light-years behind AI work in industry, and there's no clear path to that gap closing, especially while universities are being run like second-rate consulting houses (don't get me started on THAT).
lmao even in my university (a serbian uni), we have at least calculus + linear algebra before any nn course. also, to "learn" what a cnn is you just need gradients, not integrals (unless you use some kind of non-lipschitz function as activation?), plus the idea of what a convolution is... but even Möbius knew that back in the 1800s.
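to make that concrete, here's a toy numpy sketch (shapes and names are made up, just for illustration): the forward pass is a plain discrete convolution, and training only ever needs derivatives of the loss w.r.t. the kernel, never an integral.

    import numpy as np

    def conv2d_valid(x, k):
        # direct 2D 'valid' cross-correlation, i.e. what a conv layer actually computes
        H, W = x.shape
        kh, kw = k.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
        return out

    x = np.random.randn(5, 5)
    k = np.random.randn(3, 3)

    # toy scalar loss L = sum of all outputs; its analytic gradient w.r.t. the kernel is
    # dL/dk[a, b] = sum over output positions (i, j) of x[i + a, j + b]
    grad = np.zeros_like(k)
    for a in range(3):
        for b in range(3):
            grad[a, b] = np.sum(x[a:a + 3, b:b + 3])

    # finite-difference check: pure differentiation, no integrals anywhere
    eps = 1e-6
    fd = np.zeros_like(k)
    for a in range(3):
        for b in range(3):
            dk = np.zeros_like(k); dk[a, b] = eps
            fd[a, b] = (conv2d_valid(x, k + dk).sum() - conv2d_valid(x, k - dk).sum()) / (2 * eps)

    print(np.allclose(grad, fd))  # True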
anyway i think your statement that industry is light years away from unis is just misleading. i think the two are trying to answer different questions:
1. how can i achieve a "somewhat" decent chatbot that gets me rich despite not even knowing what it does [industry, in case you wondered]
2. try to understand, quantify and measure how well a model works: is it stable? does it converge if we have small datasets? and so on and so forth.
just my two cents. to conclude, i think a good analogy for the current climate is the 1700-1800s with electromagnetism: plenty of people discovered "empirical" laws but didn't really understand the phenomenon.
You might be able to understand what a convolutional network "is" without calculus, but you'll be woefully unequipped to ask even obvious questions like "what if we put Fourier transforms around the convolutional layers" (a cursory search suggests it provides the expected speedup but is for some reason not a standard thing to do?). As someone outside of the industry, I'd also imagine any effort to explain what NNs are actually "learning" (or I suppose dually, how to design network architectures) is going to have a lot of fruitful overlap with signal processing theory, which is heavy on calculus, linear algebra, probability, etc.
> a cursory search suggests it provides the expected speedup
What do you mean? Processing a CNN layer takes an amount of time that does not depend on the input data, only on the input/output sizes. A Fourier transform is just a change of basis. Why should anything speed up?
Because convolution is an O(N^2) operation, but a Fourier transform (and its inverse) can be done in O(N log N) and turns convolution into O(N) pointwise multiplication. So if you do FFT, multiply, inverse FFT, you get convolution in O(N log N). I would guess that you don't even need to do the inverse FFT and can just learn in frequency space instead, but maybe there's some reason why that doesn't work out.
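To make the claim concrete, here's a toy 1D numpy sketch (sizes are arbitrary, and it uses circular convolution because that's what the plain DFT identity gives you):

    import numpy as np

    N = 256
    x = np.random.randn(N)   # "signal", e.g. a row of feature-map values
    k = np.random.randn(N)   # kernel, assumed zero-padded to the same length

    # direct circular convolution: O(N^2)
    direct = np.array([sum(x[m] * k[(i - m) % N] for m in range(N)) for i in range(N)])

    # FFT -> pointwise multiply -> inverse FFT: O(N log N)
    via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

    print(np.allclose(direct, via_fft))  # True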
Computing convolutions using FFTs is efficient for large kernels (or filters). Most convolutions in popular ML models have small kernels, a regime where it is typically more efficient to reformulate the convolution as a matrix multiplication.
I think your complexity argument is correct when N = pixels = kernel size. But typically, pixels >> kernel size.
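To illustrate the small-kernel reformulation mentioned above, here's a rough im2col-style sketch (names and shapes are mine, not any particular framework's API): each input patch becomes one row of a matrix, and the whole convolution collapses into a single matrix multiplication that a GEMM library can chew through.

    import numpy as np

    def conv2d_direct(x, k):
        # reference direct 2D 'valid' cross-correlation
        H, W = x.shape
        kh, kw = k.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
        return out

    def conv2d_im2col(x, k):
        # same result, expressed as a single matrix multiplication (im2col)
        H, W = x.shape
        kh, kw = k.shape
        oh, ow = H - kh + 1, W - kw + 1
        cols = np.array([x[i:i + kh, j:j + kw].ravel()
                         for i in range(oh) for j in range(ow)])  # one flattened patch per row
        return (cols @ k.ravel()).reshape(oh, ow)

    x = np.random.randn(8, 8)
    k = np.random.randn(3, 3)
    print(np.allclose(conv2d_direct(x, k), conv2d_im2col(x, k)))  # True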
Disclosure: I work at Arm optimising open source ML frameworks. Opinions are my own.
I see, the wording confused me. You don't really "put Fourier transforms around the convolutional layers" because you have to completely replace the convolutions.
This seems to be done in some cases. I guess it isn't done more widely because the "standard" convolution kernels are very small and the performance would actually be worse?
>just my two cents. to conclude, i think a good analogy for the current climate is the 1700-1800s with electromagnetism: plenty of people discovered "empirical" laws but didn't really understand the phenomenon.
Sounds dead on. Do these large """language""" models actually even implement any concepts from linguistics? Or is the entire "language" part of the model merely derived from the fact that it's inherently part of the training data?
I don't fault Chomsky at all for being fed up with the hype here.
The entire field is also glossing over the fact that other languages which aren't English exist.
>> anyway i think your statement that industry is light years away from unis is just misleading. i think the two are trying to answer different questions: 1. how can i achieve a "somewhat" decent chatbot that gets me rich despite not even knowing what it does [industry, in case you wondered] 2. try to understand, quantify and measure how well a model works: is it stable? does it converge if we have small datasets? and so on and so forth.
GP here is, IMO, confusing what the corporations want (1) with what corporate R&D people want (2). As long as the corps see good ROI on throwing infinite money at their AI R&D departments, those corporate researchers are better positioned and better equipped to do actual, solid science than academia ever can be. This has happened many times before, including in this industry. Research is best done by well-funded teams of smart people left to do whatever they fancy. When those conditions arise, progress happens, and it doesn't matter whether it's the government or industry that creates them.
(Conversely, the best hope for academia to become relevant again is that corporations lose interest in this research, and defund their departments. This could happen if e.g. transformers end up being a dead end, or compute suddenly becomes very expensive.)
> Do these large """language""" models actually even implement any concepts from linguistics? Or is the entire "language" part of the model merely derived from the fact that it's inherently part of the training data?
The latter. And guess what: they're not trying to solve linguistics. They started as tools to generate human-sounding text, but in the process of just throwing more data and compute at them, they not only got better, but started to acquire something resembling concept-level understanding.
It turns out that surprisingly many aspects of thinking seem to reduce well to proximity search in a vector space, if that space is high-dimensional enough. This result is both surprising and impactful well beyond the field of AI. It's arguably the first potential path we've identified that evolution could take to gradually random-walk itself from amoebas to human brains.
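To make "proximity search in a vector space" concrete, here's a toy sketch with made-up 4-dimensional embeddings (real systems learn vectors with hundreds or thousands of dimensions; the numbers below are purely illustrative):

    import numpy as np

    # made-up "embeddings"; in a real model these are learned, not hand-written
    embeddings = {
        "cat":    np.array([0.90, 0.10, 0.00, 0.20]),
        "dog":    np.array([0.80, 0.20, 0.10, 0.30]),
        "kitten": np.array([0.85, 0.15, 0.00, 0.25]),
        "car":    np.array([0.00, 0.90, 0.80, 0.10]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def nearest(query, k=2):
        # proximity search: rank everything else by similarity to the query vector
        q = embeddings[query]
        scores = {w: cosine(q, v) for w, v in embeddings.items() if w != query}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print(nearest("cat"))  # ['kitten', 'dog'] -- semantically close words sit close in the space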
i'm not saying LLMs are modeling linguistics in any way lol. i only meant that there's some kind of phenomenon related to scaling+attention that produces good enough results for most "human language stuff", which is kind of unexpected (i mean everyone knows that if you build a large enough model you can teach it any function, but cmon, it is architecture+scaling that made it possible, not scaling alone). moreover, the architectures used, from attention layers to even LSTMs for that matter, are not completely understood; they are being used because "they work", just as in the old days of electromagnetism the empirical laws "just worked" for their usage.
btw, in other languages i guess it's decent, although it depends on which language, at least with gpt-4.
> Calculus or even pre-calculus doesn't form part of the degree programme for most compsci BScs any more, because it is 'too hard'.
In the US I’ve never seen a BS in computer science that didn’t require calculus. I can’t speak for the UK, but it would surprise me if what you say is true.