
The deep learning variety of neural networks is a heavily simplified, mostly linear version of biological neurons. They don't resemble anything between your ears. Real neurons are generally modeled by differential equations (in layman's terms, they have many levels of feedback loops tied to time), not by the simplified activation functions used in dense layers.

Here are some examples:

https://snntorch.readthedocs.io/en/latest/tutorials/tutorial...
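
To make "modeled by differential equations" concrete, here's a leaky integrate-and-fire neuron discretized in plain NumPy. This is a toy Euler-step sketch with made-up constants, not snntorch's actual implementation:

  import numpy as np

  # Leaky integrate-and-fire: tau * dV/dt = -(V - V_rest) + R * I(t).
  # Euler-discretized; the membrane potential V is state that evolves
  # over time -- the time-tied feedback loop a plain activation lacks.
  tau, v_rest, v_thresh, v_reset, r = 10.0, -65.0, -50.0, -70.0, 10.0
  dt, steps = 0.1, 1000
  current = np.where(np.arange(steps) * dt > 20.0, 2.0, 0.0)  # step input at 20 ms

  v = v_rest
  spike_times = []
  for t in range(steps):
      dv = (-(v - v_rest) + r * current[t]) / tau
      v += dv * dt
      if v >= v_thresh:            # threshold crossing -> emit a spike
          spike_times.append(t * dt)
          v = v_reset              # hard reset, then integrate again

  print(len(spike_times), "spikes; first at", spike_times[0], "ms")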



Would that be equivalent to weight matrix parameters?


Ish. Take a look at the curves of the spiking neural network functions; they are very different from deep learning nets. When we "model" biological neural nets in code, we are essentially coming up with a mathematical transfer function that can replicate the chemical gradient changes and electrical impulses of a real neuron. Imagine playing a 3D computer game like Minecraft: the physics is not perfect, but it's "close enough" to the real world.
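
A toy way to see the difference (the function names and constants here are mine, purely for illustration): a dense-layer activation is a stateless function of its instantaneous input, while a spiking model's output also depends on its own history.

  import numpy as np

  def relu(x):
      # Deep learning activation: same input always gives the same output.
      return np.maximum(0.0, x)

  def lif_step(v, x, decay=0.9, thresh=1.0):
      # Spiking update: output depends on the membrane state v,
      # which accumulates (and leaks) input over time.
      v = decay * v + x
      spike = v >= thresh
      return (0.0 if spike else v), float(spike)

  v = 0.0
  for x in [0.4, 0.4, 0.4, 0.4]:
      v, s = lif_step(v, x)
      print(relu(x), s)  # ReLU repeats 0.4; the LIF is silent, silent, then spikes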


Thanks, this is very insightful.

Would the value matrices (W_V) in multi-head attention not be equivalent to the chemical gradient changes?

(There are multiple such matrices in multi-head attention, one per attention head, which I imagine would be the equivalent of different gradients. This allows each attention head to learn different representations and focus on different aspects of the input sequence.)

And would the output produced after concatenating the heads and applying the output projection (W_O) be equivalent to the electrical outputs, such as the spikes passed to the next neuron equivalent or attention head?
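
For concreteness, here's my rough mental model of those matrices as a NumPy sketch of the textbook multi-head attention (shapes and variable names are mine, following the "Attention Is All You Need" formulation, not any particular framework):

  import numpy as np

  def softmax(x, axis=-1):
      e = np.exp(x - x.max(axis=axis, keepdims=True))
      return e / e.sum(axis=axis, keepdims=True)

  seq, d_model, heads = 4, 16, 2
  d_head = d_model // heads
  rng = np.random.default_rng(0)
  x = rng.normal(size=(seq, d_model))

  # Each head has its own W_Q, W_K, W_V: learned parameter matrices,
  # fixed after training, giving each head its own projection of the input.
  Wq = rng.normal(size=(heads, d_model, d_head))
  Wk = rng.normal(size=(heads, d_model, d_head))
  Wv = rng.normal(size=(heads, d_model, d_head))
  Wo = rng.normal(size=(d_model, d_model))  # output projection

  head_outs = []
  for h in range(heads):
      q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
      attn = softmax(q @ k.T / np.sqrt(d_head))  # (seq, seq) attention weights
      head_outs.append(attn @ v)

  # Concatenate all heads, then mix them with the output projection W_O.
  out = np.concatenate(head_outs, axis=-1) @ Wo
  print(out.shape)  # (4, 16)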


¯\_(ツ)_/¯



