MAXPOOL's comments | Hacker News

If you take a bird's-eye view, fundamental breakthroughs don't happen that often. The "Attention Is All You Need" paper also came out in 2017; it has now been 7 years without a breakthrough at the same level as transformers. Breakthrough ideas can take decades before they are ready. There are many false starts and dead ends.

Money and popularity are orthogonal to pathfinding that leads to breakthroughs.


Well said


There are many others that are better.

1/ The Annotated Transformer (Attention Is All You Need): http://nlp.seas.harvard.edu/annotated-transformer/

2/ Transformers from Scratch https://e2eml.school/transformers.html

3/ Andrej Karpathy has a really good series of intros: https://karpathy.ai/zero-to-hero.html

Let's build GPT: from scratch, in code, spelled out. https://www.youtube.com/watch?v=kCc8FmEb1nY

GPT with Andrej Karpathy: Part 1 https://medium.com/@kdwa2404/gpt-with-andrej-karpathy-part-1...

4/ 3Blue1Brown:

But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning: https://www.youtube.com/watch?v=wjZofJX0v4M

Attention in transformers, visually explained | Chapter 6, Deep Learning: https://www.youtube.com/watch?v=eMlx5fFNoYc

Full 3Blue1Brown Neural Networks playlist: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_6700...


In addition, these websites are totally free.

The website listed here:

> I consider requests for full commercial use of all content on this site (and the github repository). For a complete buyout of all content rights, the cost is €10,000,000.

> I’d like to ask you what problems you have by that I keep on having the copyright of my document.

Plus no commercial use without paying a 20% royalty.

So fairly expensive for a Keras tutorial.


I think that's pretty obviously a joke, no?


Slightly off topic: I'm interested in taking part in the Vesuvius Challenge[0], but I don't have a background in ML; I'm just a regular web developer. Does anyone have suggestions on how to get started? I planned to get some background in practical ML by working through Karpathy's Zero to Hero series along with the Understanding Deep Learning book. Would that be enough, or is there anything else I should learn? I plan to understand the existing solutions to last year's prize and then pick a smaller sub-challenge.

[0] https://scrollprize.org/


I made a list of all the free resources I used to study ML and deep learning to become an ML engineer at FAANG, so I think it'll be helpful to follow these resources: https://www.trybackprop.com/blog/top_ml_learning_resources (links in the blog post)

Fundamentals

Linear Algebra – 3Blue1Brown's Essence of Linear Algebra series; I binged all these videos on a one-hour train ride visiting my parents.

Multivariable Calculus – Khan Academy's Multivariable Calculus lessons were a great refresher of what I had learned in college. Looking back, I just needed to have reviewed Unit 1 – intro and Unit 2 – derivatives.

Calculus for ML – this amazing animated video explains calculus and backpropagation

Information Theory – an easy-to-understand book on information theory called Information Theory: A Tutorial Introduction.

Statistics and Probability – the StatQuest YouTube channel

Machine Learning

Stanford Intro to Machine Learning by Andrew Ng – Stanford's CS229, the intro to machine learning course, published their lectures on YouTube for free. I watched lectures 1, 2, 3, 4, 8, 9, 11, 12, and 13, and I skipped the rest since I was eager to move on to deep learning. The course also offers a free set of course notes, which are very well written.

Caltech Machine Learning – Caltech's machine learning lectures on YouTube, less mathematical and more intuition based

Deep Learning

Andrej Karpathy's Zero to Hero Series – Andrej Karpathy, an AI researcher who graduated with a Stanford PhD and led Tesla AI for several years, released an amazing series of hands-on lectures on YouTube. Highly, highly recommend.

Neural networks – Stanford's CS231n course notes and lecture videos were my gateway drug, so to speak, into the world of deep learning.

Transformers and LLMs

Transformers – I watched these two lectures: a lecture from the University of Waterloo and a lecture from the University of Michigan. I have also heard good things about Jay Alammar's The Illustrated Transformer guide.

ChatGPT Explainer – Wolfram's YouTube explainer video on ChatGPT

Interactive LLM Visualization – This LLM visualization that you can play with in your browser is hands down the best interactive experience with an LLM.

Financial Times' Transformer Explainer – The Financial Times released a lovely interactive article that explains the transformer very well.

Residual Learning – 2023 Future Science Prize Laureates Lecture on residual learning.

Efficient ML and GPUs

How are Microchips Made? – This YouTube video by Branch Education is one of the best free educational videos on the internet, regardless of subject, and it's also the best video on understanding microchips.

CUDA – My FAANG coworkers acquired their CUDA knowledge from this series of lectures.

TinyML and Efficient Deep Learning Computing – 2023 lectures on efficient ML techniques online.

Chip War – Chip War is a bestselling book published in 2022 about microchip technology; its beginning chapters on the invention of the microchip actually explain CPUs very well.


Wow, thanks for the links to all the resources. Lot of interesting stuff for me to learn!


These slides from Lucas Beyer are pretty nice. https://docs.google.com/presentation/d/1ZXFIhYczos679r70Yu8v...


oh! 2/ recommendation is an absolute masterpiece of simplicity and effectiveness - cheers for that!


Without looking at the answer, what is your intuition about the size of the VC-dimension of ReLU networks as a function of the number of weights and layers?

Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks https://arxiv.org/abs/1703.02930
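Spoiler, quoted from memory rather than re-checked, so verify against the paper itself: for W weights and L layers with piecewise-linear activations, the headline bounds are roughly

    % From memory; verify against arXiv:1703.02930.
    % W = number of weights, L = number of layers, piecewise-linear (e.g. ReLU) activations.
    \Omega\!\big( W L \log(W/L) \big) \;\le\; \mathrm{VCdim} \;\le\; O\!\big( W L \log W \big)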


That is based on an old assumption about how neurons function.

Firstly, Kurzweil underestimates the number of connections by an order of magnitude.

Secondly, dendritic computation changes things. Individual dendrites and the dendritic tree as a whole can each do multiple independent computations: logical operations, low-pass filtering, coincidence detection, and so on. One neuronal activation is potentially thousands of operations per neuron.

A single human neuron can be the equivalent of thousands of ANN neurons.
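Back-of-envelope, with numbers that are purely illustrative assumptions on my part rather than measurements, the dendritic term alone moves the estimate by roughly three orders of magnitude:

    # All numbers below are rough, illustrative assumptions, not measurements.
    neurons = 8.6e10             # ~86 billion neurons (commonly cited estimate)
    synapses_per_neuron = 1e4    # order-of-magnitude connections per neuron
    dendritic_ops = 1e3          # assumed extra sub-computations per activation
    firing_rate_hz = 10          # assumed average firing rate

    without_dendrites = neurons * synapses_per_neuron * firing_rate_hz
    with_dendrites = without_dendrites * dendritic_ops
    print(f"{without_dendrites:.1e} vs {with_dendrites:.1e} ops/s")  # ~8.6e15 vs ~8.6e18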


> deep learning architectures have been crafted to create inductive biases matching invariances and spatial dependencies of the data. Finding corresponding invariances is hard in tabular data, made of heterogeneous features, small sample sizes, extreme values

Transformer attention by itself is invariant to input order, which is why positional encodings get added to the embeddings. CNNs have translation invariance and can have a little rotational invariance.

It's harder to find similar invariances for tabular data. Maybe applying methods from GNNs would help?
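To make the permutation point above concrete, here is a minimal numpy sketch (my own toy attention with identity projections, not any library's API): without positional encodings, permuting the input tokens just permutes self-attention's outputs, so any order-agnostic pooling of them is unchanged.

    import numpy as np

    def toy_self_attention(x):
        # Single-head self-attention with identity Q/K/V projections and no positional encoding.
        scores = x @ x.T / np.sqrt(x.shape[1])          # (n, n) scaled dot-product scores
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)               # row-wise softmax
        return w @ x                                    # (n, d) attended outputs

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))                         # 5 tokens, 8-dim embeddings
    perm = rng.permutation(5)

    out = toy_self_attention(x)
    out_perm = toy_self_attention(x[perm])

    # Permuting the inputs just permutes the outputs (equivariance)...
    assert np.allclose(out[perm], out_perm)
    # ...so any order-agnostic pooling of the outputs is invariant to the input order.
    assert np.allclose(out.mean(axis=0), out_perm.mean(axis=0))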


Effect of exercise for depression: systematic review and network meta-analysis of randomised controlled trials.

Conclusions Exercise is an effective treatment for depression, with walking or jogging, yoga, and strength training more effective than other exercises, particularly when intense. Yoga and strength training were well tolerated compared with other treatments. Exercise appeared equally effective for people with and without comorbidities and with different baseline levels of depression. To mitigate expectancy effects, future studies could aim to blind participants and staff. These forms of exercise could be considered alongside psychotherapy and antidepressants as core treatments for depression.


Mamba is a new model architecture based on state space models (SSMs).

Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752

https://github.com/state-spaces/mamba

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model https://paperswithcode.com/paper/vision-mamba-efficient-visu...
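For intuition, the heart of an SSM layer is just a linear recurrence over the sequence. Below is a minimal numpy sketch with made-up toy A/B/C matrices; the real Mamba makes these input-dependent ("selective") and computes the recurrence with a hardware-aware parallel scan rather than a Python loop.

    import numpy as np

    def ssm_scan(x, A, B, C):
        # Discrete linear state space model:
        #   h_t = A @ h_{t-1} + B @ x_t,   y_t = C @ h_t
        # Cost is linear in sequence length (vs. attention's quadratic cost).
        h = np.zeros(A.shape[0])
        ys = []
        for x_t in x:                                  # x: (seq_len, d_in)
            h = A @ h + B @ x_t
            ys.append(C @ h)
        return np.stack(ys)                            # (seq_len, d_out)

    rng = np.random.default_rng(0)
    seq_len, d_in, d_state, d_out = 16, 4, 8, 4
    A = 0.9 * np.eye(d_state)                          # toy stable state transition
    B = 0.1 * rng.normal(size=(d_state, d_in))
    C = rng.normal(size=(d_out, d_state))
    y = ssm_scan(rng.normal(size=(seq_len, d_in)), A, B, C)
    print(y.shape)                                     # (16, 4)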


Jan 3, 2024 lecture by Sergey Levine about progress on real-world deep RL. It covers these papers:

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion

Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention

REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation

FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing: https://sites.google.com/view/fastrlap

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions: https://qtransformer.github.io/

Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators: https://rl-at-scale.github.io/


Wow, thanks for submitting this and putting these links together. Is there a better way to get up to speed on this than going through the papers in order and trying to replicate them (in simulators) one at a time? The best way I can think of to try it is with Unity + Python, but there's a lot of rabbit-hole risk there.


His first publication was "Potrzebie System of Weights and Measures" for Mad Magazine in June 1967, when he was 19 years old.

https://silezukuk.tumblr.com/image/616657913


The story there is that he had written this in high school, combining the style of MAD magazine with the textbook “system of weights and measures”, as one of his submissions to the Westinghouse Science Talent Search (1956). A few months later when he was in college he sent it to MAD magazine with (basically) “you guys may like this”, and to his surprise they treated it as a submission and decided to publish it (with illustrations by Wallace Wood); it came out in June 1957. Later when writing his CV Knuth decided to start counting from this as his first publication.


I don't know anything about that, but he was born in 1938, so he wasn't 19 years old.


MAD Magazine 1957, not 1967.


The two best:

CHESS IS A FUN SPORT, WHEN PLAYED WITH SHOT GUNS

COWS FLY LIKE CLOUDS BUT THEY ARE NEVER COMPLETELY SUCCESSFUL.

These are from MegaHAL, which entered the 1998 Loebner Prize Contest. MegaHAL was able to produce mind-blowingly insightful sayings, but most of its output was just BS.

It seems that creativity is easy for computers: just push randomness through some generative algorithm. Curating and selecting the best output makes all the difference. The ability to select, critique, and understand what is generated, and what it means, is much harder.
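A hypothetical minimal sketch of that generate-then-curate loop (the word list and scoring function are stand-ins I made up; building a critic that actually understands meaning is the hard part):

    import random

    random.seed(42)
    WORDS = ["chess", "cows", "clouds", "fly", "fun", "shotguns", "sport", "never"]

    def generate():
        # "Creativity": push randomness through a trivial generative process.
        return " ".join(random.choices(WORDS, k=6))

    def score(sentence):
        # Stand-in critic; the hard part in reality is a scorer that actually
        # understands meaning well enough to separate insight from nonsense.
        return len(set(sentence.split()))

    candidates = [generate() for _ in range(1000)]
    print(max(candidates, key=score))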

