Surya here from the core Gemma team -- we can think of a distillation loss as learning to model the entire distribution of tokens that are likely to follow the prefix so far, instead of only the single token in the training example. A quick back-of-the-envelope calculation shows that learning to model the full distribution gives the student many more bits of information to learn from per example.
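To make that concrete, here is a minimal sketch of a soft-label distillation loss in PyTorch. It assumes the teacher and student share a tokenizer and vocabulary; the function name and temperature default are mine for illustration, not Gemma's actual training code:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Logits are [batch, seq_len, vocab_size]. The teacher supplies a full
    # probability distribution over the vocabulary at every position...
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # ...so the KL term carries gradient signal from every vocabulary entry,
    # not just the single token that appears in the training example.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```

Compare this with ordinary cross-entropy training, which is the same KL term with the teacher distribution collapsed to a one-hot vector: all but one vocabulary entry contributes nothing.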
This is really remarkable! How hard do you think it will be to support new models? That is, does the tooling you've built generalize well enough that you could easily serve other large-scale models?
Hi everyone! One of the creators of DFL here. In an attempt to more deeply understand fundamental concepts in machine learning, we designed Depth First Learning. It's a pedagogy for diving deep into machine learning by carefully tailoring a curriculum around a particular paper or concept and leading small, focused discussion groups. So far, we’ve created guides for InfoGAN, TRPO, AlphaGoZero, and DeepStack.
Since our launch, we’ve received very positive feedback from students and researchers. Now, we want to run new, online classes around the world.
We know from experience that curating a meaningful curriculum with reading materials, practice problems, and instructive discussion points can be very rewarding, but also time-consuming and difficult. We wanted the people compiling this content to know that their efforts are well worth their time, so we decided to launch a fellowship program.
Thanks to the generosity of Jane Street, we will provide four fellows with a $4,000 grant each to build a six-week curriculum and run weekly online discussions.
If you’d like to lead a class about an important paper in machine learning, please visit http://fellowship.depthfirstlearning.com to apply. We look forward to hearing from you, and I'm happy to answer any questions about it!
Many machine learning and reinforcement learning models are susceptible to adversarial attacks; the vulnerability is not unique to deep learning. However, because so many systems currently deployed in applications use deep learning, it is under particular scrutiny.
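As an illustration of the simplest such attack (not tied to any particular system discussed here), this is a sketch of the Fast Gradient Sign Method (Goodfellow et al., 2014) in PyTorch; the function name and epsilon default are mine:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    # Perturb the input one small step in the direction that most
    # increases the loss; inputs are assumed normalized to [0, 1].
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # signed-gradient step
    return x_adv.clamp(0, 1).detach()
```

A perturbation this small is typically imperceptible to a human, yet is often enough to flip the model's prediction.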
Are there any difficulties in generating programs in a standard language like Python? Did you choose a DSL because the neural network is sensitive to the output programming language?
It turns out the full grammar of Python (and of almost all real programming languages) is quite large; this is very early work in neural program synthesis, so we chose a fairly limited DSL to make sure we could at least solve this problem before moving on to more general languages that contain state, conditionals, for-loops, etc. In theory, however, we could apply the exact same architecture to Python programs and see what happens. We haven't tried yet. :)
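To give a feel for how small such a search space can be, here is a toy list-manipulation DSL in Python; the primitive names are mine for illustration, not the paper's actual DSL:

```python
# Every primitive maps a list to a list, so programs compose freely.
DSL = {
    "sort":    lambda xs: sorted(xs),
    "reverse": lambda xs: list(reversed(xs)),
    "tail":    lambda xs: xs[1:],
    "take2":   lambda xs: xs[:2],
    "sum":     lambda xs: [sum(xs)],
}

def run_program(program, xs):
    # A program is just a sequence of primitive names applied left to
    # right; no state, conditionals, or loops, so the space of programs
    # of length k is only len(DSL) ** k.
    for op in program:
        xs = DSL[op](xs)
    return xs

print(run_program(["sort", "reverse", "take2"], [4, 1, 3, 2]))  # [4, 3]
```

With five primitives there are only 125 programs of length three; with Python's full grammar, the equivalent space is unboundedly large.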
Why does the final example in figure 14 fail completely? The outputs are correct as far as they go, but they're all incomplete.
Is it because the scoring metric reaches a point where a good enough start outscores an alternative in the beam search that could lead to a more complete solution? In non-trivial real-world examples, would this be a major problem?
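For what it's worth, the effect the question describes is easy to reproduce: a hypothesis's summed log-probability strictly decreases with every token added, so a short, confident but incomplete output can outscore a longer, complete one. A tiny numerical sketch (made-up probabilities, mine for illustration):

```python
import math

incomplete = [0.9, 0.7, 0.9]   # per-token probabilities, stops early
complete   = [0.85] * 5        # lower per-token confidence, but finishes

def raw_score(probs):
    return sum(math.log(p) for p in probs)

def normalized_score(probs):
    return raw_score(probs) / len(probs)

print(raw_score(incomplete), raw_score(complete))
# -0.57 vs -0.81: the raw sum favors the incomplete hypothesis.
print(normalized_score(incomplete), normalized_score(complete))
# -0.19 vs -0.16: length normalization favors the complete one.
```

Length normalization is one common mitigation, though it does not make the bias disappear in general.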
Theorem proving is very closely related to program induction (we just change the grammar). Just as with Python, the underlying search space would be incredibly large; while in theory we could simply swap in a different grammar and it should work, it will probably take a few more iterations of the model, or other insights, to see this through (but it's definitely not impossible).
It looks like this is a more comprehensive version of FlashFill (it can handle more tasks), and it is based on deep learning instead of the rule-based techniques FlashFill used.
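For readers who haven't seen FlashFill (the programming-by-example feature in Excel), the flavor of task is: given a few input-output string pairs, synthesize a program that explains them. The transformation below is written by hand as a stand-in for what a synthesizer would search for; the examples are mine:

```python
# Input-output examples the user supplies:
examples = [
    ("John Smith", "J. Smith"),
    ("Jane Doe",   "J. Doe"),
]

# A hand-written program consistent with the examples; a synthesizer
# (rule-based like FlashFill, or neural as discussed above) would
# search a space of string operations to find something like it.
def transform(name):
    first, last = name.split(" ", 1)
    return f"{first[0]}. {last}"

assert all(transform(inp) == out for inp, out in examples)
```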