suryabhupa's comments

In practice, and at scale, that's exactly what having <bos> and <eos> tokens allows you to do, easily and programmatically.
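
For concreteness, here's a minimal sketch of the kind of packing/unpacking this enables (the token ids are made up; real tokenizers define their own):

  BOS, EOS = 1, 2  # assumed ids; real tokenizers define their own

  def pack(docs):
      # Concatenate documents, delimiting each with <bos> ... <eos>.
      stream = []
      for doc in docs:
          stream.extend([BOS] + doc + [EOS])
      return stream

  def unpack(stream):
      # Recover the original documents by scanning for the delimiters.
      docs, current = [], None
      for tok in stream:
          if tok == BOS:
              current = []
          elif tok == EOS:
              docs.append(current)
              current = None
          elif current is not None:
              current.append(tok)
      return docs

  docs = [[5, 9, 7], [8, 4]]
  assert unpack(pack(docs)) == docs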


The announcements are live on Twitter! See this for example: https://x.com/suryabhupa/status/1806342617191379167


Surya here from the core Gemma team -- we can think of a distillation loss as learning to model the entire distribution of tokens that are likely to follow the prefix so far, instead of only the single token in the training example. Some back-of-the-envelope calculations show that learning to model the full distribution yields many more bits of information to learn from.
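
Concretely, here's a toy sketch of the difference (not our actual training code; temperature and other details are omitted):

  import numpy as np

  def log_softmax(logits):
      z = logits - logits.max()
      return z - np.log(np.exp(z).sum())

  def one_hot_loss(student_logits, target_id):
      # Standard next-token loss: only the one observed token carries signal.
      return -log_softmax(student_logits)[target_id]

  def distill_loss(student_logits, teacher_probs):
      # Distillation loss: cross-entropy against the teacher's full
      # next-token distribution, so every vocabulary entry carries signal.
      return -(teacher_probs * log_softmax(student_logits)).sum()

  logits = np.array([2.0, 0.5, -1.0])
  teacher = np.array([0.6, 0.3, 0.1])
  print(one_hot_loss(logits, 0), distill_loss(logits, teacher))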


Gotcha. That makes sense. Thanks!

What are the theories as to why this works better than training on a larger quantity of non-simulated tokens?

Is it because the gradient from the non-simulated tokens is too noisy for a small model to model correctly?


This is really remarkable! How hard do you think it will be to support new models, i.e., does the tooling you've built generalize so that you can easily serve other large-scale models?


Hi everyone! One of the creators of DFL here. In an attempt to more deeply understand fundamental concepts in machine learning, we designed Depth First Learning. It's a pedagogy for diving deep into machine learning by carefully tailoring a curriculum around a particular paper or concept and leading small, focused discussion groups. So far, we’ve created guides for InfoGAN, TRPO, AlphaGoZero, and DeepStack.

Since our launch, we've received very positive feedback from students and researchers. Now we want to run new online classes around the world.

We intimately understand that curating a meaningful curriculum with reading materials, practice problems, and instructive discussion points can be very rewarding, but also time-consuming and difficult. We wanted to make sure that the people compiling this content felt their efforts were worth their while, so we decided to launch a fellowship program.

Thanks to the generosity of Jane Street, we will provide four fellows with a $4,000 grant each to build a six-week curriculum and run weekly online discussions.

If you’d like to lead a class about an important paper in machine learning, please visit http://fellowship.depthfirstlearning.com to apply. We look forward to hearing from you, and I'm happy to answer any questions about it!


Many machine learning and reinforcement learning models are susceptible to adversarial attacks; this is not unique to deep learning. However, because so many currently deployed systems use deep learning, it comes under particular scrutiny.
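
As a toy illustration that this isn't deep-learning-specific (made-up numbers, an FGSM-style step against a plain linear classifier):

  import numpy as np

  rng = np.random.default_rng(0)
  w, b = rng.normal(size=16), 0.0
  x = rng.normal(size=16)

  def prob(x):
      # Plain logistic regression -- no deep net anywhere.
      return 1.0 / (1.0 + np.exp(-(w @ x + b)))

  # The gradient of the logit w.r.t. x is just w, so a small
  # sign-of-gradient perturbation moves the score a lot.
  eps = 0.25
  x_adv = x - eps * np.sign(w)
  print(prob(x), prob(x_adv))  # the score drops sharply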


Then it seems the hype around machine learning is not well founded. Machine learning in general is a big risk if it is so easily fooled.


There is talk of incorporating this into Excel at some point in the future, but it may take a _while_ before it can be fully productionized.


Looking forward to playing with it!


It would be pretty cool to see what it learns, but I don't think we've tried that :P


That's one manifestation of this kind of research being used in real life by programmers around the world. :)


One of the authors here -- would love to answer any questions about the work! :)


Are you going to publish the source code for reproduction?


Eventually, yes.


Are there any difficulties in generating programs in a standard language like Python? Did you choose a DSL because the neural network is sensitive to the output programming language?


It turns out the full grammar of Python (and of almost all real programming languages) is quite large; this is early work in neural program synthesis, so we chose a pretty limited DSL to make sure we could at least solve this one before moving on to more general DSLs that contain state, conditionals, for-loops, etc. In theory, however, we can apply the exact same architecture to Python programs and see what happens. We haven't tried yet. :)
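
To give a flavor of what a limited DSL buys you, here's a toy list-manipulation DSL with brute-force enumeration standing in for the neural model (illustrative only, not the DSL from the paper):

  import itertools

  PRIMITIVES = {
      "reverse": lambda xs: xs[::-1],
      "sort": sorted,
      "drop_first": lambda xs: xs[1:],
      "double": lambda xs: [2 * x for x in xs],
  }

  def run(program, xs):
      # A program is just a sequence of primitive ops applied in order.
      for op in program:
          xs = PRIMITIVES[op](xs)
      return xs

  def synthesize(examples, max_len=3):
      # Enumerate programs up to max_len ops; return one consistent
      # with every input/output example.
      for n in range(1, max_len + 1):
          for program in itertools.product(PRIMITIVES, repeat=n):
              if all(run(program, i) == o for i, o in examples):
                  return program
      return None

  print(synthesize([([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])]))
  # -> ('sort', 'double')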


The first thing I thought of when reading this article was genetic programming.

Is this a significant improvement over evolutionary computation methods? Has that been attempted in the past?


I'm not too familiar with evolutionary computation methods, but I imagine the approaches may be similar in nature.


Why does the final example in figure 14 fail completely? The outputs are correct as far as they go, but they're all incomplete.

Is it because, at some point, the scoring metric lets a good start outscore an alternative in the beam search that could have led to a more complete solution? In non-trivial real-world examples, would this be a major problem?
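
To make the worry concrete (made-up numbers): under a plain sum-of-log-probs score, a confident-but-incomplete hypothesis can outrank a longer, complete one, and beam search keeps whichever scores higher.

  import math

  short_probs = [0.9, 0.9, 0.5]             # stops early; last step shaky
  long_probs  = [0.8, 0.8, 0.8, 0.8, 0.8]   # complete, but longer

  def score(probs):
      return sum(math.log(p) for p in probs)

  print(score(short_probs))  # ~ -0.90: the incomplete hypothesis ranks higher
  print(score(long_probs))   # ~ -1.12: the complete one loses on the raw sum
  # Length-normalizing flips the ranking:
  print(score(short_probs) / len(short_probs))  # ~ -0.30
  print(score(long_probs) / len(long_probs))    # ~ -0.22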


Any chance this could be implemented for automated proof synthesis? Would love to see this used in F* or Lean.


Theorem proving is very closely related to program induction (we just change the grammar). Just as with Python, the underlying search space would be incredibly large, and while in theory we could simply change the DSL and it should work, it'll probably take a few more iterations of the model, or other insights, to see this through (but it's definitely not impossible).


What is the link with Excel's FlashFill feature?


It looks like this is a more comprehensive version of FlashFill (it can do more tasks), and it is based on deep learning rather than the rule-based techniques previously used in FlashFill.
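
For a flavor of the task setting (illustrative only; this is neither FlashFill's DSL nor the paper's): the system sees a couple of input -> output string pairs and must infer the transformation.

  examples = [
      ("John Smith", "J. Smith"),
      ("Jane Doe", "J. Doe"),
  ]

  def inferred(s):
      # The kind of program the synthesizer is expected to find:
      # first initial + ". " + last name.
      first, last = s.split(" ", 1)
      return first[0] + ". " + last

  assert all(inferred(i) == o for i, o in examples)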

