
Diagnosing NNs from scarce environmental data is something of a party trick I do in a number of places, so let me give it a whirl. It sounds like your network has far too many degrees of freedom; reducing them via residual connections or some other method will likely help.
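
For instance, a minimal residual-block sketch in PyTorch (assuming a conv net; the channel count and the pre-activation BN/ReLU ordering here are illustrative choices, not your architecture):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Pre-activation residual block: the skip connection lets the block
        fall back to an identity mapping, which helps tame excess capacity."""
        def __init__(self, channels: int):
            super().__init__()
            self.body = nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x + self.body(x)  # residual skip connection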

Additionally: add some L2 weight decay, switch to SGD+OneCycle, and don't forget to put a BatchNorm before every activation as well.
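
Wiring all three together might look something like this (a sketch; the model, learning rate, and schedule lengths are stand-ins, not tuned values):

    import torch
    import torch.nn as nn

    # Stand-in model; note BatchNorm placed before the activation.
    model = nn.Sequential(
        nn.Linear(32, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Linear(64, 10),
    )

    # L2 weight decay lives on the optimizer; SGD+momentum pairs well with OneCycle.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)

    epochs, steps_per_epoch = 10, 100  # illustrative numbers only
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=0.1, epochs=epochs, steps_per_epoch=steps_per_epoch)

    for _ in range(epochs * steps_per_epoch):
        x = torch.randn(16, 32)        # stand-in batch
        loss = model(x).pow(2).mean()  # stand-in loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycleLR steps once per batch, not per epoch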

If this is a newer-style attention U-Net à la StyleGAN, then that would be a confusing result, as transformers seem to be pretty good at not immediately collapsing into that kind of thing, if I understand correctly.

Barring all of that, swapped labels can be a surprising cause of overfitting on complex data, since they force the network into a memorization-only mode with very little chance of generalization.
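
One cheap way to probe that hypothesis (a sketch; the data tensors, step count, and make_model are all hypothetical placeholders): train the same architecture on the real labels and on deliberately shuffled ones, and compare how easily each fits.

    import torch
    import torch.nn as nn

    def fit_and_report(make_model, xs, labels, steps=200):
        """Train a fresh model on the given labels and return the final loss."""
        model = make_model()
        opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(steps):
            opt.zero_grad()
            loss = loss_fn(model(xs), labels)
            loss.backward()
            opt.step()
        return loss.item()

    # Hypothetical stand-in data and model; substitute your own.
    xs = torch.randn(256, 32)
    ys = torch.randint(0, 10, (256,))
    make_model = lambda: nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

    true_loss = fit_and_report(make_model, xs, ys)
    shuffled_loss = fit_and_report(make_model, xs, ys[torch.randperm(len(ys))])
    # If shuffled_loss drops about as fast/low as true_loss, the network is
    # memorizing rather than generalizing -- consistent with corrupted labels.
    print(true_loss, shuffled_loss)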

Let me know if I got it right, or at least close, for you. :) :D <3 :))))


