abdljasser2's comments | Hacker News


My plan is to become a people person / ideas guy.


Excellent post


Thank you! I can also recommend this work which had a similar idea:

https://instrumentgen.netlify.app/


Good question. In my experience, combining generic descriptors works best. This is probably because the text captions used during training mostly consist of generic instrument names, genre names, and adjectives.
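
For a rough idea of what I mean, here's a toy Python sketch of the kind of prompt construction that tends to work; the descriptor pools here are made-up examples, not the actual training vocabulary:

    import random

    # Hypothetical descriptor pools, mirroring what the training captions
    # mostly contain: generic instrument names, genre names, and adjectives.
    INSTRUMENTS = ["piano", "electric guitar", "flute", "synth pad"]
    GENRES = ["jazz", "techno", "ambient", "folk"]
    ADJECTIVES = ["warm", "bright", "distorted", "mellow"]

    def make_prompt() -> str:
        """Combine one generic descriptor from each pool into a caption-style prompt."""
        return f"{random.choice(ADJECTIVES)} {random.choice(INSTRUMENTS)}, {random.choice(GENRES)}"

    print(make_prompt())  # e.g. "warm synth pad, jazz"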


https://erl-j.github.io/textsynth/

Sort of ML: an evolutionary algorithm with fitness determined by an ML model.
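
In rough sketch form (a minimal toy: the parameter count, mutation scheme, and fitness function are placeholders for the real synth renderer plus learned scorer):

    import random

    POP_SIZE = 32
    N_PARAMS = 16          # placeholder number of synth parameters
    MUTATION_SCALE = 0.1

    def fitness(params):
        # Placeholder for the ML model: in the real system this would
        # render audio from the params and score it with a learned model.
        return -sum(p * p for p in params)  # toy objective

    def mutate(params):
        return [p + random.gauss(0, MUTATION_SCALE) for p in params]

    # Random initial population of synth parameter vectors.
    population = [[random.uniform(-1, 1) for _ in range(N_PARAMS)]
                  for _ in range(POP_SIZE)]

    for generation in range(100):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:POP_SIZE // 4]  # keep the top quarter
        # Refill the rest of the population with mutated copies of the elite.
        population = elite + [mutate(random.choice(elite))
                              for _ in range(POP_SIZE - len(elite))]

    best = max(population, key=fitness)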


Hello! I think this will be possible!


I can't say for sure, but I think the reverb sounds off in that particular example because the reverb in the recording has a longer decay than the maximum reverb duration I set for the experiments (1 s). I will set it longer in the future.
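
To make that concrete, here's a toy numpy sketch (the sample rate and the 2.5 s decay are made-up numbers) of how much of a long reverb tail a 1 s impulse-response cap discards:

    import numpy as np

    SR = 16000            # assumed sample rate
    MAX_IR_SECONDS = 1.0  # the cap used in the experiments
    T60 = 2.5             # hypothetical decay time of the recording's reverb

    # Exponentially decaying noise as a toy impulse response:
    # amplitude reaches -60 dB at t = T60.
    t = np.arange(int(SR * T60)) / SR
    ir = np.random.randn(len(t)) * 10 ** (-3 * t / T60)

    # The model can only represent the first MAX_IR_SECONDS of this tail,
    # so everything after the cap is simply lost.
    cut = int(SR * MAX_IR_SECONDS)
    lost_energy = np.sum(ir[cut:] ** 2) / np.sum(ir ** 2)
    print(f"energy discarded by the 1 s cap: {lost_energy:.1%}")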

Yes, that kind of visualisation could be done. The authors of the prior work we are building on (DDSP) have made some great visualisations, which I think you will find useful: https://storage.googleapis.com/ddsp/index.html


Thank you! Will do.


Very cool! Thank you for this.

Speaking of wavetables, have you seen this relatively recent work?

https://lamtharnhantrakul.github.io/diffwts.github.io/?


Oh nice, I have not. Thanks, I've got some catching up to do! I see that they used the NSynth dataset; I didn't realize it was publicly available. I recall that the NSynth paper came out just as I was finishing that work, so you can imagine I felt a bit scooped ;) (but NSynth was far more impressive, so what could I say...)

