This is ... not what I expected. It's basically wiring up pre-trained models to ...

woodson · on April 26, 2023

Take a look at AudioLDM (https://github.com/haoheliu/AudioLDM), it might be more what you expected:

- Text-to-Audio Generation: Generate audio given text input.

- Audio-to-Audio Generation: Given an audio clip, generate another that contains the same type of sound.

- Text-guided Audio-to-Audio Style Transfer: Transfer the sound of one audio clip into another using a text description.

hackernewds · on April 26, 2023

so then the training data is text, not audio?

khimaros · on April 26, 2023