I am currently also playing around with this. The best part is that for storage you don't need to keep the reconstructed image, just the latent representation and the VAE decoder (which can do the reconstructing later). So you can store the image as relatively few numbers in a database. In my experiment I was able to compress a (512, 384, 3) RGB image to (48, 64, 4) floats. In terms of memory it was an 8x reduction.
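A rough sketch of what "store just the latents in a database" looks like, using the shapes from my experiment. The SQLite schema and the random stand-in latent are just illustrative (a real latent comes from the VAE encoder), and the exact compression ratio depends on the latent dtype and on whether you compare against raw bytes or an already-compressed file like a PNG:

```python
import sqlite3
import numpy as np

# Raw size of the original image: 512x384 RGB at 1 byte per channel.
image_bytes = 512 * 384 * 3                      # 589824 bytes

# Latent with the shape from the experiment above: (48, 64, 4).
# A random array stands in here -- producing a real latent needs the
# VAE encoder, which is out of scope for this sketch.
latent = np.random.rand(48, 64, 4).astype(np.float32)
latent_bytes = latent.nbytes                     # 49152 bytes as float32

# Store only the latent (plus its shape) in a database; the VAE decoder
# can reconstruct the image later.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, shape TEXT, latent BLOB)")
con.execute("INSERT INTO images VALUES (?, ?, ?)", (1, "48,64,4", latent.tobytes()))

# Round-trip: read the blob back and rebuild the array for decoding.
shape_text, blob = con.execute("SELECT shape, latent FROM images WHERE id = 1").fetchone()
shape = tuple(int(n) for n in shape_text.split(","))
restored = np.frombuffer(blob, dtype=np.float32).reshape(shape)

print(image_bytes, latent_bytes)                 # 589824 49152
```

Note the float32 latents are ~12x smaller than the raw pixels; storing them as float16 doubles that, while comparing against a PNG on disk shrinks it, which is roughly where a single-digit ratio like 8x lands.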
However, on some images the artefacts are terrible. It does not work as a general-purpose lossy compressor unless you don't care about details.
The main obstacle is compute. The model is quite large, but HDDs are cheap. The real problem is that reconstruction requires a GPU with lots of VRAM. Even with a GPU it takes about 15 seconds to reconstruct an image in Google Colab, and on CPU it's extremely slow. This is only viable if compute costs come down a lot.