This reminds me of a question I have about SD: why can’t it do a simple OCR pass to know those are characters, not random shapes? It’s baffling that neither SD nor DE2 has any understanding of the content they produce.
You could certainly apply a “duct tape” solution like that, but the issue is that neural networks were developed precisely to replace systems that used to be built as a “duct tape” collection of rule-based approaches (see the early attempts at image recognition). So it would be nice to solve the problem in a more general way.
> why can’t it do a simple OCR to know those are characters not random shapes?
It's pretty easy to add this if you want to.
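A minimal sketch of what that "duct tape" OCR check might look like, assuming pytesseract and Pillow are installed and the Tesseract binary is on your PATH; the file names and expected string are placeholders. You'd generate a batch of candidates and reject the ones where the rendered text doesn't match:

```python
# Rejection-sampling sketch: run OCR on generated images and keep only
# the ones where the expected string shows up. Paths are placeholders.
from PIL import Image
import pytesseract

def text_looks_right(image_path: str, expected: str) -> bool:
    """Return True if the expected string appears (case-insensitively)
    in the text Tesseract recognizes in the image."""
    recognized = pytesseract.image_to_string(Image.open(image_path))
    return expected.lower() in recognized.lower()

# Filter a batch of generated candidates down to the legible ones.
candidates = [f"sample_{i}.png" for i in range(8)]
keepers = [p for p in candidates if text_looks_right(p, "OPEN")]
```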
But a better method would be to fine-tune on a bunch of machine-generated images of words if you want your model to be good at generating characters. You'll need to consider which of the many Unicode character sets you want your model to specialize in, though.
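A rough sketch of producing that kind of synthetic fine-tuning set, assuming Pillow is installed; the word list, font file, caption template, and output directory are all placeholders, and the word list is where you'd pin down which character set you're targeting (basic Latin here):

```python
# Render machine-generated word images plus captions to use as a
# fine-tuning set. Font path, words, and output dir are placeholders.
import random
from pathlib import Path
from PIL import Image, ImageDraw, ImageFont

WORDS = ["OPEN", "SALE", "EXIT", "CAFE", "STOP"]        # basic Latin only
FONT = ImageFont.truetype("DejaVuSans-Bold.ttf", 72)    # placeholder font
OUT = Path("word_images")
OUT.mkdir(exist_ok=True)

for i in range(1000):
    word = random.choice(WORDS)
    img = Image.new("RGB", (512, 512), "white")
    draw = ImageDraw.Draw(img)
    draw.text((50, 200), word, font=FONT, fill="black")
    img.save(OUT / f"{word}_{i}.png")
    # Pair each image with a caption such as f'a sign that says "{word}"'
    # so the model learns to associate prompt text with rendered glyphs.
    (OUT / f"{word}_{i}.txt").write_text(f'a sign that says "{word}"')
```

In practice you'd also want to vary fonts, sizes, colors, and backgrounds so the model doesn't just learn one rendering style.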