Training a model on its own outputs is already an established practice. For example, in RLHF the response pairs that humans rate are generated by the model itself. So even our best models are trained on their own outputs plus ratings from human labellers. The internet is a huge rating machine; AI will distill this signal and improve even while ingesting its own text.