Does training AI/ML models on AI-generated content cause the quality of their output to collapse?

ryujin470@fedia.io · 12 hours ago

hendrik@palaver.p3x.de · 11 hours ago

It depends, and no. The tools are completely ineffective.

There was a paper once about how feeding generative AI its own output makes it deteriorate. But that’s not the entire story. Many, perhaps most, modern large language models are in fact trained or fine-tuned on synthetic text. Depending on how it’s done, that can very well make models better. One example is “distillation”, where a smaller model is trained on the outputs of a larger one; AI companies can also replace expensive RLHF (reinforcement learning from human feedback) with synthetic examples. It can also make models worse. But you’re not the one curating the datasets or deciding what goes where and how.
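A common toy illustration of that deterioration (a sketch, not necessarily the setup used in the paper mentioned above): treat a one-dimensional Gaussian as the “model”, and fit each new generation only to samples drawn from the previous generation. The function name and its parameters (`generations`, `n_samples`, `seed`) are made up for illustration.

```python
import random
import statistics


def recursive_gaussian_fit(generations: int = 500, n_samples: int = 20, seed: int = 0) -> list[float]:
    """Toy model-collapse demo (hypothetical helper): each "model" is a Gaussian
    fitted only to samples drawn from the previous generation's model.
    Returns the fitted standard deviation for every generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        # Synthetic training data produced by the previous model.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Maximum-likelihood refit: population std, which is biased low.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


stds = recursive_gaussian_fit()
print(f"gen 0 std: {stds[0]:.3f}, gen {len(stds) - 1} std: {stds[-1]:.2e}")
```

Because every generation is estimated from a small sample of the previous one, the spread tends to shrink from generation to generation and the tails of the original distribution disappear. With more samples per generation, or with real data mixed back into each round, the decay is much slower.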

In general, in ML it’s not advisable to train a model on its own output. Since that output carries no information the model doesn’t already have, it can’t make the predictions any better, only worse.
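A minimal sketch of why that is, using ordinary least squares on a straight line as a stand-in for “a model” (the data points and `fit_line` are made up for illustration): refitting on the model’s own predictions gives back the same line, because those predictions already lie exactly on it.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up noisy observations of a roughly linear relationship.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)
# The model's own output: its predictions on the same inputs.
preds = [slope * x + intercept for x in xs]
# "Train" again, this time only on the model's own output.
slope2, intercept2 = fit_line(xs, preds)
print(slope, intercept)
print(slope2, intercept2)  # same line, up to floating-point rounding
```

The second fit is a fixed point: a deterministic model retrained on its own predictions just reproduces itself, so nothing is gained. If the output is sampled with noise instead, as generative models do, the retrained model tends to drift away from the original fit rather than improve.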