“Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease,” they added. “We term this condition Model Autophagy Disorder (MAD).”
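
As a rough illustration of the quoted conclusion, here is a toy sketch of such a loop. It is not the paper's experimental setup. It assumes the "model" is just a one-dimensional Gaussian that is refitted each generation on a small sample, and the function `run_loop` and its parameters are hypothetical names made up for this example. With no fresh real data, the fitted spread (a crude stand-in for diversity) shrinks over generations; mixing in fresh real samples keeps it from collapsing.

```python
import random
import statistics


def run_loop(generations, n_samples, fresh_fraction, seed=0):
    """Toy autophagous loop: refit a Gaussian on its own samples each generation.

    Assumptions (illustrative only): the real data is N(0, 1), the model is
    a Gaussian described by (mu, sigma), and fresh_fraction is the share of
    each training set drawn from the real distribution.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 matches the real distribution
    n_fresh = round(n_samples * fresh_fraction)
    for _ in range(generations):
        # Samples produced by the current model (synthetic data).
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_samples - n_fresh)]
        # Samples drawn from the real distribution (fresh real data).
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        data = synthetic + fresh
        # "Train" the next generation by refitting mean and spread.
        mu = statistics.mean(data)
        sigma = statistics.pstdev(data)
    return sigma


if __name__ == "__main__":
    print("synthetic only :", run_loop(500, 20, fresh_fraction=0.0))
    print("half fresh data:", run_loop(500, 20, fresh_fraction=0.5))
```

Note that nothing deliberately "low quality" is fed in here: the shrinkage in the fully synthetic case comes only from the estimation error of refitting on a finite sample, compounded over generations.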

Interestingly, this might become a more challenging problem as the use of generative AI models online increases.

  • h3ndrik@feddit.de
    1 year ago

    Thank you for explaining. Yes, now that I have skimmed through the paper, I’m kind of disappointed in their work. It’s no surprise to me that quality will degrade if you design a feedback loop with low-quality data. And does this even say anything about a distinction between human and synthetic data? Isn’t it obvious that a model will deteriorate if you feed it progressively lower-quality input, regardless of where that input came from? I’m pretty sure that is the mechanism behind this. A better question to ask would be: is there some point where synthetic output gets good enough to train something with it, and how far away is that point? Or can we rule that out because of some properties we can’t get around? I’m not sure if learning from one’s own output is even possible like this. I, as a human, certainly can’t teach myself. I would need some input like books or curated assignments/examples prepared by other people. There are intrinsic barriers to teaching oneself. However, I can certainly practice stuff, but that’s a different mechanism and difficult to compare to the AI stuff.

    I’m glad I can continue to play with the language models, have them tuned to follow instructions (with the help of GPT-4 data), etc.