If so are these programs that claim to ‘poison’ the training datasets effective ?

  • FaceDeer@fedia.io · 5 hours ago

    Semantic quibbling is one of the least interesting kinds of internet debate, so replace the word “understanding” with whatever word makes you happy. I continued with “and talking about” right afterwards so you can just delete the word entirely and the sentence still works fine. You could have just kept reading.

    Since you didn’t read the rest of my comment, I should note that the rest of it after that sentence is about the other issue that OP raised and not even about model collapse at all.

    Anyway. The article about model collapse that I see still crop up every once in a while is this one. It’s not that it has “methodological errors”, though, it’s just that it uses a very artificial training protocol to illustrate model collapse that doesn’t align with how LLMs are actually trained in real life. It’s like demonstrating the effects of inbreeding in animals by crossing brothers and sisters for twenty generations straight - you’ll almost certainly see some strong evidence, but it’s not a pattern of breeding that you are actually going to see in the wild.
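    To make that "inbreeding" analogy concrete, here is a toy sketch (my own illustration, not the paper's actual LLM setup): repeatedly fit a Gaussian to a small sample drawn from the previous generation's fitted Gaussian, so each generation is trained purely on the last generation's synthetic output. Estimation error compounds and the fitted variance drifts toward zero, which is the same qualitative failure mode in miniature.

```python
import random
import statistics

# Toy analogue of model collapse (an illustration, not the paper's setup):
# each "generation" is fit only on samples from the previous generation.
# Starting from a standard Gaussian, refit a Gaussian to a tiny sample of
# the previous fit, over and over. Because the sample standard deviation
# underestimates the true one on average, the fitted variance tends to
# shrink toward zero across generations.

random.seed(0)
mu, sigma = 0.0, 1.0   # generation 0: the "real data" distribution
n = 10                 # tiny sample size exaggerates the effect

for generation in range(2000):
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    mu = statistics.fmean(samples)     # refit on synthetic data only
    sigma = statistics.stdev(samples)

print(f"fitted sigma after 2000 generations: {sigma:.2e}")
```

    Real LLM training pipelines don't look like this, which is the point above: fresh human-written data keeps entering the mix each generation instead of the model feeding exclusively on its own output.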

    • Helix 🧬@feddit.org · 35 minutes ago

      Semantic quibbling is one of the least interesting kinds of internet debate

      Cueball is typing on a computer.
      Voice outside frame: Are you coming to bed?
      Cueball: I can't. This is important.
      Voice: What?
      Cueball: Someone is WRONG on the Internet.

      Why do you engage in it then?

      In my opinion, a debate about the semantics of understanding and intelligence in the context of AI is highly interesting, and a huge issue for worldwide politics and policies, but you do you.

    • Brummbaer@pawb.social · 2 hours ago

      If I understand it right, you need to enrich and filter the data with human input so the model doesn't collapse.

      Wouldn’t that imply that if the human enrichment emulates AI data too closely, the model will still collapse, since the human filtering is now just mimicking AI output?