A long-form response to the concerns, comments, and general principles many people raised in the post about authors suing companies creating LLMs.

  • Gumby@beehaw.org · 11 upvotes · 1 year ago

    I think this has to do with intent. If I read a book to use it as the basis of a play, that would be illegal. If I read for enjoyment, that is legal. Since AI does not read for enjoyment, but only to use what it reads as the basis for creating something else, that would be illegal.

    Is my logic flawed?

    • Umbrias@beehaw.org · 12 upvotes · 1 year ago

      This isn’t how it works at all. I can, and should, and do, read and consume all sorts of media with the intention of stealing from it for my own works. If you ask for writing advice, this is actually probably one of the first things you’ll hear: read how other people do it.

      So “the intent of the reading” does not work as an argument, because if it did, humans could never generate any new media either.

    • whelmer@beehaw.org · 6 upvotes · 1 year ago

      Your logic is flawed in that derivative works are not a violation of copyright. Generally, copyright protects a text or piece of art from being reproduced. Specific characters and settings can be protected by copyright; concepts and themes cannot. People take inspiration from the work of others all the time. Lots of TV shows or whatever are heavily informed by previous works, and that’s totally fine.

      Copyright protects against the reproduction of other people’s work and the reuse of their specific characters. It doesn’t protect style, themes, concepts, etc., i.e. the things that an AI is trying to derive. So if you trained your LLM only on Tolkien, such that it always told stories about Gandalf and the hobbits, then that would be a problem.

    • Mutoid@beehaw.org · 5 upvotes · 1 year ago

      “Reading with intent”? That sounds ridiculous. The only thing of concern is the work produced.

      • oomphaloompha@beehaw.org · 3 upvotes · 1 year ago

        Open up! It’s the thought police! We have reason to believe you are reading with intent to commit a criminal act! You are under arrest! Anything you say or think can and will be used against you in a court of law!

  • snowbell@beehaw.org · 10 upvotes · 1 year ago

    I think the whole thing about megacorps being the problem here is a bit short-sighted. I don’t think it will be too much longer before anyone can spin up their own LLM; it doesn’t exactly take Google levels of resources. I’m as happy to shit on megacorps as the next person here, but IP law as it is is BS.

    More likely than not, any changes made will benefit large corporations at the expense of individuals and competition. I’m imagining a world where copyright law has made it so that only big corporations can afford to pay for LLM training data. It would be as if individuals had to pay library book prices for a personal book to train their personal LLM. This desire to “cash in” may just play right into the megacorporations’ hands.

    • Nanokindled@beehaw.org · 6 upvotes · 1 year ago

      I agree that cashing in is at least an important part of this. As I understand it, however, past a certain point creating and using LLMs is in fact extremely expensive. That’s why GPT-4 limits user interactions, for example. I also think that the more restricted these tools are in general, the better for everyone. It’s absolutely possible to use them in positive ways, but as it stands they are mostly just flooding the internet with garbage and killing low-level content jobs.

      • PixelPioneer@beehaw.org · 2 upvotes · 1 year ago

        We’re already heading in a direction that mainly benefits those who are already in power. The real impact of these lawsuits appears to be favoring corporations and copyright holders, without sufficient thought to how they might limit individuals like us. People are already anxious about AI taking their jobs, right? But if we keep creating laws that continuously favor the same powerful few, it shouldn’t shock us when the average person can’t keep up. Just to give you an idea, instead of being able to use Large Language Models (LLMs) to make my work easier, I may be forced to completely abandon this tech due to this kind of shortsightedness. LLMs should be a tool available to ALL of us, not just those at the top.

  • dr_catman@beehaw.org · 9 upvotes · edited 1 year ago

    If the rumor is true that OpenAI is using libgen to obtain books, then this will be a very interesting fight.

    Authors profiteering from arcane copyright laws vs. a sleazy company that hypes up an LLM as if it were HAL from 2001. Who is worse? Who should lose?! I’m on the edge of my seat already!

    • lemillionsocks@beehaw.org · 15 upvotes · 1 year ago

      “Authors profiteering from arcane copyright laws”

      I get this argument when it comes to the film, television, and videogame industries, and other more modern ones out there. But outside a handful of actual big-name authors, the average writer isn’t exactly raking it in.

      Also, thanks to their being a relic of the past, we do still have libraries, which offer books to read for free with a subscription. Not only is this common, but it’s a celebrated thing among most authors and the reading community.