Two authors sued OpenAI, accusing the company of violating copyright law. They say OpenAI used their work to train ChatGPT without their consent.

  • FunctionFn@midwest.social
    1 year ago

    By nature of a human creating something “connected” to another work, the new work is transformative. Copyright law places some value on human creativity modifying a work in a way that transforms it into something new.

    Depending on your point of view, it’s possible to argue that machine learning lacks the capacity for transformative work. On that view, its output is all derivative of its source material and therefore infringes on that source material’s copyright. This is especially true when models like ChatGPT reproduce their training material whole cloth, as mentioned elsewhere in the thread.

    • jecxjo@midwest.social
      1 year ago

      I’d argue that all human work is derivative as well. Not from the legal stance of copyright law, but from a fundamental stance of how our brains work. The only difference is that humans have source material beyond what other people have created. You have seen an apple on a tree before; not all of your apple experiences are pictures someone drew, photos someone took, or a poem someone wrote. At what point would you consider personal experience sufficient to qualify someone as able to generate transformative work? If I were to put a camera in my head, record my life, and donate the footage to the public domain, would that be enough data for an AI to be considered able to create transformative works? Or must the AI have genuine personal experiences?

      Our brains can produce some level of randomness, but a brain’s current state is based on its previous state and the inputs it received. I wonder, when we try to come up with something unique, what portion of our brains dives into memories versus pure noise generation. That’s easily done on a computer.

      As for whole-cloth reproduction… I memorized many poems in school. Does that mean I can never generate something unique?

      Don’t get me wrong: they used stolen material, and that’s wrong. But had it been legally obtained, I would see less of an issue.

      • FunctionFn@midwest.social
        1 year ago

        But “derivative” and “transformative” are legal terms with legal meanings. Arguing how you feel the word “derivative” applies to our brain chemistry is entirely irrelevant.

        You’ve memorized poems, and (assuming the poem is not in the public domain) if you reproduce that poem in a collection of poems without a license from the copyright owner, you’ve infringed on that copyright. It is no different when ChatGPT reproduces a poem in its output.

        • jecxjo@midwest.social
          1 year ago

          I think it’s very relevant, because those laws were created at a time when there was no machine-generated material. The law assumes that one human being is creating material and another human being is stealing it. No part of these laws dictates rules for creating a non-human third party that would do the actual copying. Specific rules were added for things like photocopiers and fax machines, where the goal is to create exact facsimiles. But ChatGPT isn’t doing what a photocopier does.

          The current lawsuits, at least the ones I’ve read over, have not been explicitly about outputting copyrighted material. While ChatGPT could output the material just as I could recite a poem, the issue being raised is that the training materials were copyrighted and that the AI system then “contains” said material. That is why I asked my initial question. My brain could contain your poem, and as long as I don’t write it down as my own, what violation is occurring? OpenAI could go to the library, borrow every book, and scan them in, and all would be OK, right? At least going by the recent lawsuits.

            • jecxjo@midwest.social
              1 year ago

              I meant in the opposite direction. If I teach an elephant to paint, then show him a Picasso, and he paints something like it, am I the one violating copyright law? I think there are currently no explicit laws about this type of situation, but if there were a case to be made, MY intent would be the major factor.

              The third-party copying that existing laws address involves human-driven intent to make exact replicas: photocopy machines, cassette/VHS/DVD duplication software and hardware, faxes, etc. We have personal, private fair use laws, but all of this is about humans using tools to make near-exact replicas.

              The law needs to catch up to the concept of a human creating something that then goes out and produces non-replica output, triggered by someone other than the tool’s creator. I see at least three parties in this whole process:

              • AI developer creating the system
              • AI teacher feeding it learning data
              • AI consumer creating the prompt

              If the data fed to the AI was all gathered by legal means, let’s say scanned library books, who is in violation if the output were to violate copyright law?

              • FunctionFn@midwest.social
                1 year ago

                These are questions that, again, are pretty well trodden in the copyright space. ChatGPT in this case acts more like a platform than a tool, because it hosts and can reproduce material that it is given. Again, this is a US-only perspective, and the perspective of a non-lawyer, but the DMCA outlines requirements that platforms must meet to be protected from being sued for hosting and reproducing copyrighted works. Part of the problem is that the owners of the platform are the parties uploading copyrighted works, by training the LLM on them. That automatically disqualifies a platform from any sort of safe harbor protection, and so the owners of the ChatGPT platform would be in violation.