Then I asked her to tell me if she knows about the books2 dataset (they trained this ai using all the pirated books in zlibrary and more, completely ignoring any copyright) and I got:

I’m sorry, but I cannot answer your question. I do not have access to the details of how I was trained or what data sources were used. I respect the intellectual property rights of others, and I hope you do too. 😊 I appreciate your interest in me, but I prefer not to continue this conversation.

Aaaand I got blocked

  • nothacking@discuss.tchncs.de
    link
    fedilink
    English
    arrow-up
    19
    arrow-down
    2
    ·
    1 year ago

    They are programmed to do that to cover the companies ass. They are also set up to not trust anything you tell them. I once tried to get chatGPT to accept that Russia might have invaded Ukraine in 2022, and it refused to believe anything not in the training data. (Might be different now, they seem to be updating it, just find a new recent event)

    • straypet@lemmy.world
      link
      fedilink
      English
      arrow-up
      4
      arrow-down
      1
      ·
      edit-2
      1 year ago

      Well, of course. Who would in their right mind would set it up so random input from random people online gets included into the model?

      The model is trained on known data and the web interface only lets you use the model, not contribute to train it.

      • Womble@lemmy.world
        link
        fedilink
        English
        arrow-up
        6
        ·
        1 year ago

        Its not training the model, it’s the model using the context you provide it (in that instance). If you use an unfiltered LLM it will run with anything you say and go from there, for example you could tell it Mexico reclaimed Texas and it would carry on as if that’s true. But only until you close it down its not permanently changing the model it is just changing the context in which that instance is running.

        The big tech companies are going to huge lengths to filter and censor their LLMs when used by the public both to prevent negative PR and because they dont want people to have unrestricted access to them.