• ffhein@lemmy.world · 1 year ago

    I skimmed through the Llama 2 research paper; there were some sections about the work they did to prevent users from circumventing the language model’s programming. IIRC, one of the examples of model hijacking was disguising the request as a creative/fictional prompt. Perhaps it’s some part of that training gone wrong.

    • zephyrvs@lemmy.ml · 1 year ago

      Just goes to show how important it is to be able to produce uncensored models.