beep@piefed.world to Technology@beehaw.org · English · 54 minutes ago
Anthropic: Claude learned to blackmail by reading stories about evil AI
"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."
alignment.anthropic.com
cross-posted from: https://piefed.world/c/tech/p/1099863/anthropic-claude-learned-to-blackmail-by-reading-stories-about-evil-ai-we-believe-the-or Quote Source.
valar@lemmy.ca · 26 minutes ago
Maybe you shouldn’t have trained your models on all of our internet posts