beep@piefed.world to Technology@beehaw.orgEnglish · 1 month agoAnthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."alignment.anthropic.comexternal-linkmessage-square10linkfedilinkarrow-up125arrow-down18file-text
arrow-up117arrow-down1external-linkAnthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."alignment.anthropic.combeep@piefed.world to Technology@beehaw.orgEnglish · 1 month agomessage-square10linkfedilinkfile-text
cross-posted from: https://piefed.world/c/tech/p/1099863/anthropic-claude-learned-to-blackmail-by-reading-stories-about-evil-ai-we-believe-the-or Quote Source.
minus-squarespit_evil_olive_tips@beehaw.orglinkfedilinkarrow-up13·1 month ago Anthropic: Claude learned yeah I’m gonna stop you right there
minus-squareXLE@piefed.sociallinkfedilinkEnglisharrow-up8·1 month agoHumanize the machine to dehumanize the person. Anthropic forgot the “mis” in their name
yeah I’m gonna stop you right there
Humanize the machine to dehumanize the person. Anthropic forgot the “mis” in their name