LLMs become more covertly racist with human intervention

ylai@lemmy.ml · 2 years ago

keepthepace@slrpnk.net · 2 years ago

The study they link to, though, lists this among its conclusions:

Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level.

It feels like the same problem as with hallucinations: the model learns its core knowledge during base training and is then taught to ignore some of it or invent more, but it does not acquire new knowledge.