LLMs can unmask pseudonymous users at scale with surprising accuracy

Powderhorn@beehaw.org · 3 months ago


AmbitiousProcess (they/them)@piefed.social · 3 months ago

This is something we’re gonna see a lot more of, and I don’t mean specifically “LLMs doing privacy violations”, though that’ll probably be a lot of it.

LLMs are really good at taking unstructured data (e.g. all your social media posts, usernames, aliases, writing style, hints about your location, time of activity, etc.) and turning it into structured data (e.g. name=this, city=that, political preference=them, etc.). Why do you think most of the early, quickly deployed uses of LLMs were just article summarizer tools? Unstructured data (articles) > structured data (bullet points).
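To make the "structured" half concrete, here is a minimal sketch of what the output side can look like, assuming the model has been asked to reply in JSON. No model is called here: `example_reply` is written by hand, and `ExtractedRecord` and `parse_model_output` are hypothetical names made up for illustration, with fields borrowed from the name/city/political preference example above.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExtractedRecord:
    """Hypothetical schema mirroring the name/city/preference example."""
    name: Optional[str] = None
    city: Optional[str] = None
    political_preference: Optional[str] = None


def parse_model_output(raw: str) -> ExtractedRecord:
    """Parse a JSON reply into a typed record.

    Keys outside the schema are dropped; missing keys stay None.
    """
    data = json.loads(raw)
    allowed = {f.name for f in fields(ExtractedRecord)}
    return ExtractedRecord(**{k: v for k, v in data.items() if k in allowed})


# Hand-written stand-in for a model reply (an assumption, not real output).
example_reply = (
    '{"name": "this", "city": "that", '
    '"political_preference": "them", "shoe_size": 42}'
)
record = parse_model_output(example_reply)
print(record)
```

Once everything is in a fixed schema like this, it is ordinary data: easy to store, filter, and compare across accounts, which is the whole point.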

This is really good for surveillance, because it means they can take all your activity and condense it down into something that’s easier to parse and correlate. Other tools have existed to do this for a long time (mostly in the hands of intelligence agencies), but LLMs make it more accessible and easier to use, and add some complexity to how it can operate.

I think we’re gonna see a lot more use of LLMs for things like this: taking something unstructured and making it structured, because hallucinations and things like that are a lot less common when the task is just reorganizing existing information rather than coming up with something new (though of course, hallucinations will never go away, and are still gonna be pretty prevalent).

That could be deanonymizing your accounts, or it could just be things like looking through all your files to sort them into better predefined categories, or things like what Mozilla does with their tab groups, where you can have it suggest other tabs that would fit into a group, and a local model figures out which tabs belong in which topic (with pretty good accuracy in my experience).
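For a rough feel of the "suggest tabs for this group" idea, here is a toy sketch. It is not how Mozilla does it: plain word overlap between tab titles stands in for the local model, and `tokens`, `suggest_tabs`, and the `threshold` value are all hypothetical names and choices made up for illustration.

```python
import re
from typing import List, Set


def tokens(title: str) -> Set[str]:
    """Lowercase a tab title and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def suggest_tabs(group_titles: List[str],
                 candidate_titles: List[str],
                 threshold: float = 0.2) -> List[str]:
    """Suggest candidates whose words overlap enough with the group's words.

    The score is the fraction of a candidate's words already seen in the
    group. A real system would use a learned model rather than raw overlap.
    """
    group_vocab: Set[str] = set()
    for title in group_titles:
        group_vocab |= tokens(title)

    suggestions = []
    for title in candidate_titles:
        words = tokens(title)
        if not words:
            continue  # nothing to compare against
        overlap = len(words & group_vocab) / len(words)
        if overlap >= threshold:
            suggestions.append(title)
    return suggestions


group = ["Python dataclasses tutorial", "Python typing docs"]
open_tabs = [
    "Python json module reference",
    "Best pizza near me",
    "Typing hints in Python 3.12",
]
print(suggest_tabs(group, open_tabs))
```

With these made-up titles, the two Python tabs get suggested and the pizza tab does not. Word overlap breaks down quickly on titles that share a topic but no vocabulary, which is where a model that understands the content earns its keep.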

Unfortunately, companies have very little interest in making your life easier by doing things like sorting your files for you, because they’re already pretty uninterested in making their systems easy to use if it doesn’t directly generate a profit (cough cough, Microslop), and have a much larger interest in doing things like tracking you to sell you some new crap.