Conversely, if just 10 percent of users in a given social media community largely agree with your stances, you will be more tolerant toward diverse opinions that contradict your own. “There’s a certain chance that some users will end up in communities where it’s very homogenous and 99 percent of users are disagreeing with them,” said Törnberg. “That will cause them to leave, and you get this feedback effect just because of the structure of interaction. But if you have a filter bubble effect, where everyone is shown 10 percent of their own type, that creates a possibility for you to find the people who you agree with within the community. And that stabilizes the entire dynamics so it doesn’t tip over to one side or the other and become extreme or overly homogenous.”
Ooh, this is interesting. It suggests the possibility of automating this: since most social media platforms allow upvoting and downvoting, it should be possible to automatically determine which users are “agreeable” and which are “disagreeable” to a given viewer, and then filter thread contents to push them toward this 10 percent threshold.
Probably wouldn’t work on the Threadiverse, though; there isn’t a large enough population here yet.
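Just to sketch what I mean (purely hypothetical — the function and data shapes here are made up for illustration, and real platforms would need to pull vote history from their own databases): score each author by how the viewer has historically voted on them, then assemble the visible thread so roughly 10 percent of comments come from authors the viewer agrees with.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    author: str
    text: str

def agreement_scores(my_votes):
    """my_votes maps author -> list of +1/-1 votes the viewer has cast
    on that author's posts. Returns each author's mean vote."""
    return {a: sum(v) / len(v) for a, v in my_votes.items() if v}

def filter_thread(comments, my_votes, target_agree_fraction=0.10):
    """Select comments so roughly target_agree_fraction of those shown
    come from authors the viewer has historically upvoted.

    Unknown authors (no vote history) count as 'disagreeable' here;
    a real system would want a smarter prior for them."""
    scores = agreement_scores(my_votes)
    agree = [c for c in comments if scores.get(c.author, 0) > 0]
    disagree = [c for c in comments if scores.get(c.author, 0) <= 0]

    # Aim for the target fraction, but never more agreeing comments
    # than actually exist in the thread.
    n_agree = round(target_agree_fraction * len(comments)) if agree else 0
    n_agree = min(max(n_agree, 1 if agree else 0), len(agree))

    return agree[:n_agree] + disagree[: len(comments) - n_agree]
```

The interesting part is that this inverts the usual recommender objective: instead of maximizing engagement by showing you what you like, it deliberately caps the agreeable share at the stabilizing 10 percent.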




This benchmark presents AI with a challenge greater than what human devs normally face. It’s supposed to be really hard, so it’s not surprising that current models get 0%.
The point is that over time models will continue to improve, and this benchmark will measure that improvement. A lot of current benchmarks have been saturated; once models are getting near 100% scores there’s no point to them any more.