I just started using this myself, seems pretty great so far!

It clearly doesn’t stop all AI crawlers, but it does stop a significant chunk of them.

  • merthyr1831@lemmy.ml
    7 months ago

    It’s a clever solution, but I recently saw one that IMO is more elegant for noscript users. I can’t remember the name, but it creates a dummy link that human users won’t touch and that web crawlers will naturally follow. It then generates an infinitely deep tree of very basic HTML, forcing bots to endlessly trawl a cheap-to-serve part of your web server instead of something heavier. It might even have integrated with fail2ban to pick out obvious bots and keep them off your network for good.
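
    For illustration only, here is a rough Python sketch of the "infinitely deep tree of cheap HTML" idea. It is not the code of the project I’m thinking of: the function name `tarpit_page`, the `fanout` parameter, and the `/trap` path are all made up for the example. Each page is derived from a hash of its own URL, so nothing needs to be stored and every link leads one level deeper.

```python
import hashlib
from html import escape


def tarpit_page(path: str, fanout: int = 3) -> str:
    """Return a tiny HTML page whose links all lead one level deeper.

    Hypothetical helper: the name and parameters are assumptions,
    not taken from any real tarpit project.
    """
    base = "/" + path.strip("/")
    links = []
    for i in range(fanout):
        # Hash the current path so the same URL always yields the same
        # children, without keeping any state on the server.
        token = hashlib.sha256(f"{base}#{i}".encode()).hexdigest()[:8]
        child = f"{base.rstrip('/')}/{token}"
        links.append(f'<li><a href="{escape(child)}">{token}</a></li>')
    return "<!doctype html><title>index</title><ul>" + "".join(links) + "</ul>"


if __name__ == "__main__":
    # A crawler that follows the hidden entry link (assumed to be /trap)
    # only ever finds more links like these.
    print(tarpit_page("/trap"))
```

    You would serve this from whatever handler sits behind the hidden link. Raising `fanout` makes the tree wider without making any single page meaningfully more expensive to generate.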

    • zutto@lemmy.fedi.zutto.fiOP
      7 months ago

      If you remember the project I would be interested to see it!

      I’ve seen some AI-poisoning sinkholes before too, which is also a novel concept, but I haven’t heard of any real-world experience with them yet.

    • paperd@lemmy.zip
      7 months ago

      That’s a tarpit you’re describing, like iocaine or Nepenthes. Those feed the crawler junk data to try to make its eventual output bad.

      Anubis tries to not let the AI crawlers in at all.

      • Cethin@lemmy.zip
        7 months ago

        It could be infinitely wide too, if they wanted. I wouldn’t think that would be hard to do. I suspect crawlers limit how long they will follow a chain so that they eventually escape, but this still protects your data, because it buries the legitimate data they want. The goal isn’t to trap them forever; it’s to keep them from getting anything useful.

      • nickwitha_k (he/him)@lemmy.sdf.org
        7 months ago

        That would be reasonable. The people running these things aren’t reasonable. They ignore every established mechanism to communicate a lack of consent to their activity because they don’t respect others’ agency and want everything.