this post was submitted on 12 Feb 2025
12 points (83.3% liked)

Open Source


After dabbling in the world of LLM poisoning, I realised that I simply do not have the skill set (or brain power) to effectively poison LLM web scrapers.

I am trying to work with what I know/understand. I have fail2ban installed on my static web server. Is it possible to get a massive list of IP addresses known to scrape websites and add them to the ban list?
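One hedged way to do this with fail2ban as-is: download a blocklist, then feed each entry to `fail2ban-client set <jail> banip <ip>`, which manually bans an address in an existing jail. The jail name `scrapers`, the file name, and the list contents below are made-up examples; the sketch just validates the list and generates the commands rather than running them:

```python
# Sketch: turn a downloaded blocklist into fail2ban-client ban commands.
# The jail name "scrapers" and the sample list are hypothetical.
import ipaddress

def ban_commands(blocklist_text, jail="scrapers"):
    """Yield one fail2ban-client command per valid entry in the list."""
    for line in blocklist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        try:
            # Accept single IPs or CIDR ranges; reject anything malformed
            # rather than feeding garbage to the firewall.
            ipaddress.ip_network(line, strict=False)
        except ValueError:
            continue
        yield f"fail2ban-client set {jail} banip {line}"

cmds = list(ban_commands("1.2.3.4\n# comment\n10.0.0.0/8\nnot-an-ip\n"))
```

Note that a "massive" static list is exactly the approach the reply below argues against; fail2ban shines when it reacts to your own logs.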

[–] lungdart@lemmy.ca 8 points 1 week ago

Fail2ban is not a static security policy.

It's a dynamic firewall: it ties log entries to time-boxed firewall rules.

For instance, you could auto-ban for 1h any source that requests robots.txt on your web server. I've heard AI data scrapers actually use robots.txt to find the valuable paths to target rather than respecting it.
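That idea could be sketched as a fail2ban filter plus jail. This is a minimal example, assuming an nginx-style access log at the usual path; the filter and jail names, the regex, and the log path are all illustrative, not a tested ruleset:

```ini
# /etc/fail2ban/filter.d/robots-bait.conf  (hypothetical filter name)
[Definition]
# Match any request for robots.txt in a common combined-format access log.
failregex = ^<HOST> .* "GET /robots\.txt

# /etc/fail2ban/jail.local  (append)
[robots-bait]
enabled  = true
port     = http,https
filter   = robots-bait
logpath  = /var/log/nginx/access.log
maxretry = 1
bantime  = 1h
```

Be careful with a rule this aggressive: legitimate crawlers you *want* (search engines, archive bots) also fetch robots.txt, so you'd probably want an `ignoreip` list or a decoy path instead of robots.txt itself.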