You can enable Private Instance in your admin settings, which means only logged-in users can see content. This will prevent AI scrapers from slowing down your instance, since all they'll see is an empty homepage, so no DB calls. As long as you're on 0.19.11, federation will still work.
Enabled, thanks for the tip!
Same for Mbin.
So I just had a look at your robots.txt:
User-Agent: *
Disallow: /login
Disallow: /login_reset
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
Disallow: /create_private_message
Disallow: /inbox
Disallow: /setup
Disallow: /admin
Disallow: /password_change
Disallow: /search/
Disallow: /modlog
Crawl-delay: 60
You explicitly allow bots to crawl your content... That's likely one of the reasons why you get bot traffic.
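If you'd rather tell crawlers to keep out entirely, a robots.txt as simple as this would do it (just a sketch, and as the next comment points out, plenty of AI crawlers ignore it anyway):
User-Agent: *
Disallow: /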
AI crawlers ignore robots.txt. The only way to get them to stop is with active countermeasures.
Patience, the AI bubble will burst soon.
It won't crash soon, sorry Charlie. Maybe in like 2-5 years, but honestly I don't think there will ever be a "crash", just fewer AI buzzwords in everything.
🤞
At some point they're going to try to evade detection to continue scraping the web. The cat-and-mouse game continues, except now the "pirates" are big tech.
They already do. ("They" meaning AI generally, I don't know about Claude or ChatGPT's bots specifically). There are a number of tools server admins can use to help deal with this.
See also:
These solutions have the side effect of making the bots stay on your site longer and generate more traffic. It's not for everyone.
Use Anubis. That's pretty much the only thing you can do against bots that they have no way of circumventing.
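Rough idea of how it slots in, if that helps: Anubis runs as its own small reverse proxy in front of your app and makes browsers solve a proof-of-work challenge before requests get through. A very rough nginx sketch (the port is a placeholder, check the Anubis docs for the real setup; Anubis itself is then configured to forward verified traffic to your backend):
location / {
    proxy_pass http://127.0.0.1:8923;   # Anubis listen address (placeholder port)
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}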
Yeah, going to install it this week, but the nginx extension seemed to solve the issue.
Which extension are you using, if I may ask?
Anubis + Nepenthes is the answer.
You can use either Cloudflare (proprietary) or Anubis (FOSS).
Don't do this
Why?
Because it harms marginalized folks' ability to access content, while also letting evil corp (and their fascist government) view (and modify) all encrypted communication between your site and its users.
It's bad.
For clarity, you are referring to Cloudflare and not Anubis?
I am referring to Cloudflare, but I would expect Anubis would be the same if it provides DoS fronting.
Anubis works in a very different way than Cloudflare: it's self-hosted and just makes the visitor's browser solve a proof-of-work challenge, so no third party sits between you and your users.
How well does it work in Tor Browser in strict mode?
Cloudflare has pretty good protection against this, but I totally understand not wanting to use Cloudflare.
Haha, just wait until you get DDoSed by anonymous user agents. I have been there.
I'm talking 40k requests per 5 seconds.
Just cache. Read-only traffic should add negligible load to your server. Or you're doing something horribly wrong.
They are pods with 1 CPU and 1 GB of RAM; Postgres goes to 100% CPU at 500 requests per minute. After I put in the nginx extension, it dropped to at most 10%. On weaker servers it's the bots that make hell on earth, not the config.
If it's hitting Postgres, it's not hitting the cache. Do you have a caching reverse proxy in front of your web application?
I don't have a cache, but the problem is solved now, I can browse Lemmy haha.
The nginx instance you have in front of your app can perform caching and avoid hitting your app. The advantage is that it will improve performance even against the most stealthy of bots, including those that don't even exist yet. The disadvantage is that the AI scum get what they want.
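A minimal sketch of what that could look like, assuming nginx already proxies to your Lemmy backend (the upstream port, cache path, zone name, and timings here are placeholders, and the jwt check assumes lemmy-ui's auth cookie is named "jwt"):
proxy_cache_path /var/cache/nginx/lemmy keys_zone=lemmy_cache:10m max_size=1g inactive=10m use_temp_path=off;

server {
    listen 80;
    server_name lemmy.example.org;

    location / {
        proxy_pass http://127.0.0.1:8536;                # Lemmy backend (placeholder port)
        proxy_cache lemmy_cache;
        proxy_cache_valid 200 1m;                        # serve cached 200s for up to a minute
        proxy_cache_use_stale error timeout updating;    # keep serving stale copies while refreshing
        proxy_cache_bypass $cookie_jwt;                  # don't serve cached pages to logged-in users
        proxy_no_cache $cookie_jwt;                      # and don't cache their responses either
        add_header X-Cache-Status $upstream_cache_status;
    }
}
Anonymous read traffic (which is what the bots generate) then mostly never reaches the app or Postgres, and a short validity like one minute keeps the staleness basically unnoticeable.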
Oh, cool. I'm going to look at it!
If that doesn't work for you, also look at varnish and squid.
Load should be near zero for reads.