Okay what about...what about uhhh... Static site builders that render the whole page out as an image map, making it visible for humans but useless for crawlers 🤔🤔🤔
Technology
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related news or articles.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
AI is pretty good at OCR now. I think that would just make it worse for humans while making very little difference to the AI.
I'm ashamed to say that I switched my DNS nameservers to CF just for their anti crawler service.
Knowing Cloudflare, god know how much longer it'll be free for.
Did you enable the AI black hole/tarpit? It's the main reason I've used their stuff.
Question: those artificial stupidity bots want to steal the issues or want to steal the code? Because why they're wasting a lot of resources scraping millions of pages when they can steal everything via SSH (once a month, not 120 times a second)
I use Anubis on my personal website, not because I think anything I’ve written is important enough that companies would want to scrape it, but as a “fuck you” to those companies regardless
That the bots are learning to get around it is disheartening, Anubis was a pain to setup and get running
For mbin I managed to kill the attack of the scrapers only using cloudflare managed challenge for all except to fediverse post endpoints, from fediverse ua agents on certain get endpoints. Managed challenge on everything else.
So far, they've not gotten past it. But, a matter of time.
Increasingly, I’m reminded of this: Paul Bunyan vs. the spam bot (or how Paul Bunyan triggered the singularity to win a bet). It’s a medium-length read from the old internet, but fun.
I knew that was the worse option. Use the one that traps them in an infinite maze.