this post was submitted on 18 Aug 2025
1139 points (99.0% liked)

(page 2) 50 comments
[–] mfed1122@discuss.tchncs.de 16 points 1 week ago* (last edited 1 week ago) (7 children)

Okay what about...what about uhhh... Static site builders that render the whole page out as an image map, making it visible for humans but useless for crawlers 🤔🤔🤔

[–] echodot@feddit.uk 7 points 1 week ago (6 children)

AI is pretty good at OCR now. I think that would just make it worse for humans while making very little difference to the AI.

[–] sailorzoop@lemmy.librebun.com 15 points 1 week ago (1 children)

I'm ashamed to say that I switched my DNS nameservers to CF just for their anti-crawler service.
Knowing Cloudflare, god knows how much longer it'll be free.

[–] AmbiguousProps@lemmy.today 8 points 1 week ago (1 children)

Did you enable the AI black hole/tarpit? It's the main reason I've used their stuff.

[–] Wispy2891@lemmy.world 14 points 1 week ago (2 children)

Question: do those artificial stupidity bots want to steal the issues or the code? Because why are they wasting so many resources scraping millions of pages when they could steal everything via SSH (once a month, not 120 times a second)?

[–] bizza@lemmy.zip 14 points 1 week ago

I use Anubis on my personal website, not because I think anything I’ve written is important enough that companies would want to scrape it, but as a “fuck you” to those companies regardless

That the bots are learning to get around it is disheartening; Anubis was a pain to set up and get running.

[–] r00ty@kbin.life 13 points 1 week ago (4 children)

For mbin I managed to kill the scraper attack using only Cloudflare's managed challenge. The exceptions are the fediverse POST endpoints, and certain GET endpoints when the request comes from a fediverse user agent. Everything else gets the managed challenge.

So far, they've not gotten past it. But it's only a matter of time.

[–] Monument@lemmy.sdf.org 10 points 1 week ago

Increasingly, I’m reminded of this: Paul Bunyan vs. the spam bot (or how Paul Bunyan triggered the singularity to win a bet). It’s a medium-length read from the old internet, but fun.

[–] Goretantath@lemmy.world 9 points 1 week ago (4 children)

I knew that was the worse option. Use the one that traps them in an infinite maze.
