this post was submitted on 21 Mar 2025
1445 points (99.3% liked)

(page 5) 29 comments
[–] XeroxCool@lemmy.world 24 points 1 week ago (2 children)

Will this further fuck up the inaccurate nature of AI results? While I'm rooting against shitty AI usage, the general population is still trusting it and making results worse will, most likely, make people believe even more wrong stuff.

[–] ladel@feddit.uk 32 points 1 week ago* (last edited 1 week ago) (6 children)

The article says it's not poisoning the AI data, only providing valid facts. The scraper still gets content, just not the content it was aiming for.

E:

It is important to us that we don’t generate inaccurate content that contributes to the spread of misinformation on the Internet, so the content we generate is real and related to scientific facts, just not relevant or proprietary to the site being crawled.

and the data for the LLM is now salted with procedural garbage. it's great!

[–] melpomenesclevage@lemmy.dbzer0.com 14 points 1 week ago (1 children)

If you're dumb enough and care little enough about the truth, I'm not really going to try coming at you with rationality and sense. I'm down to do an accelerationism here. fuck it. burn it down.

remember; these companies all run at a loss. if we can hold them off for a while, they'll stop getting so much investment.

[–] einlander@lemmy.world 1 points 1 week ago (5 children)

The problem I see with poisoning the data is that the AIs being trained for law enforcement will hallucinate false facts that get used to arrest and convict people.

[–] patatahooligan@lemmy.world 10 points 1 week ago

Law enforcement AI is a terrible idea and it doesn't matter whether you feed it "false facts" or not. There's enough bias in law enforcement that the data is essentially always poisoned.

that's the entire point of laws, though, and it was already being used for that.

giving the laws better law stuff will not improve them. the law is malevolent. you cannot fix it by offering to help.

[–] Empricorn@feddit.nl 24 points 1 week ago (1 children)

So we're burning fossil fuels and destroying the planet so bots can try to deceive one another on the Internet in pursuit of our personal data. I feel like dystopian cyberpunk predictions didn't fully understand how fucking stupid we are...

[–] Deebster@infosec.pub 19 points 1 week ago* (last edited 1 week ago) (2 children)

So they rewrote Nepenthes (or Iocaine, Spigot, Django-llm-poison, Quixotic, Konterfai, Caddy-defender, plus inevitably some Rust versions)

Edit: but with ✨AI✨ and apparently only true facts

[–] lily33@lemm.ee 12 points 1 week ago (3 children)

while allowing legitimate users and verified crawlers to browse normally.

What is a "verified crawler" though? What I worry about is whether only big companies like Google are allowed to have them now.

[–] wingiee@lemm.ee 20 points 1 week ago (1 children)

I assume a crawler which adheres to robots.txt
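For anyone wondering what "adheres to robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper, the `ExampleBot` user agent, and the sample rules are all made up for illustration. This only shows the check a well-behaved crawler would run on itself; compliance is voluntary, and it says nothing about how Cloudflare actually decides which crawlers are "verified".

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: parses the rules from a string instead of
    fetching them over the network, so the example runs offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules: ExampleBot may crawl everything except /private/,
# while every other crawler is asked to stay away entirely.
EXAMPLE_ROBOTS = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/public/page"),
        ("ExampleBot", "https://example.com/private/page"),
        ("OtherBot", "https://example.com/public/page"),
    ]:
        print(agent, url, is_allowed(EXAMPLE_ROBOTS, agent, url))
```

A crawler that ignores the result of this check is exactly the kind these tarpit tools are aimed at.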

[–] lily33@lemm.ee 5 points 1 week ago (1 children)

I would love to think so. But the word "verified" suggests more.

[–] melpomenesclevage@lemmy.dbzer0.com 2 points 1 week ago (1 children)

I dunno. I don't find any sympathy with any of these fuckers though. this is not a generally useful technology, it is not something the average person ever needs to see, and honestly, just fuck em. Fuck anyone messing with open source to engorge the garbage dispenser.

[–] lily33@lemm.ee 4 points 1 week ago* (last edited 1 week ago) (2 children)

Any accessibility service will also see the "hidden links", and while a blind person with a screen reader will notice if they wander off into generated pages, it will waste their time too. Especially if they don't know about such a "feature", they'll be very confused.

Also, I don't know about you, but I absolutely have a use for crawling X, Google maps, Reddit, YouTube, and getting information from there without interacting with the service myself.

[–] MNByChoice@midwest.social 9 points 1 week ago* (last edited 6 days ago) (2 children)

Be great if these reinforced facts.

Earth is an imperfect oblate spheroid.

Humans landed on the Moon.

Taiwan is an independent nation.

Edit: incorporated better information

[–] jagermo@feddit.org 6 points 1 week ago

I am not happy with how much the internet relies on Cloudflare. However, they have a strong set of products.

[–] fubarx@lemmy.world 3 points 1 week ago

So this showed up last week: https://github.com/raminf/RoboNope-nginx

Similar vibe, minus the AI.
