this post was submitted on 23 Oct 2025
149 points (93.6% liked)

Technology

[–] AA5B@lemmy.world -2 points 2 days ago (2 children)

Is it even getting misused? Spreading knowledge via machine translation where there are no human translators available has to be better than not translating. As long as there is transparency so people can judge the results…

And AI training trusting everything it reads is a larger systemic issue, not limited to this niche.

Perhaps part of the solution is machine-readable citations. Maybe a search engine or AI could provide better results if it knew what was human-generated vs machine-generated. But even then you have huge gaps, on one side with untrustworthy humans (like comedy) and on the other side with machine-generated facts such as from a database.

[–] Alaknar@sopuli.xyz 2 points 2 days ago

Spreading knowledge via machine translation where there are no human translators available has to be better than not translating

Have you not read my entire comment...?

One of the Greenlandic Wiki articles "claimed Canada had only 41 inhabitants". What use is a text like that? In what world is learning that Canada has 41 inhabitants better than going to the English version of the article and translating it yourself?

Perhaps part of the solution is machine readable citations

The contents of the citations are already used for training, as long as they're publicly available. That's not the problem. The problem is that LLMs do not understand context well, they are not, well, intelligent.

The "Chinese Room" thought experiment explains it best, I think: imagine you're in a room with writing utensils and a manual. Every now and again a letter falls into the room through a slit in the wall. Your task is to take the letter and use the manual to write a response. If you see such and such shape, you're supposed to write this and that shape on the reply paper, etc. Once you're done, you throw the reply out through the slit. This goes back and forth.

To the person on the other side of the wall it seems like they're having a conversation with someone fluent in Chinese whereas you're just painting shapes based on what the manual tells you.

LLMs don't understand the prompts - they generate responses based on the probability of certain characters or words or sentences being next to each other when the prompt contains certain characters, words, and sentences. That's all there is.
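To make the "probability of words being next to each other" idea concrete, here's a minimal sketch of a bigram next-word predictor. It is only a toy: real LLMs use neural networks over subword tokens and far longer contexts, and the corpus and function names below are invented for illustration. The point is just that the model picks the likeliest continuation from counts, with no notion of whether the sentence is true.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows, word):
    """Turn the raw follow-counts for `word` into probabilities."""
    counts = follows.get(word.lower())
    if not counts:
        return {}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def generate(follows, start, length=5):
    """Greedily append the most probable next word, `length` times."""
    out = [start.lower()]
    for _ in range(length):
        probs = next_word_probabilities(follows, out[-1])
        if not probs:
            break
        # Ties are broken alphabetically so the output is deterministic.
        out.append(max(sorted(probs), key=probs.get))
    return " ".join(out)


# Made-up toy corpus (hypothetical, not real training data).
corpus = (
    "canada has many inhabitants . "
    "canada has many lakes . "
    "greenland has few inhabitants ."
)
model = train_bigrams(corpus)
print(next_word_probabilities(model, "has"))
print(generate(model, "canada", length=3))
```

If the corpus had said "canada has 41 inhabitants" often enough, the same code would happily generate that instead.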

There was a famous botched experiment where scientists were training an AI model to detect tumours. It got really accurate on the training data so they tested it on new cases gathered more recently. It gave a 100% certainty of a tumour being present if the photograph analysed had a yellow ruler on it, because most photos of tumours in the training data had that ruler for scale.
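The ruler failure can be reproduced in miniature. This is a rough sketch with entirely made-up data and a deliberately dumb "model" (pick whichever single feature best matches the label); it is not the method used in the actual study. The feature names `ruler` and `irregular_border` are assumptions for the example.

```python
def stump_accuracy(rows, feature):
    """Accuracy of the rule: predict 'tumour' iff `feature` is present."""
    hits = sum(1 for row in rows if bool(row[feature]) == bool(row["tumour"]))
    return hits / len(rows)


def fit_stump(rows, features):
    """'Train' by choosing the feature with the best training accuracy."""
    return max(features, key=lambda f: stump_accuracy(rows, f))


# Invented training set: every tumour photo happens to include a ruler.
train = [
    {"ruler": 1, "irregular_border": 1, "tumour": 1},
    {"ruler": 1, "irregular_border": 1, "tumour": 1},
    {"ruler": 1, "irregular_border": 0, "tumour": 1},
    {"ruler": 0, "irregular_border": 0, "tumour": 0},
    {"ruler": 0, "irregular_border": 1, "tumour": 0},
    {"ruler": 0, "irregular_border": 0, "tumour": 0},
]

# Invented new cases where the ruler no longer lines up with the diagnosis.
new_cases = [
    {"ruler": 1, "irregular_border": 0, "tumour": 0},
    {"ruler": 0, "irregular_border": 1, "tumour": 1},
    {"ruler": 1, "irregular_border": 0, "tumour": 0},
    {"ruler": 0, "irregular_border": 1, "tumour": 1},
]

chosen = fit_stump(train, ["ruler", "irregular_border"])
print("feature learned:", chosen)
print("training accuracy:", stump_accuracy(train, chosen))
print("accuracy on new cases:", stump_accuracy(new_cases, chosen))
```

The model scores perfectly on the training set by keying on the ruler, then falls apart once the ruler stops correlating with the diagnosis.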

But even then you have huge gaps on one side with untrustworthy humans (like comedy) and on the other side with machine generated facts such as from a database

"Machine generated facts" are not facts, they're just hallucinations and falsehoods. It is 100% better to NOT have them at all and have to resort to the English wiki, than have them and learn bullshit.

Especially because, again, the contents of Wikipedia are absolutely being used for training further LLMs. The more errors there are, the worse the models become, eventually leading to a collapse of truth. We are already seeing this with whole "research" publications being generated, including "source" material invented on the spot, "proving" bogus results.

[–] DoPeopleLookHere@sh.itjust.works 1 points 2 days ago* (last edited 2 days ago) (1 children)

Is it even getting misused? Spreading knowledge via machine translation where there are no human translators available has to be better than not translating. As long as there is transparency so people can judge the results

Assumes the AI is accurate, which is debatable.

Also, how do you do citations on a translation?

It's an interpretation, not a fact.

[–] AA5B@lemmy.world 0 points 2 days ago (2 children)

Sure, there are limitations. The point still stands: an imperfect machine translation is better than no translation, as long as people understand that it is imperfect.

Can we afford to let a high bar deprive people of knowledge just because of the language they speak?

The article complains about the effect of poor machine translations on languages, but the effect of no translations is worse. Yes, those Greenlanders should be able to read all of Wikipedia without learning English, even if the project has no human translators.

[–] Euphoma@lemmy.ml 3 points 2 days ago (1 children)

Wikipedia already has a button that takes you to another language's version of the page, which you can then machine translate yourself.

[–] AA5B@lemmy.world 1 points 2 days ago (1 children)

I didn’t know that. I guess my “English privilege” is showing

[–] chloroken@lemmy.ml 1 points 22 hours ago

Chauvinism is the term you're seeking. And we all in the West suffer some degree of it.

Yes, those Greenlanders should be able to read all of Wikipedia without learning English, even if the project has no human translators

Again, you're assuming a high level of accuracy from these tools. If LLM garbage leaves it unreadable, is that actually better?