this post was submitted on 09 Jan 2025
491 points (99.2% liked)

Opensource

moosetwin@lemmy.dbzer0.com 13 points 9 months ago

I don't mind the idea, but I'm curious where the training data comes from. You can't just train them on users' (unsubtitled) videos, because you need subtitles to know whether the output is right or wrong. I checked their Twitter post, but it didn't answer that.
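To make the "you need subtitles to know if it's right" point concrete: speech-to-text output is commonly scored against a reference transcript, for example with word error rate (WER). Below is a minimal sketch, assuming a plain word-level Levenshtein distance; the function name is hypothetical and not taken from any project mentioned here. Without the `reference` string there is simply nothing to compute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # No ground-truth transcript means the output cannot be scored.
        raise ValueError("reference transcript must be non-empty")

    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Real training pipelines use the labels for loss computation rather than just WER, but the dependency is the same: every comparison needs a known-correct transcript.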

Warl0k3@lemmy.world 8 points 9 months ago

I hope they're using OpenSubtitles, or one of the many academic speech-to-text datasets that exist.