this post was submitted on 01 Jul 2025
memes
The model is trained on a massive corpus of existing data and then fine-tuned to match the target voice actor. With less than ~30s of reference audio you can get a pretty decent fine-tune. The main issue is that it currently isn't on par with the quality and consistency of an in-studio voice actor, especially over long durations.
Hence my usage of the words "fully train". The other commenter wants to license every piece of audio used in training the model, which obviously includes the base model...
You can feed an infinite amount of data into existing models and it won't improve the issues. The problem is with the models themselves.
And the audio used to train the base model is licensed, usually under MIT, Creative Commons, or a similar license.