this post was submitted on 13 May 2025
AI voice synth is pretty solidly useful compared to, say, video generation from scratch. I think there are good uses for voice synth:
e.g. filling in for an aging actor or actress who can't do a voice any more, video game mods, procedurally generated speech, etc.
But audiobooks don't really play to those strengths. I'm a little skeptical that in 2025 it's at the point where it's a good drop-in replacement for a human audiobook narrator. What I've heard still doesn't have emphasis on par with a human.
I don't know what it costs to have a human read an audiobook, but I can't imagine that it's that expensive; I doubt that there's all that much editing involved.
kagis
https://www.reddit.com/r/litrpg/comments/1426xav/whats_the_average_narrator_cost/
That's actually lower than I expected. Like, if a book sells at any kind of volume, it can't be that hard to make that back.
EDIT: I can believe that it's possible to build a speech synth system that does do better, mind
I certainly don't think that there are any fundamental limitations here. I'd guess that there's also room for human-assisted approaches, where some system annotates the text with emphasis markers, and the annotated text gets fed into a speech synth engine trained to convert annotated text to voice. Someone then listens to the output and just tweaks the annotations where the annotation system doesn't get them quite right. But I don't think we're really there today.
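As a rough sketch of what that annotation layer might look like: assume a hypothetical lightweight markup (`*word*` for emphasis, `[pause]` for a beat) that an automatic annotator emits and a human can edit in plain text, which then gets compiled down to SSML, the standard markup that commercial speech synth engines already accept. The markup syntax and the function name here are made up for illustration; only SSML's `<emphasis>` and `<break>` tags are real.

```python
import re

def annotations_to_ssml(text: str) -> str:
    """Compile a hypothetical lightweight annotation format into SSML.

    *word*   -> <emphasis level="strong">word</emphasis>
    [pause]  -> <break time="500ms"/>
    """
    # Convert *word* emphasis markers (non-greedy so multiple markers
    # on one line don't merge into a single span).
    ssml = re.sub(r"\*(.+?)\*", r'<emphasis level="strong">\1</emphasis>', text)
    # Convert pause markers to fixed-length breaks.
    ssml = ssml.replace("[pause]", '<break time="500ms"/>')
    # Wrap in the SSML root element expected by synthesis engines.
    return f"<speak>{ssml}</speak>"

print(annotations_to_ssml("He was *not* amused. [pause] Silence."))
# -> <speak>He was <emphasis level="strong">not</emphasis> amused. <break time="500ms"/> Silence.</speak>
```

The point of the plain-text intermediate format is that the human reviewer edits emphasis markers rather than audio: redoing a flubbed line is a one-character text change plus re-synthesis, not a re-recording session.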
The annotated-text idea could work, but I'm sceptical of whether you'd end up doing more work annotating all of the text, listening to it back, redoing certain bits, and then editing the final result into a single file than you would if you just had a human do it.
After all, all you've really automated is the reading of the text, which in the grand scheme of things doesn't take that long.