this post was submitted on 22 Apr 2024
Definitely need to be careful then.
The best voice recognition is based on AI — yes.
Before AI, voice recognition existed but it was generally pretty shit and really struggled with accents, low quality microphones, background noise, and people saying things that don't strictly make sense. E.g. if you say "We’ll burn that bridge when we get to it," a good AI might replace "burn" with the word "cross"... it will at least have the capability to do that. Whether or not it actually does depends on your settings: is "accuracy" about what someone said or what someone actually meant? That's configurable in the best systems.
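To make the "said vs. meant" setting concrete, here's a toy sketch. Everything in it is made up for illustration: the `finalize` function, the `mode` names and the `INTENDED` lookup table are assumptions, not any real product's API. A real system would lean on a language model rather than a hard-coded table.

```python
# Hypothetical lookup of "what people usually mean".
# A real system would use a language model, not a fixed table.
INTENDED = {
    "burn that bridge when we get to it": "cross that bridge when we get to it",
}


def finalize(transcript: str, mode: str = "verbatim") -> str:
    """Return the transcript as spoken ("verbatim") or as probably meant ("intended")."""
    if mode == "verbatim":
        return transcript
    if mode == "intended":
        for said, meant in INTENDED.items():
            transcript = transcript.replace(said, meant)
        return transcript
    raise ValueError(f"unknown mode: {mode!r}")


heard = "We'll burn that bridge when we get to it."
print(finalize(heard, "verbatim"))
print(finalize(heard, "intended"))
```

For anything medical or legal you'd presumably want the verbatim behaviour, since the "intended" version is the software's guess.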
Old software did. These days systems work so well that it would just add cost with zero benefit. Good speech recognition will understand your speech perfectly as long as your microphone is decent, and "learning" wouldn't help much with that one potential problem area.
Some speech systems do learn in order to recognise/identify people (for example, a voice assistant might use it to figure out who "me" is in a command like "remind me to get milk when I get to the shops"). And a good transcription service will recognise different people talking in a single recording, and provide an appropriately annotated transcript. That's about the extent of "recognising" your voice; it doesn't generally learn from you over time.
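As a rough idea of what an "annotated transcript" looks like as data, here's a minimal sketch. It assumes some earlier step has already labelled each chunk of speech with a speaker. The `Segment` type, the `annotate` helper and the sample lines are all hypothetical, not the output format of any particular service.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # label assigned by a (hypothetical) speaker-identification step
    start: float  # start time in seconds
    text: str


def annotate(segments):
    """Merge consecutive segments from the same speaker into one timestamped line per turn."""
    lines = []
    prev_speaker = None
    for seg in segments:
        if seg.speaker == prev_speaker:
            # Same person still talking: extend the current turn.
            lines[-1] += " " + seg.text
        else:
            minutes, seconds = divmod(int(seg.start), 60)
            lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.speaker}: {seg.text}")
            prev_speaker = seg.speaker
    return "\n".join(lines)


segments = [
    Segment("Speaker 1", 0.0, "Remind me to get milk."),
    Segment("Speaker 1", 2.5, "When I get to the shops."),
    Segment("Speaker 2", 5.0, "Okay."),
]
print(annotate(segments))
```

Note the labels are just "Speaker 1" and "Speaker 2". Telling voices apart within one recording is a different job from knowing who those people actually are.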
Kinda yeah. The researchers paid a huge number of people in third world countries to compare recordings to transcriptions, and make a "correct / incorrect" judgement call. Then fed all of that, and a whole bunch of other things (it's believed every YouTube video ever uploaded might have been involved...) into a very complex model.
Tweaks are made but it's just too much data (OpenAI says they used 680,000 hours of audio) to fully get your head around all of it. A bit like trying to understand how the human brain recognises speech — we have a broad idea but don't really know.
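For a sense of scale, here's some back-of-the-envelope arithmetic on that 680,000 hours figure. The 8 hours a day and 250 working days a year are my own assumptions for the "listening as a full-time job" case.

```python
HOURS_OF_AUDIO = 680_000  # figure quoted above

HOURS_PER_YEAR = 24 * 365.25  # non-stop listening, averaged over leap years


def years_of_continuous_listening(hours: float) -> float:
    """Years needed to play the audio back to back without stopping."""
    return hours / HOURS_PER_YEAR


def years_at_working_hours(hours: float, hours_per_day: int = 8, days_per_year: int = 250) -> float:
    """Years needed if listening were a full-time job (assumed 8 h/day, 250 days/year)."""
    return hours / (hours_per_day * days_per_year)


print(round(years_of_continuous_listening(HOURS_OF_AUDIO), 1))  # 77.6
print(years_at_working_hours(HOURS_OF_AUDIO))  # 340.0
```

So roughly a lifetime of non-stop audio, or a few centuries of full-time listening, which is why nobody can review it all by hand.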
Check the privacy statement for the service. They might, for example, send your recordings to be assessed for accuracy by employees/subcontractors. AFAIK (not a lawyer) that would be a breach of HIPAA.
AFAIK some Apple speech recognition features are HIPAA compliant. Look that up to verify it, but in general iPhones and Macs have AI speech processing hardware on the device, allowing fully local processing... but not all features are done locally, and in some cases they may transmit "anonymised" (useless if you speak someone's name...) speech to employees/contractors to improve the software. That can be disabled in settings.
Amazon and OpenAI do everything in the cloud but have fully HIPAA compliant versions of their services (I assume those are not cheap...)
You could try open source models — I don't know how good they are in practice.