Open Source

31733 readers

134 users here now

All about open source! Feel free to ask questions, and share news, and interesting stuff!

Useful Links

Rules

Posts must be relevant to the open source ideology
No NSFW content
No hate speech, bigotry, etc

Related Communities

Community icon from opensource.org, but we are not affiliated with them.

founded 5 years ago

MODERATORS

[email protected]

Lokas : Record and transcribe your meetings in complete confidentiality ! (framablog.org)

submitted 3 weeks ago by [email protected] to c/[email protected]

8 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[+] [email protected] -14 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

This is about a speech transcription program running on the company's remote server. The app uploads your audio and the system sends you a transcript. You've got to be kidding. Reporting as spam.

[–] [email protected] 36 points 3 weeks ago (2 children)

It's free software which you can host yourself. The source is here (GPLv3). You can read more about the people that make it here: https://en.wikipedia.org/wiki/Framasoft

[–] [email protected] 3 points 3 weeks ago

Framasoft

They're also involved in Fediverse development, they made Mobilizon as an event management platform

https://framablog.org/2022/11/08/mobilizon-v3-find-events-and-groups-throughout-the-fediverse/

[–] [email protected] 8 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

It didn't say that on the linked page. Is the AI model and training code and data also free?

Added: it looks like it uses Whisper for transcription. So the inference code is there but it's unclear about the other stuff.

https://en.m.wikipedia.org/wiki/Whisper_(speech_recognition_system)

Anyway, thanks for the update.

[–] [email protected] 4 points 3 weeks ago (1 children)

Unless something has changed, it did. The page linked reads:

And, obviously, this POC is open source, the code is publish here on our forge.

The link takes you to their repos. The server repo has instructions on self-hosting directly on your server or with Docker. The app repo has code for both the iOS and Android apps. That’s good, because the iOS app at least doesn’t have a built-in way to select a different backend server.

Whisper is by OpenAI and as far as I know they have not shared the training code, much less the data sets, so the best you can do is fine-tune the models they’ve provided.

If use of Whisper is a problem, but the project is otherwise interesting to you, you could ask them to consider using a different STT solution (or allowing the user to choose between different options). I’m not aware of any fully open STT applications that are considered to be as capable as Whisper, but if you do, that would be great info to share with them.

[–] [email protected] 3 points 3 weeks ago* (last edited 3 weeks ago)

I get the impression that recreating the Whisper training code is possibly doable, but the data is a bigger task.

This is a possible Whisper alternative with maybe similar issues: https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/

Yes it's nice that the phone app is free but STT is the difficult and important part. With Moonshine it might be possible to run the transcriber completely on the phone instead of having the STT on a remote server.

It's interesting that they are able to do all that speaker distinguishing with just a single mic as found on both phones. There was a thread about phone features recently. Given this STT stuff, it could be useful to have a phone with 3 or 4 mics in the corners of the phone, like one of those tabletop conference mics, so it can figure out directionality of sound sources.