QuadratureSurfer

joined 1 year ago
[–] [email protected] 1 points 7 hours ago

OK, but the most important part of that research paper is published in the GitHub repository, which explains how to provide audio and text data to recreate any STT model the same way they did.

See the "Approach" section of the GitHub repository: https://github.com/openai/whisper?tab=readme-ov-file#approach

And the "Training Data" section of their model card: https://github.com/openai/whisper/blob/main/model-card.md#training-data

With this you don't really need the paper hosted on arXiv; you have enough information on how to train or modify the model.

There are guides on how to fine-tune the model yourself: https://huggingface.co/blog/fine-tune-whisper

From what I understand of the OSAID link, that is exactly what they are asking for. The ability to retrain or fine-tune a model fits this definition very well:

The preferred form of making modifications to a machine-learning system is:

  • Data information [...]
  • Code [...]
  • Weights [...]

All 3 of those have been provided.

[–] [email protected] 0 points 14 hours ago (3 children)

I don't understand. What's missing from the code, model, and weights provided to make this "open source" by the definition of your first link? It seems to meet all of those requirements.

As for the OSAID, the exact training dataset is not required. Per your quote, they just need to provide enough information that someone else could train the model using a "similar dataset".

[–] [email protected] 0 points 18 hours ago (5 children)

I did a quick check on the license for Whisper:

Whisper's code and model weights are released under the MIT License. See LICENSE for further details.

So that definitely meets the Open Source Definition on your first link.

And it looks like it also meets the definition of open source as per your second link.

Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.
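
For anyone unfamiliar with the metrics mentioned above, here is a minimal sketch of how WER (word error rate) and CER (character error rate) are commonly computed: the edit distance between the reference and the transcription, divided by the reference length. This is not Whisper's evaluation code; the function names are my own, and real evaluations usually normalize the text (casing, punctuation) first, which this sketch skips.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    # prev[j] holds the distance between the reference items seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference item missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra item in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcription with one wrong word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Lower is better for both, and a value above 1.0 is possible when the transcription contains many extra words.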

[–] [email protected] 8 points 1 day ago (7 children)

The STT (speech to text) model that they created is open source (Whisper) as well as a few others:

https://github.com/openai/whisper

https://github.com/orgs/openai/repositories?type=all

[–] [email protected] 7 points 1 day ago

They also gave him a fake line to deliver and didn't reveal that Darth Vader was actually Luke's father during the filming of that scene: https://www.soundandvision.com/news/100104hamill/

It's such a great moment! The fake line that was put in there just to try and keep the secret was "You don't know the truth: Obi-Wan killed your father!" But as much as I enjoyed leaking false information, it was a wonderfully hard secret to keep because (Irvin) Kershner, the director, brought me aside and said "Now I know this, and George knows this, and now you're going to know this, but if you tell anybody, and that means Carrie or Harrison, or anybody, we're going to know who it is because we know who knows." -Mark Hamill

[–] [email protected] 0 points 2 days ago (10 children)

I personally like the fan theory that Padme's life force was transferred to Anakin with the help of Palpatine.

https://retrozap.com/padme-didnt-die-of-a-broken-heart/

[–] [email protected] 10 points 2 days ago

I initially think this same thing every time I see someone mention MTG on here, glad I'm not the only one.

[–] [email protected] 3 points 2 days ago

I don't think this is specifically an "AI" problem as much as it's a privacy issue with the way companies are buying and selling our info for targeted advertising. These models are definitely enabling them to do more with the data that they have as well as to collect more information from us in new ways.

[–] [email protected] 1 points 2 days ago* (last edited 2 days ago)

Yeah, the other thing I could see happening is a tactic similar to one scammers use, where mules pick up mail from various Airbnbs throughout whatever country. This would definitely limit most bot operations... unless some organization specializes in this and offers a service to create a bunch of accounts for anyone willing to pay.

Also, how many accounts would you limit to a single address, and how long would you lock up an address before it could be used again (given that people do move around from time to time)?

Edit: typo.

[–] [email protected] 1 points 2 days ago (2 children)

That's a good point. I didn't know about USPS Form 1583 for virtual mailboxes... Although that is a U.S.-specific thing, so finding a similar service in a country that doesn't care so much might be the way around it.

[–] [email protected] 6 points 2 days ago

Yep, exactly this. It might deter some small-time bot creators, but it won't stop larger operations and may even help them seem more legitimate.

If anything, my favorite idea comes from this xkcd:

https://xkcd.com/810/

[–] [email protected] 3 points 2 days ago (7 children)

Easy way to get around that with "virtual" addresses: https://ipostal1.com/virtual-address.php

Just pay $10 for every account that you want to create... You may as well just go with the solution of charging everyone $10 to create an account. At least that way the instance owner is getting supported, and it would have the same effect.

cross-posted from: https://lemmy.zip/post/20089895

 

I just noticed that crossposts like this one: https://lemmy.ca/post/23884170 do not show the text from the original post.

Viewing the crosspost on a browser displays it, but on Voyager it only appears if I click on the original post.

 

cross-posted from: https://discuss.tchncs.de/post/17734195

What's the reason for the empty comments, lately?

In the last few weeks, I have frequently seen empty comments. It's just the username and no text beneath.

Is there a deeper reason behind this? Do people nowadays strip away the text instead of deleting a comment? Or did some script surface that 'makes the internet forget'? At first I thought people did this before deleting a comment and the deletion just didn't get federated. But I scrolled through some older posts and they also still have comments like that, so that can't be it. Right?

Can anyone educate me?
