sodalite

joined 1 year ago
3 days ago

If it's impossible to train AI without abusing copyright, why do I see job postings looking for writers to train AI? 🤔

5 days ago

awesome, thanks!

5 days ago

Now that I've been spoiled by Floorp's workspaces, I have to ask... does it do workspaces? I can't find it on the site if it does.

5 days ago

Have A Nice Life - Deathconsciousness

1 week ago

Yeah, I remember when that article came out in 2023.

1 week ago

I see no mention of ActivityPub in the article, but I'm wondering if this is part of their plan to eventually integrate Tumblr into the fediverse as well.

However, I agree with others that this will likely result in hella janky, hackable websites first... hopefully it smooths out.


Looking for new audio dramas to get into, but it seems like r/audiodrama still has the most crowd-sourced recommendations for that. I don't really wanna go to Reddit if I can avoid it, but I'm not sure if this community is the right place or if a whole new audio drama community should be made. Probably the latter.

For now, what are some recent audio dramas you would recommend to a fan of stuff like The White Vault, Wolf 359, Marsfall, Malevolent, Vampire The Masquerade: Port Saga, Old Gods of Appalachia, and Kakos Industries?

2 weeks ago

I used WordPress for over a decade but left last year when the AI hype started ramping up and there were reports that even selecting the option to turn off post scraping doesn't work, because the list of bots blocked by the site's robots.txt doesn't get updated.

Basically: WordPress doesn't let you control AI scraping, so is it still the best platform those of us worried about that are gonna be able to find?

2 weeks ago

There is such a thing as solutions journalism, but it is rather rare.

https://www.solutionsjournalism.org/


Tumblr and Wordpress are preparing to sell user data to Midjourney and OpenAI, according to a source with internal knowledge about the deals and internal documentation referring to the deals.

The exact types of data from each platform going to each company are not spelled out in documentation we’ve reviewed, but internal communications reviewed by 404 Media make clear that deals between Automattic, the platforms’ parent company, and OpenAI and Midjourney are imminent.

The internal documentation details a messy and controversial process within Tumblr itself. One internal post made by Cyle Gage, a product manager at Tumblr, states that a query made to prepare data for OpenAI and Midjourney compiled a huge number of user posts that it wasn’t supposed to. It is not clear from Gage’s post whether this data has already been sent to OpenAI and Midjourney, or whether Gage was detailing a process for scrubbing the data before it was to be sent.

Gage wrote:

“the way the data was queried for the initial data dump to Midjourney/OpenAI means we compiled a list of all tumblr’s public post content between 2014 and 2023, but also unfortunately it included, and should not have included:

  • private posts on public blogs
  • posts on deleted or suspended blogs
  • unanswered asks (normally these are not public until they’re answered)
  • private answers (these only show up to the receiver and are not public)
  • posts that are marked ‘explicit’ / NSFW / ‘mature’ by our more modern standards (this may not be a big deal, I don’t know)
  • content from premium partner blogs (special brand blogs like Apple’s former music blog, for example, who spent money with us on an ad campaign) that may have creative that doesn’t belong to us, and we don’t have the rights to share with this-parties; this one is kinda unknown to me, what deals are in place historically and what they should prevent us from doing.”

Gage’s post makes clear that engineers are working on compiling a list of post IDs that should not have been included, and that password-protected posts, DMs, and media flagged as CSAM and other community guidelines violations were not included.

Automattic plans to launch a new setting on Wednesday that will allow users to opt out of data sharing with third parties, including AI companies, according to the source, who spoke on the condition of anonymity, and internal documents. A new FAQ section we reviewed, titled “What happens when you opt out?”, states that “If you opt out from the start, we will block crawlers from accessing your content by adding your site on a disallowed list. If you change your mind later, we also plan to update any partners about people who newly opt-out and ask that their content be removed from past sources and future training.”

404 Media has asked Automattic how it accidentally compiled data that it shouldn’t share, and whether any of that content was shared with OpenAI. 404 Media asked Automattic about an imminent deal with Midjourney last week but did not hear back then, either. Instead of answering direct questions about these deals and the compiling of user data, Automattic sent a statement, which it posted publicly after this story was published, titled "Protecting User Choice." In it, Automattic promises that it's blocked AI crawlers from scraping its sites. The statement says, "We are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control. Our partnerships will respect all opt-out settings. We also plan to take that a step further and regularly update any partners about people who newly opt out and ask that their content be removed from past sources and future training."

Another internal document shows that, on February 23, an employee asked in a staff-only thread, “Do we have assurances that if a user opts out of their data being shared with third parties that our existing data partners will be notified of such a change and remove their data?”

Andrew Spittle, Automattic’s head of AI replied: “We will notify existing partners on a regular basis about anyone who's opted out since the last time we provided a list. I want this to be an ongoing process where we regularly advocate for past content to be excluded based on current preferences. We will ask that content be deleted and removed from any future training runs. I believe partners will honor this based on our conversations with them to this point. I don't think they gain much overall by retaining it.” Automattic did not respond to a question from 404 Media about whether it could guarantee that people who opt out will have their data deleted retroactively.

News about a deal between Tumblr and Midjourney has been rumored and speculated about on Tumblr for the last week. Someone claiming to be a former Tumblr employee announced in a Tumblr blog post that the platform was working on a deal with Midjourney, and the rumor made it onto Blind, an app for verified employees of companies to anonymously discuss their jobs. 404 Media has seen the Blind posts, in which what seems like an Automattic employee says, “I'm not sure why some of you are getting worked up or worried about this. It's totally legal, and sharing it publicly is perfectly fine since it's right there in the terms & conditions. So, go ahead and spread the word as much as you can with your friends and tech journalists, it's totally fine.”

Separately, 404 Media viewed a public, now-deleted post by Gage, the product manager, where he said that he was deleting all of his images off of Tumblr, and would be putting them on his personal website. A still-live post says, “i've deleted my photography from tumblr and will be moving it slowly but surely over to cylegage.com, which i'm building into a photography portfolio that i can control end-to-end.” At one point last week, his personal website had a specific note stating that he did not consent to AI scraping of his images. Gage’s original post has been deleted, and his website is now a blank page that just reads “Cyle.” Gage did not respond to a request for comment from 404 Media.

Several online platforms have made similar deals with AI companies recently, including Reddit, which entered into an AI content licensing deal with Google and said in its SEC filing last week that it’s “in the early stages of monetizing [its] user base” by training AI on users’ posts. Last year, Shutterstock signed a six-year deal with OpenAI to provide training data.

OpenAI and Midjourney did not respond to requests for comment.
