this post was submitted on 22 Feb 2024
177 points (97.8% liked)

BecomeMe


Social Experiment. Become Me. What I see, you see.

founded 1 year ago
all 30 comments
[–] [email protected] 7 points 8 months ago

Were they right to kick him out?

[–] [email protected] 2 points 8 months ago

Yet another reason to have nothing to do with OpenAI.

[–] [email protected] 3 points 8 months ago

And yet Reddit has just done a data extraction deal with Google

[–] [email protected] 11 points 8 months ago (1 children)
[–] [email protected] 7 points 8 months ago (1 children)

Fancy copy-paste bot go brrrrr

[–] [email protected] 3 points 8 months ago

It's learning NLP and how to build dynamic web apps at the same time!

[–] [email protected] 17 points 8 months ago (1 children)

He was the CEO of reddit for a few months a decade back or so. They probably tossed a bucket of worthless stock at him zoidberg style.

[–] [email protected] 2 points 8 months ago

You think he waved a sandwich in front of them?

[–] [email protected] 5 points 8 months ago

In today’s photo, Mr AI shows off his “Wallace and Gromit” look.

[–] [email protected] 26 points 8 months ago (1 children)

The infringement runs deep, I didn't know my reddit comments were directly going to AI. Had I known that, I wouldn't have posted so much.

They encouraged me to give more than I would have, for their commercial purposes, without due consideration.

[–] [email protected] 13 points 8 months ago (1 children)

Lemmy is also being scraped for training and data collection I bet.

[–] [email protected] 17 points 8 months ago

Everything is scraped for training data. They argue that this is fair use under the "research" exemption. However, it is not research: the datasets they build are private and used exclusively for commercial product development.

Even if you could consider it as fair use research - which it isn't - the commerciality of it should exclude it from being fair use.

[–] [email protected] 13 points 8 months ago (1 children)
[–] [email protected] 63 points 8 months ago (4 children)

Makes sense. Reddit really is a data gold mine. I shudder to think about the profile you could have built of me from my up/down votes alone, much less my comments and posts. FOMO and participation always generating data really just has me wondering if we'll see a world where "privacy" becomes extinct or culturally different.

[–] [email protected] 2 points 8 months ago

Just wait until all humans awaken their psionic powers and everyone gets telepathy. Privacy doomers will seethe 😂

[–] [email protected] 41 points 8 months ago (1 children)

Somebody posted a screenshot where reddit accounts were getting "exclusive offers" to buy stock at the IPO...

Which means everyone that buys even a single share just tied their real life identity explicitly to their reddit account.

Having that makes their data on you a lot more valuable.

[–] [email protected] 20 points 8 months ago* (last edited 8 months ago)

Fuck I didn't even think of that side of it. I just thought "that's a bad investment" and left it at that

[–] [email protected] 8 points 8 months ago (1 children)

You should be paid for that. There is a valid lawsuit to be had. Reddit's terms and conditions do not absolve them of value theft from the content you posted.

[–] [email protected] 10 points 8 months ago* (last edited 8 months ago) (1 children)

Ehhh, I'm not super persuaded by that argument tbh. I don't think arbitrary data has any intrinsic value on its own the way copyright-able art does. I work in tech and I'm just not sure I want that can of worms opened.

Edit: I guess I should say, it's not something I've dedicated a lot of thought to. I'm open to arguments to the contrary.

[–] [email protected] 4 points 8 months ago (2 children)

I don’t think arbitrary data has any intrinsic value on its own the way copyright-able art does.

If you make a thing, copyright is intrinsically attributed to it. That's what copyright law states.

Registering for copyright is simply the requirement to claim damages beyond the direct losses.

If you write a comment on a website, you have copyright for that comment. However, the terms of the website state that you give them extensive rights to your comment. Ostensibly, this is so that the website can operate normally; however, the rights they claim extend far beyond that.

Such extraneous rights are granted with no consideration. You have already accessed the website, you have already started making a comment. The content of your comment, if valuable, does not yield you any form of consideration (pay). As such, the rights they claim are not warranted.

Now that the value of user comments has increased so grossly with AI, the website's claim to rights should be even weaker. Reddit is being paid millions for the comments they hold - and, frankly, they are selling it cheap. This value rightfully belongs to the users that made the comments, not to the business that attempted to squirrel away a transaction in the fine print.

[–] [email protected] 2 points 8 months ago (1 children)

I think that's fair-ish. I still think that most companies are going to argue that access to their platform is predicated on access to your data, given that it would not exist without the platform. The argument that they have some stake is at least valid on the face, particularly as it is a voluntary agreement the user enters when they engage with a product.

[–] [email protected] 2 points 8 months ago

Companies have made a tradition of arguing it is their god-given right to make profit. That does not fit with reality.

The core principles of contract law dictate how transactions occur. When you buy something, the price on the shelf is not an "offer", but an "invitation to treat"; you make the offer when you approach the owner and say "I will pay this amount to buy this thing from you". The price on the shelf is simply the seller saying "If you offer this amount, I will most likely accept" - but the seller retains the right to refuse any offer you make.

With a website, the offer is made by the website owner at the point of entry. The offer is: "you can enter our site, free of charge!!", but then there are terms attached. The terms should only be a technicality, something like "don't be a dick while you're here, else you can be kicked out". But, instead, they have put in a second transaction, one which is akin to saying "we can rummage through your wallet and copy anything we find, and we can then sell copies of all of that, and you agree to give us that for free. Also, anything you say inside will be recorded and copied and sold in a similar manner".

This is patently bullshit. An offer is "you give us X, we give you Y" and then the terms and conditions provide the details and limitations of that core offer. If you compare it to insurance, an insurer is required to give you some sort of "key facts page", wherein they detail the key points of what they're offering you in exchange for your insurance premium (the money). There is a clear exchange. With websites and software, there is often no clear exchange beyond the use of the software being free.

And this is before you consider cunts like Microsoft, who have seen what Google and Facebook do, yet you still pay for Windows and Office 365 while they steal your data for no consideration!

It's fucking deceptive, and intentionally so. It has built up from a time when user data had no tangible value - and yet, the companies that collected this data (eg Google and Facebook) somehow managed to use this data to become amongst the wealthiest businesses in the world.

They have done this through fraudulent and unlawful means. To this date, they have not been challenged on this method. Now, we have a situation where "everyone does it" - but that doesn't make it legal, and furthermore even more people are the victim of this offense. Lawmakers are the victims. It's simply that people haven't yet realised that they're the victim, and in particular they haven't yet understood the value that is being taken.

Like I say, $50 per year, minimum. Likely far more, approaching or maybe even exceeding $1,000 per year. From everyone - and your data is more valuable if you're more publicly prominent. It's fucking criminal and needs to be sorted out. You can't build a car without paying for the nuts and bolts, and as such Facebook and Google have a huge debt to pay.

Every time you receive an SMS message to verify your identity or authorise a transaction, you are confirming your phone number to multiple people, thereby giving them a fresh piece of data to sell. This doesn't make you more secure - if anything, it weakens your consumer rights, as you are explicitly authorising the transaction rather than having it processed as "cardholder not present" where the seller assumes default liability in case of fraud. Even banks are in on the game, these days, and they sell away your consumer rights while telling you it's good for you.

[–] [email protected] 2 points 8 months ago (1 children)

This is a good point. Of course in the past you could just overhear someone say something and use that info for free. Like turn it into a song lyric or whatever. Even acting directly on the info could make you money if you overheard a stock tip.

Now comments online are directly attributable to a particular person. These LLM programs are like someone listening and watching whatever you're doing. Then claiming those are their ideas.

Overhearing something by accident is one thing. Actively recording someone's conversation with the intent to profit from it is another. Although I think that most of the info they'll get is nonsense, so whatever. But it's still not really their creation. It's the commenters'.

[–] [email protected] 2 points 8 months ago (1 children)

Of course in the past you could just overhear someone say something and use that info for free

But that's the thing, either the information is in the public domain - in which case it is freely available and cannot be sold to anyone else - or the information is private. It can't be both. They can't say "you posted it on our platform, so it's public and we don't have to pay you" while simultaneously selling it to someone else.

If they're selling it, then the author has a fair claim to it. Terms and conditions won't hold water if no consideration has been given, and "free access to a website" does not meet that bar - especially when the access is granted regardless of whether you make the post.

[–] [email protected] 2 points 8 months ago

You can take something from the public domain and sell it. There's plenty of public domain books sold. It just means that everyone can access it and use it equally. You can record everything you overhear at the store and use it for writing a story or whatever.

Although what you say sounds true. These comments are written and signed, easily attributed to the owner. Every document like that in the past has been the property of the writer. Same thing with images. I don't know how Reddit can claim any comments are legally theirs without "consideration" as you say.

[–] [email protected] 18 points 8 months ago (1 children)

Yeah kinda the reason I stopped contributing even with votes. No more account login, just casually browse/lurk. Even if you periodically delete your account all that data is probably saved somewhere, there's no way to be sure.

[–] [email protected] 6 points 8 months ago

Yeah, I'm doing it again on lemmy. I'm definitely not immune in any way to the FOMO and lack of data discipline. We all are, in some ways, even just by being here on this platform.

[–] [email protected] 2 points 8 months ago

This is the best summary I could come up with:


The Reddit IPO appears set to move forward, with the tech platform filing its form S-1 with the Securities and Exchange Commission on Thursday.

The form lists a variety of new details about the site, including its financials, risk factors, and key business lines.

“Our content is particularly important for artificial intelligence (‘AI’) – it is a foundational part of how many of the leading large language models (‘LLMs’) have been trained,” the company writes in the S-1.

“We expect our data advantage and intellectual property to continue to be a key element in the training of future LLMs.”

News reports have pegged the partner as Google, which is using Reddit data to train its Gemini LLM.

According to the filing, Huffman’s total compensation in 2023 was $193.2 million, though that was almost all in the form of stock and option awards, which may or may not vest, depending on the company’s performance.


The original article contains 443 words, the summary contains 153 words. Saved 65%. I'm a bot and I'm open source!