Astronomer & video game data scientist with repressed anger

  • 0 Posts
  • 3 Comments
Joined 1Y ago
Cake day: Jun 02, 2023


The expectation that things are not private is totally different from the expectation that things are not being harvested for profit, though. Harvesting things for profit is transforming the public into the private.


LLMs create statistical distributions of words and phrases based on ingested data, and then sample those distributions given conditional probabilities.
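To make that concrete, here is a toy sketch of the idea. It is a bigram word model built from raw counts, which is far simpler than a real LLM (those use neural networks over tokens, not lookup tables), and the function names and the tiny corpus are made up for illustration. It only shows the two steps described above: build a distribution from ingested text, then sample from it conditioned on what came before.

```python
import random
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count how often each word follows each other word in the ingested text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, prev):
    """Turn raw counts into conditional probabilities P(next | prev)."""
    total = sum(counts[prev].values())
    return {word: count / total for word, count in counts[prev].items()}


def sample_text(counts, start, length, rng):
    """Generate up to `length` words by repeatedly sampling P(next | previous word)."""
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word never had a successor in the corpus
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(out)


# Hypothetical stand-in for "ingested data"
corpus = "the cat sat on the mat and the cat ate the fish"
model = build_bigram_model(corpus)

print(next_word_distribution(model, "the"))
print(sample_text(model, "the", 6, random.Random(0)))
```

Everything the sampler can produce comes straight out of the counts taken from the source text, which is the point: without the ingested writing there is no distribution to sample.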

Why should for-profit companies have the right to create these statistical distributions based on our written works without consent? They’re not publishing these distributions, and the purpose of ingesting these texts is not to report on the distributions.

They’re just bottom-trawling the internet and acting as if they have every right to use other people’s written works. While people are having “serious discussions” around it, they’re moving forward, ignoring the discussions entirely, and trying to force the conclusion of those discussions to be “well, it’s too late now, anyway”.


Could somebody explain why this is bad?

Consent.

I don’t consent to my copyrighted material – which is literally everything I write and post online, including this comment – being included in these products. In some cases, I have implicitly consented to this via the EULAs of websites I’ve used over the years, but when these companies actively scrape the web for content, they’re directly bypassing any agreements I may have made with service providers, and they’re collecting my copyrighted works without my ever having done business of any sort with them.

I haven’t agreed to contribute to their for-profit operation, I’m not being compensated in any way for this participation – whether financially or through a service provided in return – and I don’t believe they have any moral right to decide that I’m going to contribute whether I want to or not.

They can fuck right off.