Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

misk@sopuli.xyz · 2 years ago

Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

schnurrito@discuss.tchncs.de · 2 years ago

Messages that people post on Stack Exchange sites are literally licensed CC-BY-SA, the whole point of which is to enable them to be shared and used by anyone for any purpose. One of the purposes of such a license is to make sure knowledge is preserved by allowing everyone to make and share copies.

9point6@lemmy.world · 2 years ago

Share Alike

I can’t wait to download my own version of the latest gpt model

bbuez@lemmy.world · 2 years ago

It does help to know what those funny letters mean. Now we wait for regulators to catch up…

/tangent

If anything, we’re a very long way from anything close to intelligent, OpenAI (and subsequently MS, being publicly traded) sold investors on the pretense that LLMs are close to being “AGI” and now more and more data is necessary to achieving that.

If you know the internet, you know there’s a lot of garbage. I for one can’t wait for garbage-in garbage-out to start taking its toll.

Also I’m surprised how well open source models have shaped up, its certainly worth a look. I occasionally use a local model for “brainstorming” in the loosest terms, as I generally know what I’m expecting, but it’s sometimes helpful to read tasks laid out. Also comfort in that nothing even need leave my network, and even in a pinch I got some answers when my network was offline.

It gives a little hope while corps get to blatantly violate copyright while having wielding it so heavily, that advancements have been so great in open source.

kerrigan778@lemmy.world · 2 years ago

That license would require chatgpt to provide attribution every time it used training data of anyone there and also would require every output using that training data to be placed under the same license. This would actually legally prevent anything chatgpt created even in part using this training data from being closed source. Assuming they obviously aren’t planning on doing that this is massively shitting on the concept of licensing.

JohnEdwa@sopuli.xyz · 2 years ago

CC attribution doesn’t require you to necessarily have the credits immediately with the content, but it would result in one of the world’s longest web pages as it would need to have the name of the poster and a link to every single comment they used as training data, and stack overflow has roughly 60 million questions and answers combined.

fruitycoder@sh.itjust.works · 2 years ago

IF its outputs are considered derivative works.

theherk@lemmy.world · 2 years ago

Maybe but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there I’d be compelled to do so I suppose.

I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.