- cross-posted to:
- privacy@lemmy.ml
#FediPact (@FediPact@cyberpunk.lol)
INSTANCES KNOWN TO HAVE BEEN SCRAPED BY META INCLUDE:
• mastodon.social
• mastodon.online
• tech.lgbt
• hackers.town
• chaos.social
• mastodon.org.uk
• mastodont.cat
• mastodon.de
• mastodon.xyz
• mastodon.coffee
• mastodon.cloud
• mastodon.scot
• mastodonapp.uk
• mastodon.green
• mastodon.ml
• mastodon.au
• mastodon.eus
• mastodonczech.cz
• mastodon.sdf.org
• mstdn.social
• troet.cafe
• techhub.social
• tchncs.de
• kolektiva.social
• mamot.fr
• defcon.social
• meow.social
• social.linux.pizza
• ioc.exchange
• eldritch.cafe
• yiff.life
• furry.engineer
• infosec.exchange
• blahaj.zone
• woof.group
• union.place
• queer.party
• sakurajima.moe
• pawb.social
• digipres.club
• journa.host
• corteximplant.net
• corteximplant.com
• octodon.social
• bitbang.social
• jorts.horse
• tenforward.social
• pnw.zone
• spore.social
• hear-me.social
• neuromatch.social
• vt.social
• cosocial.ca
• chitter.xyz
• tooter.social
• cloudisland.nz
• social.seattle.wa.us
• masto.es
• nobigtech.es
• mastodon.gal
• masto.host
• toot.community
• pony.social
• climatejustice.global
• pleroma.envs.net
• indiepocalypse.social
• anarchism.space
• disroot.org
• dragonscave.space
• toot.bike
• fuzzies.wtf
• norden.social
• beige.party
• ohai.social
• freeradical.zone
• metalhead.club
• treehouse.systems
• icosahedron.website
• sunbeam.city
• sunny.garden
• zeroes.ca
• ursal.zone
• chaosfem.tw
• mas.to
• mathstodon.xyz
• rubber.social
• todon.nl
• cupoftea.social
• nerdculture.de
• toad.social
there are definitely more; i just did ctrl+f whenever i thought of an instance name, so i definitely missed some. i'll be editing this list to add them as i think of them
#FediPact #meta #threads
I don’t see why everyone’s surprised about this. The Fediverse runs on ActivityPub, an open protocol whose purpose is to broadcast the content we post here to anyone who wants it. Of course it’s being used to train AI; why wouldn’t it be?
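To illustrate the point about ActivityPub broadcasting: a public post is delivered as a `Create` activity whose audience includes the special `https://www.w3.org/ns/activitystreams#Public` collection, which marks the object as readable by anyone who fetches it. The minimal sketch below builds such an activity. The actor, note and followers URLs are hypothetical placeholders, and real servers such as Mastodon attach more fields (signatures, timestamps, tags) than shown here.

```python
import json

# The ActivityStreams "Public" collection: addressing a post to it makes it public.
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def public_note(actor: str, note_id: str, content: str, followers: str) -> dict:
    """Build a simplified Create(Note) activity addressed to the Public collection."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,
        "to": [AS_PUBLIC],
        "cc": [followers],
        "object": {
            "id": note_id,
            "type": "Note",
            "attributedTo": actor,
            "content": content,
            "to": [AS_PUBLIC],
            "cc": [followers],
        },
    }


def is_public(activity: dict) -> bool:
    """True if Public appears in to or cc (assumes both fields are lists)."""
    audience = activity.get("to", []) + activity.get("cc", [])
    return AS_PUBLIC in audience


if __name__ == "__main__":
    activity = public_note(
        actor="https://example.social/users/alice",
        note_id="https://example.social/users/alice/statuses/1",
        content="<p>hello fediverse</p>",
        followers="https://example.social/users/alice/followers",
    )
    print(json.dumps(activity, indent=2))
    print(is_public(activity))
```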
Except, iirc, they aren’t scraping “properly” (read: efficiently, at least; setting aside the morality for the sake of discussing this part in isolation), and they are causing traffic problems. If they had taken the time to set up an actual instance themselves, nobody would care in the slightest (again, ignoring the morality part for now).
TLDR: they are being dicks about it, bc offering everything we have for free is not enough for them.
of all the scrapers we see, the requests identified as originating from Meta seem to be well behaved overall. they appear to (mostly) be respecting robots.txt where present, and their request volume to Lemmy.World has averaged only slightly above 5 requests per minute over the last 2 weeks. they also don’t spoof their user agents to pretend to be web browsers, or at least I have not seen credible accusations of this happening.
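The two behaviors described above, honoring robots.txt and keeping request volume low, can be checked with a few lines of Python. This is a sketch under stated assumptions: it assumes Meta's AI crawler identifies itself with the `meta-externalagent` user-agent token (verify against Meta's current crawler documentation), the robots.txt content is a hypothetical opt-out, and the log is a hypothetical simplified list of user-agent strings already limited to the measurement window. `crawler_allowed` uses the standard library's `urllib.robotparser`; `average_requests_per_minute` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumption: Meta's AI-training crawler sends a user agent starting with
# "meta-externalagent". Check Meta's current crawler documentation.
META_UA_TOKEN = "meta-externalagent"

# Hypothetical robots.txt that opts out of that crawler but allows everyone else.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""

TWO_WEEKS_MINUTES = 14 * 24 * 60  # 20160 minutes


def crawler_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def average_requests_per_minute(user_agents: list[str], token: str, window_minutes: float) -> float:
    """Average rate of requests whose user agent contains token (case-insensitive).

    user_agents is a hypothetical simplified log: one user-agent string per
    request, already restricted to the measurement window.
    """
    hits = sum(1 for ua in user_agents if token.lower() in ua.lower())
    return hits / window_minutes


if __name__ == "__main__":
    print(crawler_allowed("meta-externalagent/1.1", "https://lemmy.world/post/1"))  # False
    print(crawler_allowed("Mozilla/5.0", "https://lemmy.world/post/1"))  # True
    # 5 requests per minute sustained for two weeks:
    print(5 * TWO_WEEKS_MINUTES)  # 100800
```

For scale, a rate of 5 requests per minute over two weeks (20,160 minutes) works out to about 100,800 requests, so "slightly above 5 per minute" means slightly more than that.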
But if they do it the “proper” way, they won’t be able to grab the data if instances defederate from them, right? And that’s what the majority of instances will do.
That assumes you know which instances they’re collecting data from, and it could be any of them.
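Defederation, as discussed in this exchange, is a domain-level filter: an instance stops delivering activities to a blocked domain and drops activities arriving from it. A minimal sketch of that filter follows. The function names are hypothetical, `threads.net` is only an example entry, matching subdomains as well as the bare domain is a choice made here, and the actor is assumed to be a plain URL string; Mastodon and Lemmy each implement domain blocks in their own code with additional options.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; "threads.net" is only an example entry.
BLOCKED_DOMAINS = frozenset({"threads.net"})


def is_blocked(url: str, blocked: frozenset = BLOCKED_DOMAINS) -> bool:
    """True if url's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def accept_inbound(activity: dict, blocked: frozenset = BLOCKED_DOMAINS) -> bool:
    """Inbound side: reject activities whose actor (a URL string) is on a blocked domain."""
    return not is_blocked(activity.get("actor", ""), blocked)


def deliverable_inboxes(inboxes: list[str], blocked: frozenset = BLOCKED_DOMAINS) -> list[str]:
    """Outbound side: skip delivery to inboxes hosted on blocked domains."""
    return [inbox for inbox in inboxes if not is_blocked(inbox, blocked)]


if __name__ == "__main__":
    print(deliverable_inboxes(["https://lemmy.world/inbox", "https://threads.net/inbox"]))
    # ['https://lemmy.world/inbox']
```

Because the check keys on the remote domain of federation traffic, it does nothing against a scraper that fetches public pages over plain HTTP instead of federating.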