Some thoughts on how useful Anubis really is. Combined with comments I’ve read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will soon be outdated and we’ll need something else.

  • Klear@quokk.au · 1 day ago · +23/-2

    If that sounds familiar, it’s because it’s similar to how bitcoin mining works. Anubis is not literally mining cryptocurrency, but it is similar in concept to other projects that do exactly that

    Did the author only now discover cryptography? It’s like a cryptocurrency, just without currency, what a concept!

    • SkaveRat@discuss.tchncs.de · 21 hours ago · +6

      It’s a perfectly valid way to explain it, though

      If you try to show up with “cryptography” as an explanation, people will think of encrypting messages, not proof of work

      “Cryptocurrency without the currency” really is the perfect single-sentence explanation

  • Dremor@lemmy.world · 2 days ago · +19/-2

    Anubis is not a challenge like a captcha. Anubis is a resource waster, forcing crawlers to solve a crypto challenge (basically like mining bitcoin) before being allowed in. That’s how it defends so well against bots: they don’t want to waste their resources on needless computing, so they just cancel the page load before it even happens and go crawl elsewhere.
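
    To make the “mining-like” part concrete, here is a minimal sketch of a Hashcash-style proof of work in Python. It illustrates the general idea only; the challenge string and difficulty are made up, and Anubis’s real implementation differs in its details.

    ```python
    import hashlib
    import itertools

    def solve(challenge: str, difficulty: int) -> int:
        """Find a nonce so sha256(challenge + nonce) starts with `difficulty`
        zero hex digits. Expensive: roughly 16**difficulty hash attempts."""
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith("0" * difficulty):
                return nonce

    def verify(challenge: str, nonce: int, difficulty: int) -> bool:
        """Cheap for the server: a single hash."""
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        return digest.startswith("0" * difficulty)

    nonce = solve("example-challenge", difficulty=4)  # the visitor burns CPU here
    print(verify("example-challenge", nonce, 4))      # the server checks it instantly
    ```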

    • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +10/-4

      No, it works because the scraper bots don’t have it implemented yet. Of course the companies would rather not spend additional compute resources, but their pockets are deep, and some have already adapted and solve the challenges.

      • Encrypt-Keeper@lemmy.world · 24 hours ago · +10

        The point was never that Anubis challenges are something scrapers can’t get past. The point is it’s expensive to do so.

        Some bots don’t use JavaScript and can’t solve the challenges, so they’re blocked, but there was never any point in time where no scrapers could solve them.

        • JuxtaposedJaguar@lemmy.ml · 24 hours ago · +1/-3

          Wait, so browsers that disable JavaScript won’t be able to access those websites? Then I hate it.

          Not everyone wants unauthenticated RCE from thousands of servers around the world.

          • Encrypt-Keeper@lemmy.world · 23 hours ago · +6

            Not everyone wants unauthenticated RCE from thousands of servers around the world.

            I’ve got really bad news for you my friend

      • Dremor@lemmy.world · 2 days ago · +13

        Whether they solve it or not doesn’t change the fact that they have to use more resources for crawling, which is the objective here. And by contrast, the website sees a lot less load than before it used Anubis. Either way, I see it as a win.

        But despite that, it has its detractors, like any solution that becomes popular.

        But let’s be honest, what are the arguments against it?
        It takes a bit longer to access for the first time? Sure, but it’s not like you have to click anything or type anything.
        It executes foreign code on your machine? Literally 90% of the web does these days. Just disable JavaScript and see how many websites are still functional. I’d be surprised if even a handful were.

        The only people who gain anything from not having Anubis are web crawlers, be they AI bots, indexing bots, or script kiddies trying to find a vulnerable target.

        • int32@lemmy.dbzer0.com · 1 day ago · +2

          I use uMatrix, which blocks JS by default, so it is a bit inconvenient to have to enable JS for some sites. Websites that didn’t need it before, which is often the reason I use them, now require JavaScript.

        • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +1

          Sure, I’m not arguing against Anubis! I just don’t think the added compute cost is sufficient to keep them out once they adjust.

          • rumba@lemmy.zip · 20 hours ago · +1

            Conceptually, you could just really twist the knobs up. A human can wait 15 seconds to read a page. But if you’re trying to scrape 100,000 pages and they each take 15 seconds… If you can make it expensive in both power and time, that’s a win.
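
            Rough back-of-the-envelope numbers for that knob-twisting, with assumed hash rates (the real figures vary a lot by device, so treat these purely as illustration):

            ```python
            # Hashcash-style PoW: expected work is about 2**difficulty_bits hashes.
            # Both rates below are assumptions for illustration, not measurements.
            browser_rate = 200_000     # hashes/s for a phone solving the challenge in JS
            native_rate = 5_000_000    # hashes/s for an optimized native solver

            difficulty_bits = 22
            expected_hashes = 2 ** difficulty_bits

            print(f"visitor on a phone: ~{expected_hashes / browser_rate:.0f} s per page")
            print(f"native solver:      ~{expected_hashes / native_rate:.1f} s per page")
            print(f"100,000 pages, native: "
                  f"~{expected_hashes / native_rate * 100_000 / 3600:.0f} CPU-hours")
            ```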

        • daniskarma@lemmy.dbzer0.com · 1 day ago · +6/-6

          I’m against it for several reasons. It runs unauthorized heavy-duty code on your machine. It’s not JS needed to make the site functional; it’s heavy computation, unprompted. If they added a simple “click to run challenge” button, it would at least be more polite and less “malware-like”.

          On some old devices the challenge lasts over 30 seconds; I can type a captcha in less time than that.

          It forces sites that people (like the article author) tend to browse directly from a terminal behind the requirement to use a full browser.

          It’s a delusion. As the article author shows, solving the PoW challenge is not that much of an added cost. The reduction in scraping would be the same with any other novel method; crawlers are just not prepared for it yet. Any prepared crawler would have no issues whatsoever. People are seeing results because of obscurity, not because it really works as advertised. And in fact I believe some sites are starting to get crawled aggressively despite Anubis, as some crawlers are already catching up with this new Anubis trend.

          Take into account that the challenge needs to be light enough that a legitimate user can enter the website within a few seconds while running the challenge in a browser engine (very inefficient). A crawler interested in your site could easily set up a solver that mines the PoW using CUDA on a GPU, which would be hundreds if not thousands of times more efficient. So the balance of difficulty (still browsable for users but costly to crawl) is not feasible.
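
          The browser-vs-native gap is easy to get a feel for on the hashing itself. Here is a tiny throughput benchmark; it makes no claim about exact ratios, only that the same search can be run outside the browser at whatever speed the scraper’s hardware allows:

          ```python
          import hashlib
          import time

          def hash_rate(seconds: float = 1.0) -> float:
              """Measure how many SHA-256 hashes this machine does per second."""
              count = 0
              start = time.perf_counter()
              while time.perf_counter() - start < seconds:
                  hashlib.sha256(f"bench-{count}".encode()).digest()
                  count += 1
              return count / (time.perf_counter() - start)

          print(f"~{hash_rate():,.0f} SHA-256 hashes/s in a plain Python loop")
          # A JS engine on a phone, a native implementation, and a GPU kernel all
          # land at very different rates -- that asymmetry is the argument here.
          ```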

          It’s not universally applicable. Imagine if the whole internet were behind PoW challenges; it would be like constant Bitcoin mining, a total waste of resources.

          The company behind Anubis seems shadier to me each day. They feed on anti-AI paranoia, they didn’t even answer the article author’s valid criticisms when he emailed them, and they use PR language clearly aimed at convincing and pleasing certain demographics in order to place their product. They are full of slogans but lack substance. I just don’t trust them.

          • Dremor@lemmy.world · 1 day ago · +6

            Fair point. I do agree with the “click to execute challenge” approach.

            For the terminal browser, that has more to do with it not respecting web standards than with Anubis not working on it.

            As for old hardware, I agree that a time delay could be a good idea if it weren’t so easy to circumvent. In that case bots would just wait in the background and resume once the timer runs out, which would vastly decrease Anubis’s effectiveness, as waiting costs them very little. There isn’t really much that can be done here.

            As for the CUDA solution, that depends on the hash algorithm used. Some of them (like the one used by Monero) are designed to be vastly less efficient on a GPU than on a CPU. Moreover, GPU servers are far more expensive to run than CPU ones, so the result would be the same: crawling would be more expensive.
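
            For example, Python’s standard library ships scrypt, a deliberately memory-hard function. A sketch of what a GPU-unfriendly variant of the same proof-of-work idea could look like (illustrative only; this is not what Anubis actually uses, and the parameters are example values):

            ```python
            import hashlib
            import itertools
            import os

            def solve_memory_hard(challenge: bytes, difficulty_bits: int) -> int:
                """Like a SHA-256 search, but every attempt runs scrypt with
                n=2**14, r=8, which needs ~16 MiB of RAM -- that is what hurts GPUs."""
                target = 2 ** (256 - difficulty_bits)
                for nonce in itertools.count():
                    digest = hashlib.scrypt(nonce.to_bytes(8, "big"), salt=challenge,
                                            n=2**14, r=8, p=1, dklen=32)
                    if int.from_bytes(digest, "big") < target:
                        return nonce

            challenge = os.urandom(16)
            print(solve_memory_hard(challenge, difficulty_bits=4))  # tiny demo difficulty
            ```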

            In any case, the best solution by far would be to make respecting robots.txt a legal requirement, but for now legislators prefer to look the other way.
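
            The technical side of respecting robots.txt is already trivial; Python’s standard library can parse it (the URL and user agent below are just examples). The hard part is that nothing currently forces scrapers to run this check:

            ```python
            from urllib import robotparser

            rp = robotparser.RobotFileParser()
            rp.set_url("https://example.org/robots.txt")  # example site
            rp.read()

            # A well-behaved crawler does this before every fetch.
            print(rp.can_fetch("MyCrawler/1.0", "https://example.org/some/page"))
            ```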

  • rtxn@lemmy.world · 2 days ago · +35/-2

    New developments: just a few hours before I post this comment, The Register posted an article about AI crawler traffic. https://www.theregister.com/2025/08/21/ai_crawler_traffic/

    Anubis’ developer was interviewed and they posted the responses on their website: https://xeiaso.net/notes/2025/el-reg-responses/

    In particular:

    Fastly’s claims that 80% of bot traffic is now AI crawlers

    In some cases for open source projects, we’ve seen upwards of 95% of traffic being AI crawlers. For one, deploying Anubis almost instantly caused server load to crater by so much that it made them think they accidentally took their site offline. One of my customers had their power bills drop by a significant fraction after deploying Anubis. It’s nuts.

    So, yeah. If we believe Xe, OOP’s article is complete hogwash.

    • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +9

      Cool article, thanks for linking! Not sure that’s a new development though; it’s just results, and we already knew it’s working. The question is, what’s going to work once the scrapers adapt?

  • rtxn@lemmy.world · 3 days ago · +204/-1

    The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.

    The purpose is to reduce the flood to a manageable level, not to block every single scraper request.

    • 0_o7@lemmy.dbzer0.com · 2 days ago · +18/-1

      The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked

      This post was originally written for ycombinator “Hacker” News, which is vehemently against people hacking things together for the greater good and, more importantly, for free.

      It’s more of a corporate PR release site and if you aren’t known by the “community”, calling out solutions they can’t profit off of brings all the tech-bros to the yard for engagement.

    • poVoq@slrpnk.net · 3 days ago · +95/-2

      And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.

      I feel like people who complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦

      • moseschrute@crust.piefed.social · 24 hours ago · +1

        Out of curiosity, what’s the issue with Cloudflare? Aside from the constant worry they may strong-arm you into their enterprise pricing if your site is too popular lol. I understand supporting open source, but why not let companies handle the expensive bits as long as they’re willing?

        I guess I can answer my own question. If the point of the Fediverse is to remove a single point of failure, then I suppose Cloudflare could become a single point to take down the network. Still, we could always pivot away from those types of services later, right?

        • Limonene@lemmy.world · 22 hours ago · +1/-2

          Cloudflare has IP banned me before for no reason (no proxy, no VPN, residential ISP with no bot traffic). They’ve switched their captcha system a few times, and some years it’s easy, some years it’s impossible.

      • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +19/-1

        Yeah, I’m just wondering what’s going to follow. I just hope everything isn’t going to need to go behind an authwall.

      • mobotsar@sh.itjust.works · 2 days ago · +5

        Is there a reason other than avoiding infrastructure centralization not to put a web server behind cloudflare?

        • poVoq@slrpnk.net · 2 days ago · +23/-1

          Yes, because Cloudflare routinely blocks entire IP ranges and puts people into endless captcha loops. And it snoops on all traffic and collects a lot of metadata about all your site visitors. And if you let them terminate TLS, they will even analyse the passwords that people use to log into the services you run. It’s basically a huge surveillance dragnet and probably a front for the NSA.

        • Björn Tantau@swg-empire.de · 2 days ago · +11/-1

          Cloudflare would need your HTTPS keys, so they could read all the content you worked so hard to encrypt. If I wanted to do bad shit I would apply at Cloudflare.

          • mobotsar@sh.itjust.works · 2 days ago · +7

            Maybe I’m misunderstanding what “behind cloudflare” means in this context, but I have a couple of my sites proxied through cloudflare, and they definitely don’t have my keys.

            I wouldn’t think using a cloudflare captcha would require such a thing either.

            • StarkZarn@infosec.pub · 2 days ago · +14

              That’s because they just terminate TLS at their end. Your DNS record is “poisoned” by the orange cloud and their infrastructure answers for you. They happen to have a trusted root CA so they just present one of their own certificates with a SAN that matches your domain and your browser trusts it. Bingo, TLS termination at CF servers. They have it in cleartext then and just re-encrypt it with your origin server if you enforce TLS, but at that point it’s meaningless.

      • daniskarma@lemmy.dbzer0.com · 2 days ago · +1/-5

        I still think captchas are a better solution.

        To get past them, scrapers have to run AI inference, which also comes with compute costs. But for legitimate users you don’t run unauthorized intensive tasks on their hardware.

        • poVoq@slrpnk.net · 2 days ago · +9

          They are much worse for accessibility, take longer to solve, and are more disruptive for the majority of users.

          • daniskarma@lemmy.dbzer0.com · 2 days ago · +2/-5

            Anubis is worse for privacy, as you have to have JavaScript enabled. And it’s worse for the environment, as the PoW cryptographic challenges are just a waste.

            Also, reCaptcha-style captchas are not really that disruptive most of the time.

            As I said, the polite thing would be to give users options. Anubis PoW running automatically just for entering a website is one of the rudest pieces of software I’ve seen lately. They should be more polite and give the user a choice: maybe the user could choose between solving a captcha or running the Anubis PoW, or even just have Anubis run only after a button the user clicks.

            I don’t think it’s good practice to run that kind of software just for entering a website. If that tendency were to grow, browsers would need to adapt and outright block that behavior, e.g. only allowing access to some client resources after a user action.

            • poVoq@slrpnk.net · 2 days ago · +11

              Are you seriously complaining about an (entirely false) negative privacy aspect of Anubis and then suggesting reCaptcha from Google is better?

              Look, no one thinks Anubis is great, but often it is that or the website becoming entirely inaccessible because it is DDOSed to death by the AI scrapers.

              • daniskarma@lemmy.dbzer0.com · 2 days ago · +1/-3

                First, I said reCaptcha types, meaning captchas in the style of reCaptcha, which could be implemented outside a Google environment. Second, I never said those were better for privacy. I just said Anubis is bad for privacy. Traditional captchas that work without JavaScript would be the privacy-friendly way.

                Third, it’s not a false proposition. Disabling JavaScript can protect your privacy a great deal. A lot of tracking is done through JavaScript.

                Last, that’s just the Anubis PR slogan, not the truth. As I said, DDoS mitigation could be implemented in other ways, more polite and/or environmentally friendly.

                Are you astroturfing for Anubis? Because I really cannot understand why something as simple as a landing page with a “run PoW challenge” button would be that bad.

                • poVoq@slrpnk.net · 2 days ago · +4

                  Anubis is not bad for privacy, but rather the opposite. Server admins explicitly chose it over commonly available alternatives to preserve the privacy of their visitors.

                  If you don’t like random Javascript execution, just install an allow-list extension in your browser 🤷

                  And no, it is not a PR slogan, it is the lived experience of thousands of server admins (me included) who have been fighting with this for months now and are very grateful that Anubis has provided some (likely only temporary) relief from that.

                  And I don’t get what the point of an extra button would be when the result is exactly the same 🤷

          • interdimensionalmeme@lemmy.ml · 2 days ago · +6/-8

            What CPU made after 2004 do you have that doesn’t have automatic thermal control?
            I don’t think there is any, unless you somehow managed to disable it?
            Even a Raspberry Pi without a heatsink won’t overheat to the point of shutdown.

            • poVoq@slrpnk.net · 2 days ago · +10/-1

              You are right, it is actually worse: it usually just overloads the CPU so badly that it starts to throttle, and then I can’t even access the server via SSH anymore. But sometimes it also crashes the server so that it reboots, and yes, that can happen on modern CPUs as well.

              • interdimensionalmeme@lemmy.ml · 2 days ago · +4/-8

                You need to set your HTTP-serving process to a priority below the administrative processes (in the place where you start it; assuming a Linux server, that would be your init script or systemd service unit).
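
                For the systemd case, that could be a drop-in like the one below (nginx.service is just an example name; the directives are standard systemd options, the values are illustrative):

                ```ini
                # /etc/systemd/system/nginx.service.d/override.conf  (example path)
                [Service]
                # Lower scheduling priority so sshd, journald, etc. stay responsive
                Nice=10
                # cgroup CPU weight; the default is 100, lower means a smaller share under contention
                CPUWeight=50
                ```

                Apply it with systemctl daemon-reload followed by a restart of the service.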

                An actual crash causing a reboot? Do you maybe have faulty RAM? That is really never supposed to happen from anything happening in userland. That’s not AI; your stuff might be straight up broken.

                The only thing that isn’t broken that could reboot a server is a watchdog timer.

                Your server shouldn’t crash, reboot, or become unreachable from the admin interface even at 100% load, and it shouldn’t overheat either. Temperatures should never exceed 80°C no matter what you do; that is supposed to be impossible with thermal management, which all processors have had for decades.

                • poVoq@slrpnk.net · 2 days ago · +3/-1

                  Great that this is all theoretical 🤷 My server hardware might not be the newest, but it is definitely not broken.

                  And besides, what good is it that you can still barely access the server through SSH when the CPU is constantly maxed out and site visitors only get timeouts when trying to access the services?

                  I don’t even get what you are trying to argue here. That the AI scraper DDOS isn’t so bad because in theory it shouldn’t crash the server? Are you even reading what you are writing yourself? 🤡

    • AnUnusualRelic@lemmy.world · 2 days ago · +23/-1

      The problem is that the purpose of Anubis was to make crawling more computationally expensive, and crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile some required cycles on top of what’s currently asked, but it’s a balancing act before it starts to really be an annoyance for the meat popsicle users.

    • loudwhisper@infosec.pub · 2 days ago · +3

      Exactly my thoughts too. Lots of theory about why it won’t work, but not looking at the fact that if people use it, maybe it does work, and when it won’t work, they will stop using it.

  • unexposedhazard@discuss.tchncs.de · 2 days ago · +70/-8

    This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity.

    Well it doesn’t fucking matter what “makes sense to you” because it is working…
    It’s being deployed by people who had their sites DDoS’d to shit by crawlers and they are very happy with the results, so what even is the point of trying to argue here?

    • daniskarma@lemmy.dbzer0.com · 2 days ago · +12

      It’s working because it’s not widely used. It’s sort of a “pirate seagull” theory: as long as only a few people use it, it works, because scrapers don’t really count on Anubis and so don’t implement systems to get past it.

      If it were to become more common, it would be really easy to implement systems that would defeat the purpose.

      As of right now sites are OK because scrapers just send HTTPS requests and expect a full response. If someone wants to bypass the Anubis protection, they would need to take into account that they will receive a cryptographic challenge and have to solve it.

      The thing is that cryptographic challenges can be heavily optimized. They are designed to run in a very inefficient environment, the browser. But if someone took the challenge and solved it in a better environment, using CUDA or something like that, it would take a fraction of the energy, defeating the purpose of “being so costly that it’s not worth scraping”.

      At this point it’s only a matter of time before we start seeing scrapers like that, especially if more and more sites start using Anubis.

  • CrackedLinuxISO@lemmy.dbzer0.com · 2 days ago · +12/-2

    There are some sites where Anubis won’t let me through. Like, I just get immediately bounced.

    So RIP dwarf fortress forums. I liked you.

      • SL3wvmnas@discuss.tchncs.de · 2 days ago · +5

        I, too, get blocked by certain sites. I think it’s a configuration thing, where it does not like my combination of uBlock/NoScript, even when I explicitly allow their scripts…

  • daniskarma@lemmy.dbzer0.com · 2 days ago · +5/-9

    Sometimes I think: imagine if a company like Google or Facebook implemented something like Anubis, and suddenly most people’s browsers started constantly solving CPU-intensive cryptographic challenges. People would be outraged by the wasted energy. But somehow a “cool small company” does it and it’s fine.

    I don’t think the Anubis approach is sustainable for everyone to use; it’s just too wasteful energy-wise.

      • daniskarma@lemmy.dbzer0.com · 2 days ago · +2/-16

        Captcha.

        It does everything Anubis does. If a scraper wants to solve it automatically, it’s compute-intensive, as they have to run AI inference, but for the user it’s just a little time-consuming.

        With captchas you don’t run aggressive software unauthorized on anyone’s computer.

        The solution already existed. But Anubis is “trendy”, and they are masters of PR within certain circles of people who always want the latest, trendiest thing.

        But the good old captcha would achieve the same result as Anubis, in a more sustainable way.

        Or at least give the user the option of running the challenge or leaving the page, and make clear to the user that their hardware is going to run an intensive task. It really feels very aggressive to have a webpage run what is basically a cryptominer on your computer without authorization. And for me, having a catgirl as a mascot does not excuse the rudeness of it.

        • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +15

          The “good old captcha” is the most annoying thing ever for people and basically universally hated. Speaking of leaving the page, what do you think will cause more people to leave: a captcha that’s often broken, or something where people don’t have to do anything but wait a little?

          • daniskarma@lemmy.dbzer0.com · 2 days ago · +1/-5

            They don’t have to do anything but let an unknown program max out their CPU without authorization.

            Imagine if Google implemented that: billions of computers constantly running PoW, what could go wrong?

            • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +1

              They don’t have to do anything but let an unknown program max out their CPU without authorization.

              But they currently can’t and that’s the point.

  • mfed1122@discuss.tchncs.de · 1 day ago · +11/-24

    Yeah, well-written stuff. I think Anubis will come and go. This beautifully demonstrates and, best of all, quantifies the negligible cost of Anubis to scrapers.

    It’s very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.

    Maybe something that scrambles the characters of the site according to some random “offset” of some sort, e.g. maybe randomly selecting a modulus size and an offset to cycle them, or even just a good ol’ cipher. And the “captcha” consists of a slider that adjusts the offset. You as the viewer know it’s solved when the text becomes something sensible, so there’s no need for the client code to store a readable key that could be used to auto-undo the scrambling. You could maybe even have some values of the slider randomly chosen to produce English text in case the scrapers got smart enough to check for legibility (not sure how to hide which slider positions would be these red herring ones though), which could maybe be enough to trick the scraper into picking up junk text sometimes.
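
    For what it’s worth, a rough sketch of that slider idea, with a plain Caesar-style letter offset standing in for the “scrambling” (purely to illustrate the concept, not a serious proposal):

    ```python
    import string

    ALPHABET = string.ascii_lowercase

    def scramble(text: str, offset: int) -> str:
        """Cycle letters by `offset`; the server would ship only the scrambled text."""
        shifted = ALPHABET[offset:] + ALPHABET[:offset]
        table = str.maketrans(ALPHABET + ALPHABET.upper(), shifted + shifted.upper())
        return text.translate(table)

    page = scramble("The quick brown fox jumps over the lazy dog", offset=11)
    print(page)                     # unreadable until the viewer finds the offset
    print(scramble(page, 26 - 11))  # moving the "slider" to 15 undoes it
    ```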

    • Possibly linux@lemmy.zip · 2 days ago · +6

      Anubis is more of an economic solution. It doesn’t stop bots, but it does make companies pay more to access content instead of having server operators foot the bill.

    • dabe@lemmy.zip · 2 days ago · +6

      I’m sure you meant to sound more analytical than anything… but this really comes off as arrogant.

      You claim that Anubis is negligent and will come and go, and then admit to only spending minutes at a time thinking of solutions yourself, which you then just sorta spout. It’s fun to think about solutions to this problem collectively, but can you honestly believe that Anubis is negligent when it’s so clearly working, and when the author has been so extremely clear about their own perception of its pitfalls and hasty development? (Go read their blog, it’s a fun time.)

      • mfed1122@discuss.tchncs.de · 1 day ago · +1

        By negligence, I meant that the cost is negligible to the companies running scrapers, not that the solution itself is negligent. I should have said “negligibility” of Anubis, sorry - that was poor clarity on my part.

        But I do think that the cost of it is indeed negligible, as the article shows. It doesn’t really matter if the author is biased or not, their analysis of the costs seems reasonable. I would need a counter-argument against that to think they were wrong. Just because they’re biased isn’t enough to discount the quantification they attempted to bring to the debate.

        Also, I don’t think there’s any hypocrisy in me saying I’ve only thought about other solutions here and there; I’m not maintaining an anti-scraping library. And there have already been indications that scrapers are just accepting the cost of Anubis on Codeberg, right? So I’m not trying to say I’m some sort of tech genius who has the right idea here, but from what Codeberg was saying, and from the numbers in this article, it sure looks like Anubis isn’t the right idea. I am indeed only having fun with my suggestions, not making whole libraries out of them and pronouncing them to be solutions. I personally haven’t seen evidence that Anubis is so clearly working? As the author points out, it seems like it’s only working right now because of how new it is, but if scrapers want to go through it, they easily can, which puts us in a sort of virus/antibiotic eternal war of attrition. And of course that is the case with many things in computing as well. So I guess my open wondering is just about whether there’s ever any way to develop a countermeasure that the scrapers won’t find “worth it” to force through?

        Edit for tone clarity: I don’t want to be antagonistic, rude, or hurtful in any way. Just trying to have a discussion and understand this situation. Perhaps I was arrogant; if so, I apologize. It was also not my intent, fwiw. Also, thanks for helping me understand why I was getting downvoted. I intended my post to just be constructive spitballing about what I see as the eventual inevitable weakness in Anubis. I think it’s a great project and it’s great that people are getting use out of it even temporarily, and of course the devs deserve lots of respect for making the thing. But as much as I wish I could like it and believe it will solve the problem, I still don’t think it will.

    • Jade@programming.dev · 2 days ago · +20

      That kind of captcha is trivial to bypass via frequency analysis. Text that looks like language, as opposed to random noise, is very statistically recognisable.
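
      A sketch of that frequency analysis, using a simple letter-offset scramble as the stand-in: score every possible offset against rough English letter frequencies and the correct one usually stands out immediately (the frequency table is approximate).

      ```python
      import string
      from collections import Counter

      ALPHABET = string.ascii_lowercase

      def shift(text: str, k: int) -> str:
          table = str.maketrans(ALPHABET, ALPHABET[k:] + ALPHABET[:k])
          return text.lower().translate(table)

      # Approximate English letter frequencies (percent), enough for a demo
      ENGLISH = {"e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
                 "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "u": 2.8}

      def englishness(text: str) -> float:
          counts = Counter(c for c in text if c.isalpha())
          total = sum(counts.values()) or 1
          return sum(ENGLISH.get(c, 0.0) * n for c, n in counts.items()) / total

      def crack(scrambled: str) -> int:
          """Try all 26 offsets, keep the one that reads most like English."""
          return max(range(26), key=lambda k: englishness(shift(scrambled, k)))

      garbled = shift("text that looks like language is statistically recognisable", 11)
      best = crack(garbled)
      print(best, shift(garbled, best))
      ```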

      • Possibly linux@lemmy.zip · 2 days ago · +5

        Not to mention it relies on security through obscurity.

        It wouldn’t be that hard to figure out and bypass

    • drkt@scribe.disroot.org · 3 days ago · +20

      That type of captcha already exists. I don’t know about their specific implementation, but 4chan has it, and it is trivially bypassed by userscripts.

    • Guillaume Rossolini@infosec.exchange · 3 days ago · +4

      @mfed1122 @tofu any client-side tech to keep out (some of the) bots is bound, as its popularity grows, to either be circumvented by the bots’ developers, or the model behind the bot will have picked up enough to solve it

      I don’t see how any of these are going to do better than a short-term patch

      • mfed1122@discuss.tchncs.de · 1 day ago · +1

        Yeah, you’re absolutely right and I agree. So then do we have to resign the situation to being an eternal back-and-forth of just developing random new challenges every time the scrapers adapt to them? Like antibiotics for viruses? Maybe that is the way it is. And honestly that’s what I suspect. But Anubis feels so clever and so close to something that would work. The concept of making it about a cost that adds up, so that it intrinsically only affects massive processes significantly, is really smart, since it’s not about coming up with a challenge a computer can’t complete, but just a challenge that makes it economically not worth it to complete. But it’s disappointing to see that, at least with the current wait times, it doesn’t seem like it will cost enough to dissuade scrapers. And worse, the cost is so low that it seems like making the cost significant to the scrapers will require really insufferable wait times for users.

        • Guillaume Rossolini@infosec.exchange · 17 hours ago · +1

          @mfed1122 yeah that is my worry, what’s an acceptable wait time for users? A tenth of a second is usually not noticeable to a human, but is it useful in this context? What about half a second, etc

          I don’t know that I want a web where everything is artificially slowed by a full second for each document

      • rtxn@lemmy.world · 2 days ago · +9

        That’s the great thing about Anubis: it’s not client-side. Not entirely anyways. Similar to public key encryption schemes, it exploits the computational complexity of certain functions to solve the challenge. It can’t just say “solved, let me through” because the client has to calculate a number, based on the parameters of the challenge, that fits certain mathematical criteria, and then present it to the server. That’s the “proof of work” component.

        A challenge could be something like “find the two prime factors of the semiprime 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139”. This number is known as RSA-100; it was first factorized in 1991, which took several days of CPU time, but checking the result is trivial since it’s just integer multiplication. A similar semiprime of 260 decimal digits still hasn’t been factorized to this day. You can’t get around mathematics, no matter how advanced your AI model is.
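
        The asymmetry is easy to see in code: checking a claimed factorization is one multiplication, while finding the factors is the hard part. (This only illustrates the cheap server-side check in the factoring analogy; Anubis itself uses hash-based puzzles rather than factoring.)

        ```python
        RSA_100 = int(
            "15226050279225333605356183781326374297180681149613"
            "80688657908494580122963258952897654000350692006139"
        )

        def verify_factors(p: int, q: int) -> bool:
            """Trivial for the server, no matter how much work finding p and q took."""
            return p > 1 and q > 1 and p * q == RSA_100

        # A client either did the work and has real factors, or it gets rejected.
        print(verify_factors(3, 5))  # False: bluffing doesn't get you in
        ```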

        • Guillaume Rossolini@infosec.exchange · 2 days ago · +1/-9

          @rtxn I don’t understand how that isn’t client-side?

          Anything that is client-side can be, if not spoofed, then at least delegated to a sub-process, and my argument stands

          • Passerby6497@lemmy.world · 2 days ago · +9

            Please, explain to us how you expect to spoof a math problem whose answer you have to provide to the server before proceeding.

            You can math all you want on the client, but the server isn’t going to give you shit until you provide the right answer.

              • Passerby6497@lemmy.world · 2 days ago · +6

                You’re given the challenge to solve by the server, yes. But just because the challenge is provided to you, that doesn’t mean you can fake your way through it.

                You still have to calculate the answer before you can get any farther. You can’t bullshit/spoof your way through the math problem to bypass it, because your correct answer is required to proceed.

                There is no way around this, is there?

                Unless the server gives you a well-known problem that you already have the answer to or that is easily calculated, or you find a vulnerability in something like Anubis that makes it accept a wrong answer, not really. You’re stuck at the interstitial page with a math prompt until you solve it.

                Unless I’m misunderstanding your position, I’m not sure what the disconnect is. The original question was about spoofing the challenge client side, but you can’t really spoof the answer to a complicated math problem unless there’s an issue with the server side validation.

          • rtxn@lemmy.world · 2 days ago · +6

            It’s not client-side because validation happens on the server side. The content won’t be displayed until and unless the server receives a valid response, and the challenge is formulated in such a way that calculating a valid answer will always take a long time. It can’t be spoofed because the server will know that the answer is bullshit. In my example, the server will know that the prime factors returned by the client are wrong because their product won’t be equal to the original semiprime. Delegating to a sub-process won’t work either, because what’s the parent process supposed to do? Move on to another piece of content that is also protected by Anubis?

            The point is to waste the client’s time and thus reduce the number of requests the server has to handle, not to prevent scraping altogether.
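
            To make “validation happens on the server side” concrete, here is a rough sketch of how such a gate can work statelessly: sign the challenge parameters, then verify both the signature and the submitted nonce. This is the general pattern, not Anubis’s actual wire format; the names and difficulty value are made up.

            ```python
            import hashlib
            import hmac
            import os
            import time

            SECRET = os.urandom(32)   # server-side key, never sent to the client
            DIFFICULTY = 5            # leading zero hex digits (example value)

            def issue_challenge(client_ip: str) -> dict:
                """Handed to the client along with the interstitial page."""
                payload = f"{client_ip}:{int(time.time())}:{DIFFICULTY}"
                sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
                return {"payload": payload, "sig": sig}

            def verify_response(payload: str, sig: str, nonce: int) -> bool:
                """Cheap for the server: one HMAC plus one hash. The client's work
                cannot be faked, only done or not done."""
                expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
                if not hmac.compare_digest(sig, expected):
                    return False  # challenge was not issued by this server
                digest = hashlib.sha256(f"{payload}{nonce}".encode()).hexdigest()
                return digest.startswith("0" * DIFFICULTY)
            ```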

            • Guillaume Rossolini@infosec.exchange · 2 days ago · +1/-3

              @rtxn validation of what?

              This is a typical network thing: client asks for resource, server says here’s a challenge, client responds or doesn’t, has the correct response or not, but has the challenge regardless

              • rtxn@lemmy.world · 2 days ago · +4

                THEN (and this is the part you don’t seem to understand) the client process has to either waste time solving the challenge (which is, by the way, orders of magnitude lighter on the server than serving the actual meaningful content) or cancel the request. If a new request is sent during that time, it will still have to waste time solving the challenge. The scraper will get through eventually, but the challenge delays the response and reduces the load on the server, because while the scrapers are busy computing, it doesn’t have to serve meaningful content to them.

                • Guillaume Rossolini@infosec.exchange · 2 days ago · +1/-6

                  @rtxn all right, that’s all you had to say initially, rather than trying to convince me that the network client was out of the loop: it isn’t, that’s the whole point of Anubis

  • ryannathans@aussie.zone · 3 days ago · +10/-21

    Yeah, it has seemed like a bit of a waste of time; once that difficulty gets scaled up and the expiration down, it’s gonna get annoying to use the web on phones

    • non_burglar@lemmy.world · 2 days ago · +28/-3

      I had to get my glasses to re-read this comment.

      You know why Anubis is in place on so many sites, right? You are literally blaming the victims for the absolute bullshit AI is foisting on us all.

      • ryannathans@aussie.zone · 2 days ago · +3

        Yes, I manage Cloudflare for a massive site that at times gets hit with millions of unique bot visits per hour

        • non_burglar@lemmy.world · 2 days ago · +5

          So you know that this is the lesser of the two evils? Seems like you’re viewing it from the client’s perspective only.

          No one wants to burden clients with Anubis, and Anubis shouldn’t exist. We are all (server operators and users) stuck with this solution for now because there is nothing else at the moment that keeps these scrapers at bay.

          Even the author of Anubis doesn’t like the way it works. We all know it’s just more wasted computing for no reason except that big tech doesn’t care about anyone.

          • ryannathans@aussie.zone · 2 days ago · +4

            My point is, and the author’s point is, it’s not computation that’s keeping the bots away right now. It’s the obscurity and challenge itself getting in the way.

      • billwashere@lemmy.world · 2 days ago · +4/-8

        I don’t think so. I think he’s calling the “solution” a stopgap at best and painful for end users at worst. Yes, the AI crawlers have caused the issue, but I’m not sure this is a great final solution.

        As the article discussed, this is essentially an “expensive” math problem meant to deter AI crawlers, but in the end it ain’t really that expensive. It’s more like they put two door handles on a door hoping the bots are too lazy to turn both of them, while also severely slowing down all one-handed people. I’m not sure it will ever be feasible to essentially figure out how to have one bot determine if the other end is also a bot without human interaction.

        • ryannathans@aussie.zone · 2 days ago · +2

          It works because it’s a bit of obscurity, not because it’s expensive. Once it’s a big enough problem, the scrapers will adapt, and then the only option is to make it more obscure/different or crank up the difficulty, which will slow down genuine users much more.