Some thoughts on how useful Anubis really is. Combined with comments I’ve read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will soon be outdated and we’ll need something else.

  • Klear@quokk.au · 1 day ago · +23/-2

    If that sounds familiar, it’s because it’s similar to how bitcoin mining works. Anubis is not literally mining cryptocurrency, but it is similar in concept to other projects that do exactly that

    Did the author only now discover cryptography? It’s like a cryptocurrency, just without currency, what a concept!

    • SkaveRat@discuss.tchncs.de · 21 hours ago · +6

      It’s a perfectly valid way to explain it, though

      If you try to show up with “cryptography” as an explanation, people will think of encrypting messages, not proof of work

      “Cryptocurrency without the currency” really is the perfect single-sentence explanation

  • Dremor@lemmy.world · 2 days ago · +19/-2

    Anubis is not a challenge like a captcha. Anubis is a resource waster, forcing crawlers to solve a crypto challenge (basically like mining bitcoin) before being allowed in. That’s how it defends so well against bots: they don’t want to waste their resources on needless computing, so they just cancel the page load before it even happens and go crawl elsewhere.
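
    To make the “mining-like” part concrete, here is a minimal sketch of a Hashcash-style proof of work in Python. It illustrates the general idea only; the challenge string and difficulty are made up, and Anubis’s real implementation differs in its details.

    ```python
    import hashlib
    import itertools

    def solve(challenge: str, difficulty: int) -> int:
        """Find a nonce so sha256(challenge + nonce) starts with `difficulty`
        zero hex digits. Expensive: roughly 16**difficulty hash attempts."""
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith("0" * difficulty):
                return nonce

    def verify(challenge: str, nonce: int, difficulty: int) -> bool:
        """Cheap for the server: a single hash."""
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        return digest.startswith("0" * difficulty)

    nonce = solve("example-challenge", difficulty=4)  # the visitor burns CPU here
    print(verify("example-challenge", nonce, 4))      # the server checks it instantly
    ```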

    • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +10/-4

      No, it works because the scraper bots don’t have it implemented yet. Of course the companies would rather not spend additional compute resources, but their pockets are deep, and some have already adapted and solve the challenges.

      • Encrypt-Keeper@lemmy.world · 24 hours ago · +10

        The point was never that Anubis challenges are something scrapers can’t get past. The point is it’s expensive to do so.

        Some bots don’t use JavaScript and can’t solve the challenges, so they’re blocked, but there was never any point in time where no scrapers could solve them.

        • JuxtaposedJaguar@lemmy.ml · 24 hours ago · +1/-3

          Wait, so browsers that disable JavaScript won’t be able to access those websites? Then I hate it.

          Not everyone wants unauthenticated RCE from thousands of servers around the world.

          • Encrypt-Keeper@lemmy.world · 23 hours ago · +6

            Not everyone wants unauthenticated RCE from thousands of servers around the world.

            I’ve got really bad news for you my friend

      • Dremor@lemmy.world · 2 days ago · +13

        Whether they solve it or not doesn’t change the fact that they have to use more resources for crawling, which is the objective here. And by contrast, the website sees a lot less load than before it used Anubis. Either way, I see it as a win.

        But despite that, it has its detractors, like any solution that becomes popular.

        But let’s be honest, what are the arguments against it?
        It takes a bit longer to access for the first time? Sure, but it’s not like you have to click anything or type anything.
        It executes foreign code on your machine? Literally 90% of the web does these days. Just disable JavaScript and see how many websites are still functional. I’d be surprised if even a handful were.

        The only people who gain anything from not having Anubis are web crawlers, be they AI bots, indexing bots, or script kiddies trying to find a vulnerable target.

        • int32@lemmy.dbzer0.com · 1 day ago · +2

          I use uMatrix, which blocks JS by default, so it is a bit inconvenient to have to enable JS for some sites. Websites that didn’t need it before, which is often the reason I use them, now require JavaScript.

        • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +1

          Sure, I’m not arguing against Anubis! I just don’t think the added compute cost is sufficient to keep them out once they adjust.

          • rumba@lemmy.zip · 20 hours ago · +1

            Conceptually, you could just really twist the knobs up. A human can wait 15 seconds to read a page. But if you’re trying to scrape 100,000 pages and they each take 15 seconds… If you can make it expensive in both power and time, that’s a win.
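
            Rough back-of-the-envelope numbers for that knob-twisting, with assumed hash rates (the real figures vary a lot by device, so treat these purely as illustration):

            ```python
            # Hashcash-style PoW: expected work is about 2**difficulty_bits hashes.
            # Both rates below are assumptions for illustration, not measurements.
            browser_rate = 200_000     # hashes/s for a phone solving the challenge in JS
            native_rate = 5_000_000    # hashes/s for an optimized native solver

            difficulty_bits = 22
            expected_hashes = 2 ** difficulty_bits

            print(f"visitor on a phone: ~{expected_hashes / browser_rate:.0f} s per page")
            print(f"native solver:      ~{expected_hashes / native_rate:.1f} s per page")
            print(f"100,000 pages, native: "
                  f"~{expected_hashes / native_rate * 100_000 / 3600:.0f} CPU-hours")
            ```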

        • daniskarma@lemmy.dbzer0.com · 1 day ago · +6/-6

          I’m against it for several reasons. It runs unauthorized heavy-duty code on your machine. It’s not JS needed to make the site functional; it’s heavy computation, unprompted. If they added a simple “click to run challenge” button, it would at least be more polite and less “malware-like”.

          On some old devices the challenge lasts over 30 seconds; I can type a captcha in less time than that.

          It forces sites that people (like the article author) tend to browse directly from a terminal behind the requirement to use a full browser.

          It’s a delusion. As the article author shows, solving the PoW challenge is not that much of an added cost. The reduction in scraping would be the same with any other novel method; crawlers are just not prepared for it yet. Any prepared crawler would have no issues whatsoever. People are seeing results because of obscurity, not because it really works as advertised. And in fact I believe some sites are starting to get crawled aggressively despite Anubis, as some crawlers are already catching up with this new Anubis trend.

          Take into account that the challenge needs to be light enough that a legitimate user can enter the website within a few seconds while running the challenge in a browser engine (very inefficient). A crawler interested in your site could easily set up a solver that mines the PoW using CUDA on a GPU, which would be hundreds if not thousands of times more efficient. So the balance of difficulty (still browsable for users but costly to crawl) is not feasible.
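
          The browser-vs-native gap is easy to get a feel for on the hashing itself. Here is a tiny throughput benchmark; it makes no claim about exact ratios, only that the same search can be run outside the browser at whatever speed the scraper’s hardware allows:

          ```python
          import hashlib
          import time

          def hash_rate(seconds: float = 1.0) -> float:
              """Measure how many SHA-256 hashes this machine does per second."""
              count = 0
              start = time.perf_counter()
              while time.perf_counter() - start < seconds:
                  hashlib.sha256(f"bench-{count}".encode()).digest()
                  count += 1
              return count / (time.perf_counter() - start)

          print(f"~{hash_rate():,.0f} SHA-256 hashes/s in a plain Python loop")
          # A JS engine on a phone, a native implementation, and a GPU kernel all
          # land at very different rates -- that asymmetry is the argument here.
          ```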

          It’s not universally applicable. Imagine if the whole internet were behind PoW challenges; it would be like constant Bitcoin mining, a total waste of resources.

          The company behind Anubis seems shadier to me each day. They feed on anti-AI paranoia, they didn’t even answer the article author’s valid criticisms when he emailed them, and they use PR language clearly aimed at convincing and pleasing certain demographics in order to place their product. They are full of slogans but lack substance. I just don’t trust them.

          • Dremor@lemmy.world · 1 day ago · +6

            Fair point. I do agree with the “click to execute challenge” approach.

            For the terminal browser, that has more to do with it not respecting web standards than with Anubis not working on it.

            As for old hardware, I agree that a time delay could be a good idea if it weren’t so easy to circumvent. In that case bots would just wait in the background and resume once the timer runs out, which would vastly decrease Anubis’s effectiveness, as waiting costs them very little. There isn’t really much that can be done here.

            As for the CUDA solution, that depends on the hash algorithm used. Some of them (like the one used by Monero) are designed to be vastly less efficient on a GPU than on a CPU. Moreover, GPU servers are far more expensive to run than CPU ones, so the result would be the same: crawling would be more expensive.
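
            For example, Python’s standard library ships scrypt, a deliberately memory-hard function. A sketch of what a GPU-unfriendly variant of the same proof-of-work idea could look like (illustrative only; this is not what Anubis actually uses, and the parameters are example values):

            ```python
            import hashlib
            import itertools
            import os

            def solve_memory_hard(challenge: bytes, difficulty_bits: int) -> int:
                """Like a SHA-256 search, but every attempt runs scrypt with
                n=2**14, r=8, which needs ~16 MiB of RAM -- that is what hurts GPUs."""
                target = 2 ** (256 - difficulty_bits)
                for nonce in itertools.count():
                    digest = hashlib.scrypt(nonce.to_bytes(8, "big"), salt=challenge,
                                            n=2**14, r=8, p=1, dklen=32)
                    if int.from_bytes(digest, "big") < target:
                        return nonce

            challenge = os.urandom(16)
            print(solve_memory_hard(challenge, difficulty_bits=4))  # tiny demo difficulty
            ```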

            In any case, the best solution by far would be to make respecting robots.txt a legal requirement, but for now legislators prefer to look the other way.
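
            The technical side of respecting robots.txt is already trivial; Python’s standard library can parse it (the URL and user agent below are just examples). The hard part is that nothing currently forces scrapers to run this check:

            ```python
            from urllib import robotparser

            rp = robotparser.RobotFileParser()
            rp.set_url("https://example.org/robots.txt")  # example site
            rp.read()

            # A well-behaved crawler does this before every fetch.
            print(rp.can_fetch("MyCrawler/1.0", "https://example.org/some/page"))
            ```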

  • rtxn@lemmy.world · 2 days ago · +35/-2

    New developments: just a few hours before I post this comment, The Register posted an article about AI crawler traffic. https://www.theregister.com/2025/08/21/ai_crawler_traffic/

    Anubis’ developer was interviewed and they posted the responses on their website: https://xeiaso.net/notes/2025/el-reg-responses/

    In particular:

    Fastly’s claims that 80% of bot traffic is now AI crawlers

    In some cases for open source projects, we’ve seen upwards of 95% of traffic being AI crawlers. For one, deploying Anubis almost instantly caused server load to crater by so much that it made them think they accidentally took their site offline. One of my customers had their power bills drop by a significant fraction after deploying Anubis. It’s nuts.

    So, yeah. If we believe Xe, OOP’s article is complete hogwash.

    • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +9

      Cool article, thanks for linking! Not sure that’s a new development though; it’s just results, and we already knew it’s working. The question is, what’s going to work once the scrapers adapt?

  • rtxn@lemmy.world · 3 days ago · +204/-1

    The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.

    The purpose is to reduce the flood to a manageable level, not to block every single scraper request.

    • 0_o7@lemmy.dbzer0.com · 2 days ago · +18/-1

      The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked

      This post was originally written for ycombinator “Hacker” News, which is vehemently against people hacking things together for the greater good and, more importantly, for free.

      It’s more of a corporate PR release site and if you aren’t known by the “community”, calling out solutions they can’t profit off of brings all the tech-bros to the yard for engagement.

    • poVoq@slrpnk.net · 3 days ago · +95/-2

      And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.

      I feel like people who complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦

      • moseschrute@crust.piefed.social · 24 hours ago · +1

        Out of curiosity, what’s the issue with Cloudflare? Aside from the constant worry they may strong-arm you into their enterprise pricing if your site is too popular lol. I understand supporting open source, but why not let companies handle the expensive bits as long as they’re willing?

        I guess I can answer my own question. If the point of the Fediverse is to remove a single point of failure, then I suppose Cloudflare could become a single point to take down the network. Still, we could always pivot away from those types of services later, right?

        • Limonene@lemmy.world · 22 hours ago · +1/-2

          Cloudflare has IP banned me before for no reason (no proxy, no VPN, residential ISP with no bot traffic). They’ve switched their captcha system a few times, and some years it’s easy, some years it’s impossible.

      • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +19/-1

        Yeah, I’m just wondering what’s going to follow. I just hope everything isn’t going to need to go behind an authwall.

      • mobotsar@sh.itjust.works · 2 days ago · +5

        Is there a reason other than avoiding infrastructure centralization not to put a web server behind cloudflare?

        • poVoq@slrpnk.net · 2 days ago · +23/-1

          Yes, because Cloudflare routinely blocks entire IP ranges and puts people into endless captcha loops. And it snoops on all traffic and collects a lot of metadata about all your site visitors. And if you let them terminate TLS, they will even analyse the passwords that people use to log into the services you run. It’s basically a huge surveillance dragnet and probably a front for the NSA.

        • Björn Tantau@swg-empire.de · 2 days ago · +11/-1

          Cloudflare would need your HTTPS keys, so they could read all the content you worked so hard to encrypt. If I wanted to do bad shit I would apply at Cloudflare.

          • mobotsar@sh.itjust.works · 2 days ago · +7

            Maybe I’m misunderstanding what “behind cloudflare” means in this context, but I have a couple of my sites proxied through cloudflare, and they definitely don’t have my keys.

            I wouldn’t think using a cloudflare captcha would require such a thing either.

            • StarkZarn@infosec.pub · 2 days ago · +14

              That’s because they just terminate TLS at their end. Your DNS record is “poisoned” by the orange cloud and their infrastructure answers for you. They happen to have a trusted root CA so they just present one of their own certificates with a SAN that matches your domain and your browser trusts it. Bingo, TLS termination at CF servers. They have it in cleartext then and just re-encrypt it with your origin server if you enforce TLS, but at that point it’s meaningless.

      • daniskarma@lemmy.dbzer0.com · 2 days ago · +1/-5

        I still think captchas are a better solution.

        To get past them, scrapers have to run AI inference, which also comes with compute costs. But for legitimate users you don’t run unauthorized intensive tasks on their hardware.

        • poVoq@slrpnk.net · 2 days ago · +9

          They are much worse for accessibility, take longer to solve, and are more disruptive for the majority of users.

          • daniskarma@lemmy.dbzer0.com · 2 days ago · +2/-5

            Anubis is worse for privacy, as you have to have JavaScript enabled. And it’s worse for the environment, as the PoW cryptographic challenges are just a waste.

            Also, reCaptcha-style captchas are not really that disruptive most of the time.

            As I said, the polite thing would be to give users options. Anubis PoW running automatically just for entering a website is one of the rudest pieces of software I’ve seen lately. They should be more polite and give the user a choice: maybe the user could choose between solving a captcha or running the Anubis PoW, or even just have Anubis run only after a button the user clicks.

            I don’t think it’s good practice to run that kind of software just for entering a website. If that tendency were to grow, browsers would need to adapt and outright block that behavior, e.g. only allowing access to some client resources after a user action.

            • poVoq@slrpnk.net · 2 days ago · +11

              Are you seriously complaining about an (entirely false) negative privacy aspect of Anubis and then suggesting reCaptcha from Google is better?

              Look, no one thinks Anubis is great, but often it is that or the website becoming entirely inaccessible because it is DDOSed to death by the AI scrapers.

              • daniskarma@lemmy.dbzer0.com · 2 days ago · +1/-3

                First, I said reCaptcha types, meaning captchas in the style of reCaptcha, which could be implemented outside a Google environment. Second, I never said those were better for privacy. I just said Anubis is bad for privacy. Traditional captchas that work without JavaScript would be the privacy-friendly way.

                Third, it’s not a false proposition. Disabling JavaScript can protect your privacy a great deal. A lot of tracking is done through JavaScript.

                Last, that’s just the Anubis PR slogan, not the truth. As I said, DDoS mitigation could be implemented in other ways, more polite and/or environmentally friendly.

                Are you astroturfing for Anubis? Because I really cannot understand why something as simple as a landing page with a “run PoW challenge” button would be that bad.

                • poVoq@slrpnk.net · 2 days ago · +4

                  Anubis is not bad for privacy, but rather the opposite. Server admins explicitly chose it over commonly available alternatives to preserve the privacy of their visitors.

                  If you don’t like random Javascript execution, just install an allow-list extension in your browser 🤷

                  And no, it is not a PR slogan, it is the lived experience of thousands of server admins (me included) who have been fighting with this for months now and are very grateful that Anubis has provided some (likely only temporary) relief from that.

                  And I don’t get what the point of an extra button would be when the result is exactly the same 🤷

          • interdimensionalmeme@lemmy.ml · 2 days ago · +6/-8

            What CPU made after 2004 do you have that doesn’t have automatic thermal control?
            I don’t think there is any, unless you somehow managed to disable it?
            Even a Raspberry Pi without a heatsink won’t overheat to the point of shutdown.

            • poVoq@slrpnk.net · 2 days ago · +10/-1

              You are right, it is actually worse: it usually just overloads the CPU so badly that it starts to throttle, and then I can’t even access the server via SSH anymore. But sometimes it also crashes the server so that it reboots, and yes, that can happen on modern CPUs as well.

              • interdimensionalmeme@lemmy.ml · 2 days ago · +4/-8

                You need to set your HTTP-serving process to a priority below the administrative processes (in the place where you start it; assuming a Linux server, that would be your init script or systemd service unit).
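
                For the systemd case, that could be a drop-in like the one below (nginx.service is just an example name; the directives are standard systemd options, the values are illustrative):

                ```ini
                # /etc/systemd/system/nginx.service.d/override.conf  (example path)
                [Service]
                # Lower scheduling priority so sshd, journald, etc. stay responsive
                Nice=10
                # cgroup CPU weight; the default is 100, lower means a smaller share under contention
                CPUWeight=50
                ```

                Apply it with systemctl daemon-reload followed by a restart of the service.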

                An actual crash causing a reboot? Do you maybe have faulty RAM? That is really never supposed to happen from anything happening in userland. That’s not AI; your stuff might be straight up broken.

                The only thing that isn’t broken that could reboot a server is a watchdog timer.

                Your server shouldn’t crash, reboot, or become unreachable from the admin interface even at 100% load, and it shouldn’t overheat either. Temperatures should never exceed 80°C no matter what you do; that is supposed to be impossible with thermal management, which all processors have had for decades.

                • poVoq@slrpnk.net · 2 days ago · +3/-1

                  Great that this is all theoretical 🤷 My server hardware might not be the newest, but it is definitely not broken.

                  And besides, what good is it that you can still barely access the server through SSH when the CPU is constantly maxed out and site visitors only get timeouts when trying to access the services?

                  I don’t even get what you are trying to argue here. That the AI scraper DDOS isn’t so bad because in theory it shouldn’t crash the server? Are you even reading what you are writing yourself? 🤡

    • AnUnusualRelic@lemmy.world · 2 days ago · +23/-1

      The problem is that the purpose of Anubis was to make crawling more computationally expensive, and crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile some required cycles on top of what’s currently asked, but it’s a balancing act before it starts to really be an annoyance for the meat popsicle users.

    • loudwhisper@infosec.pub · 2 days ago · +3

      Exactly my thoughts too. Lots of theory about why it won’t work, but not looking at the fact that if people use it, maybe it does work, and when it won’t work, they will stop using it.

  • unexposedhazard@discuss.tchncs.de · 2 days ago · +70/-8

    This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity.

    Well it doesn’t fucking matter what “makes sense to you” because it is working…
    It’s being deployed by people who had their sites DDoS’d to shit by crawlers and they are very happy with the results, so what even is the point of trying to argue here?

    • daniskarma@lemmy.dbzer0.com · 2 days ago · +12

      It’s working because it’s not widely used. It’s sort of a “pirate seagull” theory: as long as only a few people use it, it works, because scrapers don’t really count on Anubis and so don’t implement systems to get past it.

      If it were to become more common, it would be really easy to implement systems that would defeat the purpose.

      As of right now sites are OK because scrapers just send HTTPS requests and expect a full response. If someone wants to bypass the Anubis protection, they would need to take into account that they will receive a cryptographic challenge and have to solve it.

      The thing is that cryptographic challenges can be heavily optimized. They are designed to run in a very inefficient environment, the browser. But if someone took the challenge and solved it in a better environment, using CUDA or something like that, it would take a fraction of the energy, defeating the purpose of “being so costly that it’s not worth scraping”.

      At this point it’s only a matter of time before we start seeing scrapers like that, especially if more and more sites start using Anubis.

  • CrackedLinuxISO@lemmy.dbzer0.com · 2 days ago · +12/-2

    There are some sites where Anubis won’t let me through. Like, I just get immediately bounced.

    So RIP dwarf fortress forums. I liked you.

      • SL3wvmnas@discuss.tchncs.de · 2 days ago · +5

        I, too, get blocked by certain sites. I think it’s a configuration thing, where it does not like my combination of uBlock/NoScript, even when I explicitly allow their scripts…

  • daniskarma@lemmy.dbzer0.com · 2 days ago · +5/-9

    Sometimes I think: imagine if a company like Google or Facebook implemented something like Anubis, and suddenly most people’s browsers started constantly solving CPU-intensive cryptographic challenges. People would be outraged by the wasted energy. But somehow a “cool small company” does it and it’s fine.

    I don’t think the Anubis approach is sustainable for everyone to use; it’s just too wasteful energy-wise.

      • daniskarma@lemmy.dbzer0.com · 2 days ago · +2/-16

        Captcha.

        It does everything Anubis does. If a scraper wants to solve it automatically, it’s compute-intensive, as they have to run AI inference, but for the user it’s just a little time-consuming.

        With captchas you don’t run aggressive software unauthorized on anyone’s computer.

        The solution already existed. But Anubis is “trendy”, and they are masters of PR within certain circles of people who always want the latest, trendiest thing.

        But the good old captcha would achieve the same result as Anubis, in a more sustainable way.

        Or at least give the user the option of running the challenge or leaving the page, and make clear to the user that their hardware is going to run an intensive task. It really feels very aggressive to have a webpage run what is basically a cryptominer on your computer without authorization. And for me, having a catgirl as a mascot does not excuse the rudeness of it.

        • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +15

          The “good old captcha” is the most annoying thing ever for people and basically universally hated. Speaking of leaving the page, what do you think will cause more people to leave: a captcha that’s often broken, or something where people don’t have to do anything but wait a little?

          • daniskarma@lemmy.dbzer0.com · 2 days ago · +1/-5

            They don’t have to do anything but let an unknown program max out their CPU without authorization.

            Imagine if Google implemented that: billions of computers constantly running PoW, what could go wrong?

            • tofu@lemmy.nocturnal.garden (OP) · 2 days ago · +1

              They don’t have to do anything but let an unknown program max out their CPU without authorization.

              But they currently can’t and that’s the point.

  • mfed1122@discuss.tchncs.de · 1 day ago · +11/-24

    Yeah, well-written stuff. I think Anubis will come and go. This beautifully demonstrates and, best of all, quantifies the negligible cost of Anubis to scrapers.

    It’s very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.

    Maybe something that scrambles the characters of the site according to some random “offset” of some sort, e.g. maybe randomly selecting a modulus size and an offset to cycle them, or even just a good ol’ cipher. And the “captcha” consists of a slider that adjusts the offset. You as the viewer know it’s solved when the text becomes something sensible, so there’s no need for the client code to store a readable key that could be used to auto-undo the scrambling. You could maybe even have some values of the slider randomly chosen to produce English text in case the scrapers got smart enough to check for legibility (not sure how to hide which slider positions would be these red herring ones though), which could maybe be enough to trick the scraper into picking up junk text sometimes.
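
    For what it’s worth, a rough sketch of that slider idea, with a plain Caesar-style letter offset standing in for the “scrambling” (purely to illustrate the concept, not a serious proposal):

    ```python
    import string

    ALPHABET = string.ascii_lowercase

    def scramble(text: str, offset: int) -> str:
        """Cycle letters by `offset`; the server would ship only the scrambled text."""
        shifted = ALPHABET[offset:] + ALPHABET[:offset]
        table = str.maketrans(ALPHABET + ALPHABET.upper(), shifted + shifted.upper())
        return text.translate(table)

    page = scramble("The quick brown fox jumps over the lazy dog", offset=11)
    print(page)                     # unreadable until the viewer finds the offset
    print(scramble(page, 26 - 11))  # moving the "slider" to 15 undoes it
    ```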

    • Possibly linux@lemmy.zip · 2 days ago · +6

      Anubis is more of an economic solution. It doesn’t stop bots, but it does make companies pay more to access content instead of having server operators foot the bill.

    • dabe@lemmy.zip · 2 days ago · +6

      I’m sure you meant to sound more analytical than anything… but this really comes off as arrogant.

      You claim that Anubis is negligent and will come and go, and then admit to only spending minutes at a time thinking of solutions yourself, which you then just sorta spout. It’s fun to think about solutions to this problem collectively, but can you honestly believe that Anubis is negligent when it’s so clearly working, and when the author has been so extremely clear about their own perception of its pitfalls and hasty development? (Go read their blog, it’s a fun time.)

      • mfed1122@discuss.tchncs.de · 1 day ago · +1

        By negligence, I meant that the cost is negligible to the companies running scrapers, not that the solution itself is negligent. I should have said “negligibility” of Anubis, sorry - that was poor clarity on my part.

        But I do think that the cost of it is indeed negligible, as the article shows. It doesn’t really matter if the author is biased or not, their analysis of the costs seems reasonable. I would need a counter-argument against that to think they were wrong. Just because they’re biased isn’t enough to discount the quantification they attempted to bring to the debate.

        Also, I don’t think there’s any hypocrisy in me saying I’ve only thought about other solutions here and there; I’m not maintaining an anti-scraping library. And there have already been indications that scrapers are just accepting the cost of Anubis on Codeberg, right? So I’m not trying to say I’m some sort of tech genius who has the right idea here, but from what Codeberg was saying, and from the numbers in this article, it sure looks like Anubis isn’t the right idea. I am indeed only having fun with my suggestions, not making whole libraries out of them and pronouncing them to be solutions. I personally haven’t seen evidence that Anubis is so clearly working? As the author points out, it seems like it’s only working right now because of how new it is, but if scrapers want to go through it, they easily can, which puts us in a sort of virus/antibiotic eternal war of attrition. And of course that is the case with many things in computing as well. So I guess my open wondering is just about whether there’s ever any way to develop a countermeasure that the scrapers won’t find “worth it” to force through?

        Edit for tone clarity: I don’t want to be antagonistic, rude, or hurtful in any way. Just trying to have a discussion and understand this situation. Perhaps I was arrogant; if so, I apologize. It was also not my intent, fwiw. Also, thanks for helping me understand why I was getting downvoted. I intended my post to just be constructive spitballing about what I see as the eventual inevitable weakness in Anubis. I think it’s a great project and it’s great that people are getting use out of it even temporarily, and of course the devs deserve lots of respect for making the thing. But as much as I wish I could like it and believe it will solve the problem, I still don’t think it will.

    • Jade@programming.dev · 2 days ago · +20

      That kind of captcha is trivial to bypass via frequency analysis. Text that looks like language, as opposed to random noise, is very statistically recognisable.
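
      A sketch of that frequency analysis, using a simple letter-offset scramble as the stand-in: score every possible offset against rough English letter frequencies and the correct one usually stands out immediately (the frequency table is approximate).

      ```python
      import string
      from collections import Counter

      ALPHABET = string.ascii_lowercase

      def shift(text: str, k: int) -> str:
          table = str.maketrans(ALPHABET, ALPHABET[k:] + ALPHABET[:k])
          return text.lower().translate(table)

      # Approximate English letter frequencies (percent), enough for a demo
      ENGLISH = {"e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
                 "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "u": 2.8}

      def englishness(text: str) -> float:
          counts = Counter(c for c in text if c.isalpha())
          total = sum(counts.values()) or 1
          return sum(ENGLISH.get(c, 0.0) * n for c, n in counts.items()) / total

      def crack(scrambled: str) -> int:
          """Try all 26 offsets, keep the one that reads most like English."""
          return max(range(26), key=lambda k: englishness(shift(scrambled, k)))

      garbled = shift("text that looks like language is statistically recognisable", 11)
      best = crack(garbled)
      print(best, shift(garbled, best))
      ```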

      • Possibly linux@lemmy.zip · 2 days ago · +5

        Not to mention it relies on security through obscurity.

        It wouldn’t be that hard to figure out and bypass

    • drkt@scribe.disroot.org · 3 days ago · +20

      That type of captcha already exists. I don’t know about their specific implementation, but 4chan has it, and it is trivially bypassed by userscripts.

    • Guillaume Rossolini@infosec.exchange · 3 days ago · +4

      @mfed1122 @tofu any client-side tech to keep out (some of the) bots is bound, as its popularity grows, to either be circumvented by the bots’ developers, or the model behind the bot will have picked up enough to solve it

      I don’t see how any of these are going to do better than a short-term patch

      • mfed1122@discuss.tchncs.de · 1 day ago · +1

        Yeah, you’re absolutely right and I agree. So then do we have to resign the situation to being an eternal back-and-forth of just developing random new challenges every time the scrapers adapt to them? Like antibiotics for viruses? Maybe that is the way it is. And honestly that’s what I suspect. But Anubis feels so clever and so close to something that would work. The concept of making it about a cost that adds up, so that it intrinsically only affects massive processes significantly, is really smart, since it’s not about coming up with a challenge a computer can’t complete, but just a challenge that makes it economically not worth it to complete. But it’s disappointing to see that, at least with the current wait times, it doesn’t seem like it will cost enough to dissuade scrapers. And worse, the cost is so low that it seems like making the cost significant to the scrapers will require really insufferable wait times for users.

        • Guillaume Rossolini@infosec.exchange · 17 hours ago · +1

          @mfed1122 yeah that is my worry, what’s an acceptable wait time for users? A tenth of a second is usually not noticeable to a human, but is it useful in this context? What about half a second, etc

          I don’t know that I want a web where everything is artificially slowed by a full second for each document

      • rtxn@lemmy.world · 2 days ago · +9

        That’s the great thing about Anubis: it’s not client-side. Not entirely anyways. Similar to public key encryption schemes, it exploits the computational complexity of certain functions to solve the challenge. It can’t just say “solved, let me through” because the client has to calculate a number, based on the parameters of the challenge, that fits certain mathematical criteria, and then present it to the server. That’s the “proof of work” component.

        A challenge could be something like “find the two prime factors of the semiprime 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139”. This number is known as RSA-100; it was first factorized in 1991, which took several days of CPU time, but checking the result is trivial since it’s just integer multiplication. A similar semiprime of 260 decimal digits still hasn’t been factorized to this day. You can’t get around mathematics, no matter how advanced your AI model is.
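
        The asymmetry is easy to see in code: checking a claimed factorization is one multiplication, while finding the factors is the hard part. (This only illustrates the cheap server-side check in the factoring analogy; Anubis itself uses hash-based puzzles rather than factoring.)

        ```python
        RSA_100 = int(
            "15226050279225333605356183781326374297180681149613"
            "80688657908494580122963258952897654000350692006139"
        )

        def verify_factors(p: int, q: int) -> bool:
            """Trivial for the server, no matter how much work finding p and q took."""
            return p > 1 and q > 1 and p * q == RSA_100

        # A client either did the work and has real factors, or it gets rejected.
        print(verify_factors(3, 5))  # False: bluffing doesn't get you in
        ```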

        • Guillaume Rossolini@infosec.exchange · 2 days ago · +1/-9

          @rtxn I don’t understand how that isn’t client-side?

          Anything that is client-side can be, if not spoofed, then at least delegated to a sub-process, and my argument stands

          • Passerby6497@lemmy.world · 2 days ago · +9

            Please, explain to us how you expect to spoof a math problem whose answer you have to provide to the server before proceeding.

            You can math all you want on the client, but the server isn’t going to give you shit until you provide the right answer.

              • Passerby6497@lemmy.world · 2 days ago · +6

                You’re given the challenge to solve by the server, yes. But just because the challenge is provided to you, that doesn’t mean you can fake your way through it.

                You still have to calculate the answer before you can get any farther. You can’t bullshit/spoof your way through the math problem to bypass it, because your correct answer is required to proceed.

                There is no way around this, is there?

                Unless the server gives you a well-known problem that you already have the answer to or that is easily calculated, or you find a vulnerability in something like Anubis that makes it accept a wrong answer, not really. You’re stuck at the interstitial page with a math prompt until you solve it.

                Unless I’m misunderstanding your position, I’m not sure what the disconnect is. The original question was about spoofing the challenge client side, but you can’t really spoof the answer to a complicated math problem unless there’s an issue with the server side validation.

          • rtxn@lemmy.world · 2 days ago · +6

            It’s not client-side because validation happens on the server side. The content won’t be displayed until and unless the server receives a valid response, and the challenge is formulated in such a way that calculating a valid answer will always take a long time. It can’t be spoofed because the server will know that the answer is bullshit. In my example, the server will know that the prime factors returned by the client are wrong because their product won’t be equal to the original semiprime. Delegating to a sub-process won’t work either, because what’s the parent process supposed to do? Move on to another piece of content that is also protected by Anubis?

            The point is to waste the client’s time and thus reduce the number of requests the server has to handle, not to prevent scraping altogether.
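
            To make “validation happens on the server side” concrete, here is a rough sketch of how such a gate can work statelessly: sign the challenge parameters, then verify both the signature and the submitted nonce. This is the general pattern, not Anubis’s actual wire format; the names and difficulty value are made up.

            ```python
            import hashlib
            import hmac
            import os
            import time

            SECRET = os.urandom(32)   # server-side key, never sent to the client
            DIFFICULTY = 5            # leading zero hex digits (example value)

            def issue_challenge(client_ip: str) -> dict:
                """Handed to the client along with the interstitial page."""
                payload = f"{client_ip}:{int(time.time())}:{DIFFICULTY}"
                sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
                return {"payload": payload, "sig": sig}

            def verify_response(payload: str, sig: str, nonce: int) -> bool:
                """Cheap for the server: one HMAC plus one hash. The client's work
                cannot be faked, only done or not done."""
                expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
                if not hmac.compare_digest(sig, expected):
                    return False  # challenge was not issued by this server
                digest = hashlib.sha256(f"{payload}{nonce}".encode()).hexdigest()
                return digest.startswith("0" * DIFFICULTY)
            ```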

            • Guillaume Rossolini@infosec.exchange · 2 days ago · +1/-3

              @rtxn validation of what?

              This is a typical network thing: client asks for resource, server says here’s a challenge, client responds or doesn’t, has the correct response or not, but has the challenge regardless

              • rtxn@lemmy.world · 2 days ago · +4

                THEN (and this is the part you don’t seem to understand) the client process has to either waste time solving the challenge (which is, by the way, orders of magnitude lighter on the server than serving the actual meaningful content) or cancel the request. If a new request is sent during that time, it will still have to waste time solving the challenge. The scraper will get through eventually, but the challenge delays the response and reduces the load on the server, because while the scrapers are busy computing, it doesn’t have to serve meaningful content to them.

                • Guillaume Rossolini@infosec.exchange · 2 days ago · +1/-6

                  @rtxn all right, that’s all you had to say initially, rather than trying to convince me that the network client was out of the loop: it isn’t, that’s the whole point of Anubis

  • ryannathans@aussie.zone · 3 days ago · +10/-21

    Yeah, it has seemed like a bit of a waste of time; once that difficulty gets scaled up and the expiration down, it’s gonna get annoying to use the web on phones

    • non_burglar@lemmy.world · 2 days ago · +28/-3

      I had to get my glasses to re-read this comment.

      You know why Anubis is in place on so many sites, right? You are literally blaming the victims for the absolute bullshit AI is foisting on us all.

      • ryannathans@aussie.zone · 2 days ago · +3

        Yes, I manage Cloudflare for a massive site that at times gets hit with millions of unique bot visits per hour

        • non_burglar@lemmy.world · 2 days ago · +5

          So you know that this is the lesser of the two evils? Seems like you’re viewing it from the client’s perspective only.

          No one wants to burden clients with Anubis, and Anubis shouldn’t exist. We are all (server operators and users) stuck with this solution for now because there is nothing else at the moment that keeps these scrapers at bay.

          Even the author of Anubis doesn’t like the way it works. We all know it’s just more wasted computing for no reason except that big tech doesn’t care about anyone.

          • ryannathans@aussie.zone · 2 days ago · +4

            My point is, and the author’s point is, it’s not computation that’s keeping the bots away right now. It’s the obscurity and challenge itself getting in the way.

      • billwashere@lemmy.world · 2 days ago · +4/-8

        I don’t think so. I think he’s calling the “solution” a stopgap at best and painful for end users at worst. Yes, the AI crawlers have caused the issue, but I’m not sure this is a great final solution.

        As the article discussed, this is essentially an “expensive” math problem meant to deter AI crawlers, but in the end it ain’t really that expensive. It’s more like they put two door handles on a door hoping the bots are too lazy to turn both of them, while also severely slowing down all one-handed people. I’m not sure it will ever be feasible to essentially figure out how to have one bot determine if the other end is also a bot without human interaction.

        • ryannathans@aussie.zone · 2 days ago · +2

          It works because it’s a bit of obscurity, not because it’s expensive. Once it’s a big enough problem, the scrapers will adapt, and then the only option is to make it more obscure/different or crank up the difficulty, which will slow down genuine users much more.