TLDR

  • Anubis inconveniences legitimate users more than scrapers
  • It DOES NOT BLOCK SCRAPERS in the default configuration
  • Its challenges are TRIVIAL to solve by dedicated scrapers
  • It does NOT affect scraper throughput (remember: latency != throughput)
  • It does NOT implement rate-limits

What is Anubis

Anubis is a self-hosted alternative to existing server protection systems such as Cloudflare Turnstile.
Anubis is notable because it advertises itself as being specifically targeted at AI companies and scrapers:

This program is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.

At the time of writing, Anubis is in use by a number of notable projects.

This seems like a pretty cool project, so what’s wrong with it?

The challenge doesn’t work

  • By nature, Anubis’s challenge will only actually stop headless no-JS scrapers
  • Due to how prevalent client-side rendering is on the web, all modern scrapers implement a JS runtime, and many cannot be detected as “headless” (not that Anubis even tries)
  • Dedicated scrapers are far more aggressive and can directly bypass it

It doesn’t challenge scrapers

  • BY DESIGN Anubis by default ONLY affects a client if it has Mozilla in its user-agent
  • This means that by default, any scraper using Node’s fetch, Python’s requests, wget, curl, etc. WILL NOT BE AFFECTED
  • In fact, it is actually FASTER to wget a site using Anubis than it is to wait for the challenge to complete
  • In practice, this means Anubis will deploy a challenge to legitimate users more than scrapers
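
The default check can be sketched in a few lines (a simplification for illustration; the real policy is a configurable regex over the User-Agent header, but the effect is the same):

```python
# Simplified sketch of Anubis's default bot policy: only clients whose
# User-Agent contains "Mozilla" are served the proof-of-work challenge.
# (The real check is a configurable regex; this captures its default effect.)
def gets_challenged(user_agent: str) -> bool:
    return "Mozilla" in user_agent

# Real browsers all claim to be "Mozilla", so they get challenged...
assert gets_challenged("Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0")
# ...while bare HTTP clients pass straight through by default.
assert not gets_challenged("curl/8.5.0")
assert not gets_challenged("python-requests/2.32.0")
assert not gets_challenged("Wget/1.21.4")
```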

It doesn’t slow down scrapers

  • In the best-case scenario, Anubis does NOT outright block scrapers
  • It issues a challenge ONCE per session
  • In practice this means that a standard scraper (if it’s even challenged) would need to wait around half a second to connect to a site, after that it can continue hammering the server with the same throughput as it would otherwise
  • INCREASING LATENCY DOES NOT DECREASE THROUGHPUT
  • Anubis’s “proof of work” is trivial for dedicated scrapers to solve
  • It only takes a long time due to the inefficient JavaScript implementation; a single GPU could complete a proof of work for every Anubis deployment in under a second: https://lock.cmpxchg8b.com/anubis.html#numbers
  • Good thing none of these “AI companies” have GPUs!
  • Anubis in-practice only significantly slows down legitimate users: https://github.com/TecharoHQ/anubis/discussions/985
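
To make the “trivial to solve” point concrete, here is a minimal native solver for a SHA-256 proof of work of the kind Anubis issues (the message format and difficulty here are simplifying assumptions for illustration, not Anubis’s exact wire format). Even in interpreted Python this typically finishes in a fraction of a second; a GPU implementation is faster by orders of magnitude:

```python
import hashlib
import time

def solve_pow(challenge: str, difficulty: int = 4) -> int:
    # Brute-force a nonce such that sha256(challenge + nonce) starts with
    # `difficulty` zero hex digits. This mirrors the shape of Anubis's
    # proof of work; the exact message format is an assumption here.
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

start = time.perf_counter()
nonce = solve_pow("example-challenge")
elapsed = time.perf_counter() - start
print(f"nonce={nonce}, solved in {elapsed:.3f}s")
```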

The project’s politics are weird

  • Anubis claims to be against AI companies
  • Anubis contains AI-generated code
  • Anubis used (and tried to hide?) an AI-generated mascot (which has since been redrawn by a human)

Anubis makes it EASIER to scrape a website

  • Previously, websites using Cloudflare Turnstile were harder to scrape due to Cloudflare’s advanced heuristics and behaviour analysis technology
  • Anubis has no such heuristics despite claiming to (regex on headers does NOT count) - seriously, check the codebase
  • This means in order to scrape the majority of websites using Anubis, all you need to do is the following:
```python
import requests

# Anubis's default policy only challenges user agents containing "Mozilla",
# so a non-browser user agent is never challenged at all
response = requests.get("https://lore.kernel.org/", headers={"user-agent": "curl"})
print(response.text)
```
  • And for websites which are configured to indiscriminately challenge clients (making the “heuristics” aspect USELESS):
```python
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("https://anubis.techaro.lol/")

# Wait for the proof-of-work challenge to finish and the real page to load
while 'id="anubis_challenge"' in driver.page_source:
    time.sleep(0.25)

print(driver.page_source)

# The Anubis token cookie is now set, so you can start your rapid scraping
# here, since Anubis doesn't implement a rate limit. You can even extract
# the cookie from the browser and feed it to a standard run-of-the-mill,
# massively parallel scraper.
```
  • Let that sink in: Anubis’s default configuration is so bad that the official site makes it challenge clients regardless of “heuristics”
  • The challenge is so bad that a headless browser with NO MODIFICATIONS, the very thing Anubis is trying to block, can pass it within seconds
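
The cookie-extraction step mentioned in the comment above can be sketched as follows. The cookie name and value here are placeholders, not Anubis’s actual cookie; in practice you would export whatever driver.get_cookies() returns from the Selenium session:

```python
import urllib.request

# Hypothetical: cookies exported from the Selenium session above via
# driver.get_cookies(). "anubis-auth" is a placeholder name, not the
# cookie Anubis actually sets.
exported = [{"name": "anubis-auth", "value": "solved-token"}]

# Build a Cookie header and attach it to a plain urllib request: every
# subsequent request reuses the already-solved challenge, no browser needed.
cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in exported)
request = urllib.request.Request(
    "https://anubis.techaro.lol/",
    headers={"Cookie": cookie_header, "User-Agent": "Mozilla/5.0"},
)
# urllib.request.urlopen(request) can now be issued from many parallel workers.
print(cookie_header)
```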

A brief comparison

|                          | Anubis                         | Cloudflare                     | Nginx + ngx_http_limit_req_module |
|--------------------------|--------------------------------|--------------------------------|-----------------------------------|
| Needs JS to pass         | Yes (can be configured not to) | Yes                            | No                                |
| Blocks rapid requests    | No                             | Yes                            | Yes                               |
| Blocks large requests    | No                             | Yes                            | Yes                               |
| Blocks abnormal requests | No                             | Yes                            | No                                |
| Blocks curl & wget       | No (can be configured to)      | Yes (can be configured not to) | No (can be configured to)         |
| Has a furry mascot       | Yes                            | No                             | No                                |
| Can be trivially bypassed| Yes                            | No                             | No                                |
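
For reference, the ngx_http_limit_req_module behaviour in the comparison corresponds to a configuration fragment along these lines (zone name, rate, and burst values are illustrative):

```nginx
# Limit each client IP to 10 requests/second, allowing short bursts of 20.
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    listen 80;

    location / {
        limit_req zone=perip burst=20 nodelay;
        proxy_pass http://127.0.0.1:8080;
    }
}
```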

Finally, it is worth noting that Cloudflare will block requests before they even reach your network and has successfully shrugged off Tbps-scale DDoSes, whereas Anubis by nature still needs the request to hit your server, so your overall bandwidth is impacted by it.

Let this be a lesson to the FOSS community as a whole: a funny furry mascot DOES NOT equate to good software.