this post was submitted on 21 Mar 2025
283 points (99.6% liked)

Selfhosted


I just started using this myself, seems pretty great so far!

Clearly doesn't stop all AI crawlers, but a significantly large chunk of them.

[–] [email protected] 25 points 1 week ago (1 children)

Why SHA-256? Literally every processor has a crypto accelerator and will easily pass. And datacenter servers have beefy CPUs. This is only effective against no-JS scrapers.

[–] [email protected] 21 points 1 week ago* (last edited 1 week ago) (1 children)

It requires a bunch of browser features that non-user browsers don't have, and the proof-of-work part is like the least relevant piece in this that only gets invoked once a week or so to generate a unique cookie.

I sometimes have the feeling that as soon as some crypto-currency related features are mentioned people shut off part of their brain. Either because they hate crypto-currencies or because crypto-currency scammers have trained them to only look at some technical implementation details and fail to see the larger picture that they are being scammed.
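For anyone wondering what a SHA-256 proof-of-work check looks like in general, here is a minimal sketch. It is not taken from the project's code: the function names, the challenge string and the "leading zero hex digits" difficulty rule are all assumptions made for illustration. The point is only that solving takes many hash attempts while verifying takes one.

```python
import hashlib
from itertools import count


def digest_for(challenge: str, nonce: int) -> str:
    """Hex SHA-256 of the challenge string followed by the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side (hypothetical): try nonces 0, 1, 2, ... until the
    digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        if digest_for(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): a single hash confirms the work."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    # "example-challenge" is a made-up value, not a real server challenge.
    nonce = solve("example-challenge", 3)
    print(nonce, verify("example-challenge", nonce, 3))
```

Each extra zero digit multiplies the expected number of attempts by 16, which is the knob such schemes usually turn. Hardware-accelerated SHA-256 makes each attempt cheaper but doesn't change that asymmetry.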

[–] [email protected] 2 points 1 week ago (1 children)

So if you try to access a website using this technology via terminal, what happens? The connection fails?

[–] [email protected] 4 points 1 week ago (1 children)

If your browser doesn't have a Mozilla user agent (i.e. like Chrome or Firefox) it will pass directly. Most AI crawlers use these user agents to pretend to be human users.
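A rough sketch of the rule described above, assuming it boils down to "challenge anything whose User-Agent contains Mozilla, let everything else through". The function name and the exact matching logic are hypothetical; the real project may decide differently.

```python
def needs_challenge(user_agent: str) -> bool:
    """Hypothetical rule: only clients that claim to be a regular
    browser (User-Agent contains "Mozilla") get the challenge page."""
    return "Mozilla" in user_agent


if __name__ == "__main__":
    examples = [
        # Firefox and Chrome both send a User-Agent starting with "Mozilla/5.0".
        "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
        # curl identifies itself as curl/<version> unless told otherwise.
        "curl/8.5.0",
    ]
    for ua in examples:
        print(needs_challenge(ua), ua)
```

Under that assumption, a plain curl request from a terminal goes straight through, while a crawler pretending to be Chrome gets the challenge.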

[–] [email protected] 1 points 6 days ago (1 children)

What I'm thinking about is more that in Linux, it's common to access URLs directly from the terminal for various purposes, instead of using a browser.

[–] [email protected] 1 points 6 days ago

If you're talking about something like curl, that also uses its own User agent unless asked to impersonate some other UA. If not, then maybe I can't help.

[–] [email protected] 11 points 1 week ago (1 children)

Found the FF14 fan lol
The release names are hilarious

[–] [email protected] 9 points 1 week ago (1 children)

What's the ffxiv reference here?

Anubis is from Egyptian mythology.

[–] [email protected] 7 points 1 week ago

The names of release versions are famous FFXIV Garleans

[–] [email protected] 17 points 1 week ago (1 children)

I think the maze approach is better, this seems like it hurts valid users of the web more than it would a company.

[–] [email protected] 19 points 1 week ago* (last edited 1 week ago) (1 children)

For those not aware, nepenthes is an example of the above-mentioned approach!

[–] [email protected] 7 points 1 week ago

This looks like it can actually fuck up some models, but the unnecessary CPU load it will generate means most websites won't use it unfortunately.

[–] [email protected] 17 points 1 week ago* (last edited 1 week ago) (3 children)

I did not find any instructions on the source page on how to actually deploy this. That would be a nice touch imho.

[–] [email protected] 3 points 1 week ago

The docker image page has it

[–] [email protected] 3 points 1 week ago

Or even a quick link to the relevant portion of the docs at least would be cool

[–] [email protected] 11 points 1 week ago

There are some detailed instructions on the docs site, tho I agree it'd be nice to have in the readme, too.

Sounds like the dev was not expecting this much interest for the project out of nowhere so there will def be gaps.
