Have you ever heard of sparse files, and how Linux and Windows deal with zips of them? You'll love this.
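For anyone who hasn't run into sparse files, here is a minimal Python sketch of the idea, not a definitive recipe. It extends an empty file to a large logical size without writing any data and then compares that size with what the filesystem reports as allocated. The helper names `make_sparse_file` and `sizes` are made up for illustration. Whether the hole is actually left unallocated depends on the OS and filesystem, and `st_blocks` is not available on Windows, so the allocated figure may come back as `None`.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file whose logical size is `size` bytes without writing data."""
    with open(path, "wb") as f:
        # Extending via truncate() usually leaves a hole instead of
        # allocating blocks, but that is up to the filesystem.
        f.truncate(size)


def sizes(path):
    """Return (logical_bytes, allocated_bytes), allocated is None if unknown."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # POSIX only, counted in 512-byte units
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        make_sparse_file(path, 64 * 1024 * 1024)
        logical, allocated = sizes(path)
        print("logical bytes:  ", logical)
        print("allocated bytes:", allocated)
```

On a filesystem that supports holes, the allocated number should come out far smaller than the logical one.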
I want to know how they built that visualization
I've been thinking about making an nginx plugin that randomizes words on a page to poison AI scrapers.
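A rough sketch of the word-randomizing idea, in Python rather than as an nginx module, and only as an illustration. The function name `scramble_words` and the `fraction` and `seed` parameters are hypothetical. It swaps a chosen share of the words among themselves and leaves all whitespace where it was, so the page keeps its layout while the sentences stop making sense.

```python
import random
import re


def scramble_words(text, fraction=0.3, seed=None):
    """Shuffle roughly `fraction` of the words in `text` among themselves."""
    rng = random.Random(seed)
    # Splitting on a capture group keeps the whitespace runs as tokens.
    tokens = re.split(r"(\s+)", text)
    word_idx = [i for i, t in enumerate(tokens) if t and not t.isspace()]
    k = int(len(word_idx) * fraction)
    chosen = rng.sample(word_idx, k)      # positions that will be rewritten
    sources = chosen[:]
    rng.shuffle(sources)                  # where each new word comes from
    words = [tokens[i] for i in sources]  # read everything before writing
    for i, word in zip(chosen, words):
        tokens[i] = word
    return "".join(tokens)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog near the river"
    print(scramble_words(sample, fraction=0.5, seed=1))
```

A real deployment would presumably serve the scrambled version only to clients already flagged as scrapers, which this sketch does not attempt.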
There are "AI mazes" that do that.
I remember reading an article about this but haven't found it yet
That is a very interesting git repo. Is this just a web view into the actual git folder?
If you have the time, I think it's a great idea.
This is why I use things like Docusaurus to generate static sites. Vulnerability injections are pretty hard when there's no code to inject into.
Probably only works for dumb bots and I'm guessing the big ones are resilient to this sort of thing.
Judging from recent stories the big threat is bots scraping for AIs and I wonder if there is a way to poison content so any AI ingesting it becomes dumber. e.g. text which is nonsensical or filled with counter information, trap phrases that reveal any AIs that ingested it, garbage pictures that purport to show something they don't etc.
I don't know about poisoning AI, but one thing that I used to do was to redirect any suspicious bots, or ones that were hitting the server too much, to a simple HTML page with no JS, CSS or forward links. Then they used to go away.
When it comes to attacks on the Internet, doing simple things to get rid of the stupid bots means kicking 90% of attacks out. No, it won't work against a determined foe, but it does something useful.
Same goes for setting SSH to a random port. Logs are so much cleaner after doing that.
Setting a random SSH port and limiting it to 3/min saw failed login attempts fall by 99% and jailed IPs fall to 0.
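The comment doesn't say which tool enforced the 3/min limit, so the following is only a Python sketch of the underlying idea: a sliding-window counter per source IP. The class name `SlidingWindowLimiter` and its parameters are invented for illustration and this is not how any particular firewall implements it.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_attempts` per `window_seconds` for each key."""

    def __init__(self, max_attempts=3, window_seconds=60.0):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of allowed attempts

    def allow(self, ip, now):
        """Return True if an attempt from `ip` at time `now` is within the limit."""
        hits = self._hits[ip]
        # Drop attempts that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_attempts:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    for t in (0, 1, 2, 3, 60):
        print(t, limiter.allow("203.0.113.7", now=t))
```

Passing the timestamp in explicitly keeps the sketch easy to test; a real service would read a monotonic clock instead.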
There have been some attempts in that regard, I don’t remember the names of the projects, but there were one or two that’d basically generate a crapton of nonsense to do just that. No idea how well that works.
❤️
I'd be amazed if this works, since these sorts of tricks have been around since dinosaurs ruled the Earth, and most bots will use pretty modern zip libraries which will just return "nope" or throw an exception, which will be treated exactly the same way any corrupt file is - for example a site saying it's serving a zip file but the contents are a generic 404 html file, which is not uncommon.
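To make the "nope" case concrete, here is a hedged sketch using Python's standard `zipfile` module. It is only one possible defensive check, not a claim about what any particular bot does. The helper names and the 100 MiB default limit are assumptions for illustration. It reads the uncompressed sizes declared in the archive headers and refuses anything over the limit, and it treats a non-zip payload (such as an HTML error page) the same way. Declared sizes can be forged, so real extraction code should also count bytes as it reads.

```python
import io
import zipfile


def declared_uncompressed_size(data):
    """Sum the uncompressed sizes that the archive headers claim."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sum(info.file_size for info in zf.infolist())


def looks_safe(data, max_bytes=100 * 1024 * 1024):
    """Return False for corrupt archives or ones declaring too much data."""
    try:
        return declared_uncompressed_size(data) <= max_bytes
    except zipfile.BadZipFile:
        # Same handling as any corrupt download, e.g. an HTML 404 page.
        return False


def make_test_zip(n_bytes):
    """Build an in-memory zip holding `n_bytes` of zeros (compresses very well)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("zeros.bin", b"\0" * n_bytes)
    return buf.getvalue()


if __name__ == "__main__":
    archive = make_test_zip(10 * 1024 * 1024)
    print("compressed bytes:", len(archive))
    print("declared bytes:  ", declared_uncompressed_size(archive))
    print("safe at 1 MiB limit:", looks_safe(archive, max_bytes=1024 * 1024))
```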
Also, be careful because you could destroy your own device? What the hell? No. Unless you're using dd backwards and as root, you can't do anything bad, and even then it's the drive contents you overwrite, not the device you "destroy".
Yeah, this article came across as if written by a complete beginner. They mention having their WordPress hacked, but failed to admit it was because they didn't upgrade the install.
Sadly about the only thing that reliably helps against malicious crawlers is Anubis
Neat
That URL is telling me "Invalid response". Am I a bot?
Now you know why your mom spent so much time with the Amiga
https://anubis.techaro.lol/docs/user/known-broken-extensions
If you have JShelter installed, it breaks the proof of work from anubis
You're using a VPN, right?
Nope. Just using Vivaldi on my Android device.
I'm not, and it gave an invalid response. I am just chilling on my home wifi.
I’m sorry you had to find out this way.