We apologize for the prolonged performance degradation today.
We have finally identified all of the 'tricks' the AI crawlers found today. They can no longer bypass the Anubis proof-of-work challenges.
What was new to us: the AI crawlers do not only crawl URLs actually presented to them by our frontend; they also converted those URLs into an equivalent format that bypassed our filter rules.
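For readers unfamiliar with Anubis: it is a proof-of-work gate, meaning the server hands each new client a random challenge and the client's browser must burn CPU finding a nonce whose hash meets a difficulty target before it gets through. A minimal sketch of the general scheme (illustrative only, not Anubis's actual code; the difficulty value is made up):

```python
import hashlib
import os

DIFFICULTY = 4  # required leading zero hex digits (illustrative value)

def make_challenge() -> str:
    """Server: issue a random challenge string."""
    return os.urandom(16).hex()

def solve(challenge: str) -> int:
    """Client: brute-force a nonce until the hash meets the target.
    This is the expensive step a crawler must repeat per challenge."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    """Server: checking a submitted solution costs a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

challenge = make_challenge()
assert verify(challenge, solve(challenge))
```

The asymmetry is the point: one cheap verification on the server, many thousands of hashes on the client — negligible for a human viewing one page, expensive at crawler scale.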
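The URL trick described above is a classic canonicalization gap: if blocking rules match the raw request string, a crawler can spell the same URL differently (percent-encoding, redundant ./ and ../ segments) so the rule misses it while the server still resolves it. A hedged sketch of the defence, normalizing before matching (the rule list and helper names are invented for illustration, not our actual filters):

```python
from urllib.parse import unquote, urlsplit
import posixpath

BLOCKED_PREFIXES = ("/repo/blame/", "/repo/raw/")  # illustrative rule set

def canonicalize(path: str) -> str:
    """Decode percent-escapes and collapse ./ and ../ segments
    so equivalent spellings compare equal."""
    path = unquote(path)
    return posixpath.normpath(path)

def is_blocked(url: str) -> bool:
    path = canonicalize(urlsplit(url).path)
    return path.startswith(BLOCKED_PREFIXES)

# "/repo/%62lame/x" decodes to "/repo/blame/x"; a raw string match misses it.
assert is_blocked("https://example.org/repo/%62lame/x")
assert is_blocked("https://example.org/repo/foo/../blame/x")
```

The fix is to match rules against the canonical form — whatever the server will actually resolve — never the raw request string.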
By the way, you can track the changes we have been doing via codeberg.org/Codeberg-Infrastr… like this:
scripted-configuration
An attempt at a much simpler and more intuitive configuration system (used for most of our services)
Tony
in reply to Codeberg • • •
This arms race is so bizarre and absolutely horrendous for our historical record of the evolution of human information technology. The fact that robots.txt is the only defence (and it mostly does not work) is maddening.
ML companies are basically ingesting work they do not own, without permission, DDoSing the creators, hosts, and providers, and then selling that information.
Surely there should be a universally accepted way to signal 'no scraping' by now???
Toni Aittoniemi
in reply to Codeberg • • •
Now OpenAI is giving real users their Atlas browser, so they can scrape while users bypass the security and provide them with logins.
Disgusting.
indyradio
in reply to Codeberg • • •
This is the search engine for indyradio; it has documents Google won't find, and for me it usually works better. Excuse the plug.
I'll let you know what results I got.
lookdown.org
YaCy '_anonufe-66074517-59': Search Page
Claudius π
in reply to Codeberg • • •
in reply to Codeberg • • •AI companies crawl our websites.
We ask that they stop, using the industry-standard robots.txt (example below).
AI companies ignore those rules.
We start blocking the companies themselves with conventional tools like IP rules.
AI companies start working around those blocks.
We invent ways to specifically make life harder for their crawlers (stuff like Anubis).
AI companies put considerable resources into circumventing that, too.
This industry seriously needs to implode. Fast.
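For anyone who has not seen step one above: robots.txt is a plain text file at the site root that states crawl policy. GPTBot and CCBot are real crawler tokens; the file itself is a minimal illustration, not Codeberg's actual policy:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

As the rest of the list spells out, nothing enforces this; honoring robots.txt is entirely voluntary.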
Space Catitude π, Miro Collas, Billy Smith, L'égrégore André κκ¬, EndlessMason, Shannon Prickett, Kevin Russell, Hubert Figuière, Aral Balkan and Thib reshared this.
Claudius π
in reply to Claudius π • • •
As a next step, AI companies are now offering "their" browser (read: Chromium ever so slightly themed, with some company bullshit built in).
In part, this is certainly done to have yet another way to crawl the web, but this time user-directed and indistinguishable from actual human requests.
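To make "indistinguishable" concrete: request-level blocking typically keys on the User-Agent header, which declared crawlers set honestly — but a crawler riding inside a real end-user browser arrives with a stock browser UA and the user's own cookies. A hypothetical illustration (token list and UA strings are examples):

```python
BLOCKED_UA_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")  # self-declared crawler tokens

def ua_blocked(user_agent: str) -> bool:
    """Request-level rule: block if the UA admits to being a known crawler."""
    return any(token in user_agent for token in BLOCKED_UA_TOKENS)

# A declared crawler identifies itself and is easy to match:
assert ua_blocked("Mozilla/5.0; compatible; GPTBot/1.2")

# The same content fetched through an AI-company browser in a user's hands
# arrives with a stock Chrome UA, so the rule sees nothing to block:
assert not ua_blocked(
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36"
)
```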
Lorraine Lee, n8chz π© and Lazarou Monkey Terror πππ reshared this.
Todd Knarr
in reply to Claudius π • • •
@claudius At some point we're going to start paying a lawyer a few dollars to send the AI companies a registered, return-receipt-requested letter saying: "You are denied access to my web site. I have taken every step possible to prevent you from accessing it. If you continue to circumvent these measures and access my site anyway, you will be billed $1000/access. This fee will take effect 14 days after you receive this notice."
Then start sending bills.
Jennifer Kayla | Theogrin π¦
in reply to Claudius π • • •
@claudius
More folks need to begin adopting ... unorthodox solutions for those groups which have been so wonderful as to ignore robots.txt. Disguised petabyte ZIP bombs. Poisoned pages. Image folders chock full of Nightshade.
The legal argument to be made and adopted here is that if the companies weren't willfully breaking the law, then they wouldn't have subjected themselves to those attacks. It certainly doesn't even fall under entrapment in most cases.
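On the ZIP-bomb idea above: the usual variant is serving a small, highly compressible payload with Content-Encoding: gzip, so a client that eagerly decompresses everything inflates gigabytes in memory; well-behaved visitors never see it, because the trapped paths are only linked where rule-ignoring crawlers look. A rough sketch of generating such a decoy (illustrative; "petabyte" is hyperbole, and the filename is made up):

```python
import gzip

CHUNK = b"\0" * (1 << 20)  # 1 MiB of zeros, which compresses extremely well

# Write ~1 GiB of zeros; the resulting decoy.gz is only about 1 MiB on disk.
with open("decoy.gz", "wb") as f:
    with gzip.GzipFile(fileobj=f, mode="wb", compresslevel=9) as gz:
        for _ in range(1024):
            gz.write(CHUNK)
```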
Jonathan Kamens 86 47
in reply to Claudius π • • •
I would like to believe that if the US federal government weren't completely fucked up right now, then OpenAI and the other AI parasites with a nexus in the US would have been criminally charged by now with violating the #CFAA by actively circumventing the crawling protections recently added to websites specifically to block them.
Alas, the government is too busy engaging in vindictive prosecution of #Trump's enemies who aren't actively bribing him.
#infosec #AI
Ref: darmstadt.social/@claudius/115…
Claudius π
2025-10-25 21:11:27
YourShadowDani
in reply to Claudius π • • •
@claudius
I feel like we are working towards a point where you have to redesign the whole web to account for AI ignoring rules.
New browsers, new protocols, etc.
πΌ Dagnabbit, Pascaline! πΌ
in reply to YourShadowDani • • •
@YourShadowDani
I look back at the good old days, when a client asked me to bulletproof their websites and computers so nothing could ever be stolen, and I went under the desk and unplugged their first computer.
They learned.
But now with AI it's a whole other level.
@claudius @Codeberg
LogicalErzor
in reply to Codeberg • • •
in reply to Codeberg • • •how about forcing the user to login to see any repos, otherwise just display the homepage?
unfortunately we have to take a page out of paywall tactics and make it so that only logged in users can view public repos. otherwise its a cat and mouse game
easy to rate limit and ban individuals too when theyre logged in
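A sketch of what that gating could look like, assuming a Flask-style app (route shapes, limits, and session handling are invented for illustration; this is not how Codeberg/Forgejo is actually implemented):

```python
import time
from collections import defaultdict, deque

from flask import Flask, abort, session

app = Flask(__name__)
app.secret_key = "change-me"  # required for sessions; placeholder value

RATE_LIMIT = 60      # max requests...
WINDOW_SECONDS = 60  # ...per minute, per account (invented numbers)
_hits: dict[str, deque] = defaultdict(deque)

@app.before_request
def require_login_and_rate_limit():
    # A real app would exempt the homepage and the login route here.
    user = session.get("user")
    if user is None:
        abort(401)  # anonymous visitors see nothing but the homepage/login

    # Sliding-window rate limit keyed on the account, not the IP.
    window = _hits[user]
    now = time.monotonic()
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= RATE_LIMIT:
        abort(429)  # easy to throttle (or ban) a misbehaving account
    window.append(now)

@app.route("/<owner>/<repo>")
def repo_page(owner: str, repo: str):
    return f"repo {owner}/{repo}"
```

The design point being made: once every request maps to an account, throttling and banning become simple bookkeeping instead of an IP-reputation arms race.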