We apologize for the long performance degradation today.
Finally, we identified all of the 'tricks' that AI crawlers found today. They no longer bypass the Anubis proof-of-work challenges.

A novelty for us was that the AI crawlers not only crawl URLs actually presented to them by our frontend; they also converted those URLs into a format that bypassed our filter rules.
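A rough illustration of that class of bypass (the path prefix, rule, and matching logic below are made-up examples, not Codeberg's actual setup): a filter that matches literal path patterns can be sidestepped by percent-encoding or dot-segment tricks unless the request path is canonicalized before the rules are applied.

    from urllib.parse import unquote
    import posixpath

    BLOCKED_PREFIXES = ("/cloudstream/",)  # hypothetical filter rule

    def canonicalize(path: str) -> str:
        # Repeatedly percent-decode (to catch double-encoding) and collapse
        # '.'/'..' segments so equivalent spellings map to one canonical form.
        prev = None
        while prev != path:
            prev = path
            path = unquote(path)
        return posixpath.normpath(path)

    def is_blocked(raw_path: str) -> bool:
        # Apply the rule to the canonical form, not the raw request path.
        return canonicalize(raw_path).startswith(BLOCKED_PREFIXES)

    assert is_blocked("/%63loudstream/foo")             # percent-encoded 'c'
    assert is_blocked("/cloudstream/../cloudstream/x")  # dot-segment detour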

By the way, you can track the changes we have been making via

codeberg.org/Codeberg-Infrastr…

in reply to Codeberg

Ah I did notice a super long push time this morning. That explains it. Sucks that we have to deal with that crap. But thanks for the transparency.
in reply to Codeberg

this arms race is so bizarre and absolutely horrendous for our historical record of the evolution of human information technology. The fact that robots.txt is the only defence (which mostly does not work) is maddening.

ML companies are basically ingesting work they do not own and without permission, DDoSing the creators, hosts and providers, and then selling that information.

Surely there should be a universally accepted way to signal no scraping by now???

in reply to Codeberg

Now OpenAI is giving real users its Atlas browser, so it can scrape while users bypass the security and provide it with logins.

Disgusting.

in reply to Codeberg

I'm going to try a normal, respectful, old-fashioned crawl and see if that's possible.
This is the search engine for indyradio; it has documents Google won't find, and for me it usually works better. Excuse the plug.
I'll let you know what results I get.
lookdown.org
in reply to Codeberg

Already witnessing that the harm and impact of AI on everyday life go way beyond its benefits!
in reply to Codeberg

I think your language here is way too soft. It's not crawlers that found it; some company actively worked on breaking the mechanisms. It's nothing other than malicious cracking. Behind every such incident is someone who made the decision to crack you.
in reply to Codeberg

AI companies crawl our websites.

We ask that they stop by using the industry-standard robots.txt (a minimal example follows below).

AI companies ignore those rules.

We start blocking the companies themselves with conventional tools like IP rules.

AI companies start working around those blocks.

We invent ways to specifically make life harder for their crawlers (stuff like Anubis).

AI companies put considerable resources into circumventing that, too.

This industry seriously needs to implode. Fast.
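For reference, the robots.txt mentioned above is just a plain-text file served at the site root. A minimal version asking the major AI crawlers to stay away could look roughly like this (the user-agent tokens are the crawlers' publicly documented names; honoring the file is entirely voluntary on their side, which is exactly the problem):

    # /robots.txt
    User-agent: GPTBot
    User-agent: ClaudeBot
    User-agent: CCBot
    User-agent: Google-Extended
    Disallow: /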

in reply to Claudius πŸŽƒ

As a next step, AI companies are now offering "their" browser (read: Chromium ever so slightly themed with some company bullshit built in)

In part, this is certainly done to have yet another way to crawl the web, but this time user-directed and indistinguishable from actual human requests.

in reply to Claudius πŸŽƒ

we need a crawler honeytrap that sends back random racist 4chan posts as the response.
in reply to Claudius πŸŽƒ

@claudius Frankly, I'm kind of half-OK with that one: There's still the troubling copyright aspect, but being the browser and loading nothing but user-viewed content at least gets their load off our servers.
in reply to chrysn

@chrysn second step: DDoS. If they are on the computer anyway, why not deputize them for crawling?
in reply to Claudius πŸŽƒ

@claudius If there are actual page-consuming users behind every single request, it'd take a colossal effort to pull off a DDoS. Cloudflare (whose business interest admittedly is to over-report DoS attacks) clocks even 2010-level attacks at 600k requests per second, so even with low-attention-span users (maybe 5 s/page), that'd take 3 million humans for the duration of the attack. If someone can somehow convince 3M people to constantly click through slow-loading pages, we have bigger issues than DoS.
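A back-of-the-envelope check of those numbers, using the request rate and dwell time assumed above rather than anything measured:

    attack_rate = 600_000            # requests per second to sustain
    seconds_per_page = 5             # one page view per human every ~5 s
    per_user_rate = 1 / seconds_per_page     # 0.2 requests/s per human
    print(attack_rate / per_user_rate)       # -> 3,000,000 concurrent humans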
in reply to chrysn

@claudius Of course, if their browsers load content *beyond* what the viewed page includes and the explicit preload links, then those users have willingly turned their hosts into part of a botnet, and should expect to be blocked like any other botnet.
in reply to Claudius πŸŽƒ

@claudius At some point we're going to start paying a lawyer a few dollars to send the AI companies a registered return-receipt-requested letter saying "You are denied access to my web site. I have taken every step possible to prevent you from accessing it. If you continue to circumvent these measures and access my site anyway, you will be billed $1000/access. This fee will take effect 14 days after you receive this notice."

Then start sending bills.

in reply to Claudius πŸŽƒ

@claudius
More folks need to begin adopting ... unorthodox solutions for those groups which have been so wonderful as to ignore robots.txt. Disguised petabyte ZIP bombs. Poisoned pages. Image folders chock full of Nightshade.

The legal argument to be made and adopted here is that if the companies weren't willfully breaking the law, then they wouldn't have subjected themselves to those attacks. It certainly doesn't even fall under entrapment in most cases.

in reply to Claudius πŸŽƒ

I would like to believe that if the US federal government weren't completely fucked up right now then OpenAI and the other AI parasites with a nexus in the US would have been criminally charged by now with violating the #CFAA by actively circumventing the crawling protections added recently to websites specifically to block them.
Alas, the government is too busy engaging in vindictive prosecution of #Trump's enemies who aren't actively bribing him.
#infosec #AI
Ref: darmstadt.social/@claudius/115…

in reply to Claudius πŸŽƒ

@claudius
I feel like we are working towards a point where you have to redesign the whole web to account for AI ignoring rules.

New browsers, new protocols, etc.

in reply to YourShadowDani

@YourShadowDani

I look back at the good old days, when one day a client asked me to bulletproof their websites and computers so that no one could ever steal anything, and I went under the desk and unplugged their first computer.
They learned.

But now with AI it's a whole other level.

@claudius @Codeberg

in reply to Codeberg

it’s ok, I still admire the job you’re doing, thanks 🙏
in reply to Codeberg

I think you should file criminal complaints. What they're doing is a computer crime (trying to circumvent protections/security of an automated work).
in reply to Codeberg

this explains why I don't see y'all in malware either 🫢
# Codeberg is not a CDN. Less than 0.1% of the requests (500K req over the last 6 days) looks non CDN related.
    @cloudstream expression path('/cloudstream/*') && path('*/raw/*')
    respond @cloudstream "Codeberg is not a CDN." 403
in reply to Codeberg

how about forcing the user to log in to see any repos, otherwise just display the homepage?

unfortunately we have to take a page out of paywall tactics and make it so that only logged-in users can view public repos. otherwise it's a cat-and-mouse game

easy to rate-limit and ban individuals too when they're logged in
