Hacker News | gigaArpit's comments

Robots.txt is a joke; if you are on Cloudflare, use its Block AI Bots feature.


More precisely described as "block non-sanctioned user agents".

Using that feature will ensure I never visit your site again.


It's a free wiki, so not visiting is a right you have, sure... but OP also shouldn't have to spend loads of money hosting a free wiki while supporting something as bespoke as non-standard user agents.


> is a right

I believe they're saying that Cloudflare will block them just for using a blacklisted client, even if they're legit users and not bots.


Not even that; it's a whitelist, not a blacklist, and the only clients whitelisted are essentially those of Big Browser.
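To make the whitelist/blacklist distinction concrete, here is a minimal sketch of the two approaches. The token lists and function names are made up for illustration and are not Cloudflare's actual rules; real bot detection uses far more than a User-Agent substring match.

```python
# Hypothetical token lists, for illustration only (not Cloudflare's real rules).
DENYLIST = ("GPTBot", "CCBot")               # known crawler tokens to reject
ALLOWLIST = ("Firefox", "Chrome", "Safari")  # the only clients let through


def allowed_by_denylist(user_agent: str) -> bool:
    """Blacklist: everything passes unless it matches a known-bad token."""
    ua = user_agent.lower()
    return not any(token.lower() in ua for token in DENYLIST)


def allowed_by_allowlist(user_agent: str) -> bool:
    """Whitelist: everything is rejected unless it matches a known-good token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in ALLOWLIST)
```

Under these assumptions, an unusual but legitimate client such as `Lynx/2.8.9rel.1 libwww-FM/2.14` passes the blacklist check and fails the whitelist check, which is the complaint being made above.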


On this point, if you turn on Bot Fight Mode, it also says it blocks verified bots.

But Bot Fight Mode says "there is a newer version of this setting", yet it does not link to it.

Anyone have any insight on the blocked verified bots or the supposed new version?


Each code has a defined meaning, which helps clients and servers understand the nature of the response.


I know very well what an HTTP response code is. But what is the reason for wanting to extend the standard? What exactly makes 450 better than 404? Who benefits from it, and in what form?


The idea behind introducing a new HTTP status code for blocking AI traffic should be to provide a clearer signal to both users and automated systems about the nature of the response. While a 404 indicates that a resource is not found, it doesn't convey the specific reason that the content is being restricted from AI access. While User-Agents can easily be spoofed by users and automated traffic, as you mentioned in your earlier reply, the HTTP status code would be a clear message from the server.


If it’s a clear message, they will simply avoid your AI identification heuristics. Better to feed them 404s so they eventually give up (or drop their requests with no response, assuming you know the IP blocks the crawls originate from). The less signal provided, the better, broadly speaking.


Following this reasoning, a 200 response would also be appropriate, as AI bots wouldn't flag anything as suspicious.
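The three positions in this subthread (an explicit code, a 404, or a 200) boil down to picking which status a server returns once it has decided a request is AI traffic. A rough sketch, assuming a naive User-Agent heuristic; `Policy`, `looks_like_ai_bot`, and `status_for_request` are hypothetical names, and 450 is just the code floated in this thread, not something the sketch claims is standardized for this purpose.

```python
from enum import Enum


class Policy(Enum):
    EXPLICIT = "explicit"    # tell the client it was blocked as AI traffic
    NOT_FOUND = "not_found"  # pretend the resource does not exist
    DECOY = "decoy"          # answer as if nothing were wrong


# 450 is the code proposed in the thread; treat it as an assumption.
STATUS_FOR_POLICY = {
    Policy.EXPLICIT: 450,
    Policy.NOT_FOUND: 404,
    Policy.DECOY: 200,
}

# Hypothetical heuristic: a substring match, which is trivially spoofable.
AI_BOT_TOKENS = ("gptbot", "ccbot")


def looks_like_ai_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token in ua for token in AI_BOT_TOKENS)


def status_for_request(user_agent: str, policy: Policy) -> int:
    """Return the status code to send for this request under the given policy."""
    if not looks_like_ai_bot(user_agent):
        return 200
    return STATUS_FOR_POLICY[policy]
```

With `Policy.EXPLICIT` a matching crawler learns it was identified; with `Policy.NOT_FOUND` or `Policy.DECOY` it gets no such signal, which is the trade-off being argued above.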


I don't think there's a real meaning beyond the premise to a joke.

