I'd still like the ability to just block a crawler by its IP range, but these days nope.
1 Hz is 86400 hits per day, or 600k hits per week. That's just one crawler.
Just checked my access log... 958k hits in a week from 622k unique addresses.
95% of it is fetching links from the u-boot repository that I host, in a completely random pattern. I blocked all of the GCP/AWS/Alibaba and of course Azure cloud IP ranges.
Almost all of it now comes from "residential" and "mobile" IP address space, from completely random places all around the world. I'm pretty sure my u-boot fork is not that popular. :-D
Every request is a new IP address, and the IP space available to the crawler(s) is millions of addresses.
I don't host a popular repo. I host a bot attraction.
In addition to a rate limit, a page limit per IP is needed; this is specifically for things like source code repos (with massive commit histories), mailing archives, etc.
A whitelist would be needed for sites where getting all the pages makes sense. And probably in addition to the 1 Hz, an additional limit of 1k/day would be needed.
I can see now why Google doesn't have much solid competition (Yandex/Baidu arguably don't compete due to network segmentation).
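A minimal sketch of the per-IP limits described above (1 Hz rate, 1k pages/day, plus a whitelist). The class name, parameters, and in-memory dicts are all made up for illustration; a real setup would live in the reverse proxy and would need to expire old entries. It also does nothing against crawlers that use a fresh address for every request.

```python
import time
from collections import defaultdict

class PerIpLimiter:
    """Hypothetical limiter: one hit per `min_interval` seconds and
    at most `daily_cap` hits per day, tracked per IP address."""

    def __init__(self, min_interval=1.0, daily_cap=1000, whitelist=()):
        self.min_interval = min_interval
        self.daily_cap = daily_cap
        self.whitelist = set(whitelist)
        self.last_hit = {}                 # ip -> time of last allowed hit
        self.day_count = defaultdict(int)  # (ip, day number) -> allowed hits

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        if ip in self.whitelist:
            return True                    # whitelisted clients are never limited
        day = int(now // 86400)
        if self.day_count[(ip, day)] >= self.daily_cap:
            return False                   # daily page limit reached
        last = self.last_hit.get(ip)
        if last is not None and now - last < self.min_interval:
            return False                   # faster than the rate limit
        self.last_hit[ip] = now
        self.day_count[(ip, day)] += 1
        return True
```

Usage would be something like `limiter.allow(client_ip)` in the request handler, returning a 429 when it is `False`. Both dicts grow without bound here, which is fine for a sketch but not for millions of addresses.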
Scraping reliably is hard, and the chance of kicking Google off their throne may be even further reduced due to AI crawler abuse.
PS 958k hits is a lot! Even if your pages were a tiny 7.8k each (HN front page minus assets), that would be about 7G of data (about 4.6 Bee Movies in 720p h256).
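For anyone who wants to check the estimate, a quick back-of-the-envelope in Python. It assumes 1k = 1000 bytes and 1G = 10^9 bytes, which is my reading of the numbers, not something stated above.

```python
hits = 958_000   # hits in one week, from the access log figure
page_kb = 7.8    # assumed average page size in kB (1 kB = 1000 bytes)

total_bytes = hits * page_kb * 1000
total_gb = total_bytes / 1e9   # decimal gigabytes

print(f"{total_gb:.1f} GB per week")
```

That comes out a bit under 7.5 GB, so "about 7G" holds up.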
Yep, that’s why that’s all over the place now. The cookie thing is more of a first line of defense. It turns away a lot of shoddy scrapers with nearly no resources on my side. Anubis knocks out almost all of the remainder.
> 20000 / 8 customers / 40 USD/mo = 62 months just to recoup CPU and RAM let alone other components.
The most important question is power: such a setup will easily blow through 500 W. Granted, a datacenter may not pay 33 ct/kWh, but even then about a third of the monthly income will go towards electricity alone, not including cooling.
Netcup does oversubscribe, but not that much. I have a server there and I haven't really noticed much steal. I haven't actually measured it, though; there are definitely scripts to detect it, and maybe I'll run one some day, but oh well, laziness.
The most oversubscribed VPS provider I know of is Contabo. Search Reddit, LowEndBox, or literally anywhere people talk about hosting, and they mention how oversubscribed it could be (a ~20-30% figure, off the top of my head).
I'm not exactly sure, but my point is that when I first saw them, they were the cheapest option I could find (with their Contabo auctions for really large machines, like 96 GB of RAM or something). They're off my list now, even as a frugal guy, just because of how unstable they are and how consistently I've seen people struggle with Contabo. Simply not recommended, imho. Netcup is 10x more pleasant, judging by other people's reactions. People do mention some steal on Netcup, but overall it's really good, and that matches my experience with them too, I guess.
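Rough numbers for the payback and power claims, as a sketch. The inputs are the ones from this thread; the 30-day month and treating ct and USD as the same currency are my assumptions.

```python
hardware_cost = 20_000     # CPU + RAM cost from the quoted comment
customers = 8
price_per_month = 40       # per customer
power_watts = 500
price_per_kwh = 0.33       # residential-ish rate; a datacenter likely pays less
hours_per_month = 24 * 30  # assumption: 30-day month

monthly_income = customers * price_per_month
payback_months = hardware_cost / monthly_income

kwh_per_month = power_watts / 1000 * hours_per_month
power_cost = kwh_per_month * price_per_kwh
power_share = power_cost / monthly_income

print(f"payback: {payback_months:.1f} months")
print(f"power: {kwh_per_month:.0f} kWh/month, {power_cost:.2f}/month, "
      f"{power_share:.0%} of income")
```

At the full 33 ct/kWh that is 360 kWh a month and a bit over a third of the income, before cooling.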
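For reference, a minimal sketch of what such a detection script can look like on Linux: sample the aggregate `cpu` line of `/proc/stat` twice and see what share of the elapsed CPU time was steal. The function names are made up, and this is not any particular script from the forums.

```python
import time

def parse_cpu_line(stat_text):
    """Return (steal, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            steal = values[7] if len(values) > 7 else 0
            total = sum(values[:8])  # guest time is already counted in user/nice
            return steal, total
    raise ValueError("no aggregate cpu line found")

def steal_percent(before, after):
    """Percentage of CPU time between two samples that was stolen."""
    d_steal = after[0] - before[0]
    d_total = after[1] - before[1]
    return 100.0 * d_steal / d_total if d_total > 0 else 0.0

def sample_steal(interval=5.0, path="/proc/stat"):
    """Measure steal over `interval` seconds on a Linux guest."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return steal_percent(before, after)
```

Running `sample_steal()` under load a few times a day should give a feel for it; it is the same counter that `top` shows as `st`.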
Their goal should be to keep stealing close to zero while utilizing all their hardware. You want this too, because it keeps costs down. It does mean sometimes there will be stealing spikes, but most customers don't want to spend twice as much to avoid that, so it's a win/win.
Agreed. From what I know of the hosting/VPS business, providers usually oversubscribe by about a 2x factor. And honestly it's mostly not even noticeable, as you mention, unless they use a crazy factor like the one Contabo reportedly has, going by the forums.
This is also the reason you're told not to run a VPS at 100% usage 24/7: it can be noisy for other people.
Another interesting idea from Netcup is that they launched virtual dedicated cores (still a VPS at heart) where you actually can run at 100% usage 24/7. On sites like LowEndTalk I've heard these described as eerily similar to a bare metal instance, or to this new "cloud metal" instance terminology. The difference, as far as I can tell, is that virtual dedicated cores, at least on Netcup, aim to be cheaper than dedicated servers in many cases. I haven't compared them myself, though; that's just how I've heard them described on LowEndTalk.
I started with Slackware Linux—something arguably even more “hard-core” than Arch.
What mattered most at the beginning was good installation documentation, and both Arch and Slackware delivered on that front. Slackware, however, had an additional appeal: it was intentionally simple, largely because it was created and maintained by a single person at the time. That simplicity made it feel conceivable that the system could be fundamentally understood by a single human mind.
Whether a newcomer appreciates the Slackware/Arch approach depends heavily on learning style and goals. You can click through a GUI installer and end up with a working distro, but then what? From a beginner’s perspective, you’ve just installed something somehow—and it looks like a crippled Windows machine with fewer buttons.
Starting with Slackware gave me a completely different starting reference point. Installing the system piece by piece was genuinely exciting, because every step involved learning what each component was and how it fit into the whole. The realization that Linux is essentially a set of Lego bricks—and that I might actually master the entire structure, or even build my own pieces—was deeply motivating.
That mindset was strongly shaped by how Slackware and similar distros present themselves. Even the lack of automatic dependency management acted as an early nudge toward thinking seriously about complexity, trade-offs, and minimalism, which stayed with me forever.
Yeah, likely not. There are all kinds of shades of gray in between that and Windows-style OS installation, though.
Arch Linux just sits somewhere in the middle, where you don't build anything, but you get a guided tour of partitioning and formatting a disk, installing some set of packages, and setting up a bootloader, from the fairly rich and comfortable command line environment of a USB live distro.
That's an extermination camp. Concentration camps are just for segregating and isolating people from society. Like what US did to Japanese in the past.
It's been pretty clear for decades. When exactly did some higher-up in the US gov end up in jail for ordering, e.g., mass killings abroad, or for colluding with others who engaged in mass crimes like initiating wars and conflicts?
US will not lock up a single asshole who helps kill thousands of people abroad (not even inconvenience them with a simple court appearance to have to justify themselves), but it sure can lock up thousands on flimsiest justifications like FTA in court because of whatever, or technical parole violations, or driving on suspended license, basically for failures to navigate bureaucracy while poor.
I'll believe in rule of law when at least shits who materially support mass killings of children will start getting locked up. But alas, no. No such thing.
Until then it's all just bullshit that normal people have to submit to, and ruling class gets to excuse itself from with endless lawyering, exceptions, and nonsense, while it's clear they're still just scum psychos doing scum psycho things.
Equating civil resistance, even in heated forms like disrupting raids or blocking roads, with decades‑long insurgencies that involved organized armed groups, territorial control, foreign combatants, and protracted guerrilla campaigns is like comparing a neighborhood disagreement over lawn care to Napoleon invading Russia.
Like I've said over and over, the tactics used are a distillation of what works from those insurgencies, honed over decades. They are incredibly effective. The network that was built (several max signal chats, organized territory, labor specialization) has essentially created an effective targeting mechanism.
This isn't a bunch of people organically protesting, this is an organized system designed to "target" ICE agents. The only difference is the payload delivery between physical disruption vs weapon based attacks.
So what's the supposed goal of this "targeting" of ICE agents? Because that's a key to the insurgency vs protest thing.
We have chats, organized territory and labor specialization in a company I work for, too. It doesn't say anything by itself. It's just describing a means of human cooperation. Goal is to write software. You can have organized protest movement too. Unless the goal is to overthrow governing authority, or whatnot, it's not insurgency.
Because if the deterrent here is a line item so small it shows up as 'miscellaneous vibes' on a balance sheet, that's not a barrier. That's a tip jar.