bdefore's comments

I created and maintain ProtonDB, a popular Linux gaming resource. I don't do ads, just pay the bills from some Patreon donations.

It's a statically generated React site I deploy on Netlify. About ten days ago I started incurring 30GB of data per day from user agents indicating they're using Prerender. At this pace almost all of that will push me past the 1TB allotted for my plan, so I'm looking at an extra ~$500 USD a month for the extra bandwidth boosters.

I'm gonna try the robots.txt options, but I'm doubtful this will be effective in the long run. Many other options aren't available if I want to continue using a SaaS like Netlify.

My initial thoughts are to either move to Cloudflare Pages/Workers where bandwidth is unlimited, or make an edge function that parses the user agent and hope it's effective enough. That'd be about $60 in edge function invocations.
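For illustration, the user-agent check described above could be as small as the sketch below. It is plain Python rather than Netlify's edge function API, the names `BLOCKED_UA_SUBSTRINGS` and `should_block` are hypothetical, and the only signature taken from this thread is the "Prerender" string. Treating a missing header as allowed is an assumption, not a recommendation.

```python
from typing import Optional

# Hypothetical block list: "Prerender" is the only agent named in this thread.
BLOCKED_UA_SUBSTRINGS = ("prerender",)


def should_block(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent contains a blocked substring.

    Matching is case-insensitive. Requests without a User-Agent are
    allowed through (an assumption; this is allow-by-default).
    """
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)
```

A crawler that spoofs a browser user agent passes this check untouched, which is the whack-a-mole problem in a nutshell.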

I've got so many better things to do than play whack-a-mole on user agents and, when failing, pay this scraping ransom.

Can I just say fuck all y'all AI harvesters? This is a popular free service that helps get people off of their Microsoft dependency and live their lives on a libre operating system. You wanna leech on that? Fine, download the data dumps I already offer on an ODbL license instead of making me wonder why I fucking bother.


Proton DB is an amazing website that I use all the time. Thank you for maintaining it!


Thanks. Appreciate your support, and very glad it brings you value.


$500 for exceeding 1TB? The problem here isn't the crawlers, it's your price-gouging, extortionate hosting plan. Pick your favourite $5/month VPS platform - I suggest Hetzner with its 20TB limit (if their KYC process lets you in) or Digital Ocean if not (with only 1TB but overage is only a few bucks extra). Even freaking AWS, known for extremely high prices, is cheaper than that (but still too expensive so don't use it).


> The problem here isn't the crawlers, it's your price-gouging, extortionate hosting plan.

No, it's both.

The crawlers are lazy, apparently have no caching, and there is no immediately obvious way to instruct/force those crawlers to grab pages in a bandwidth-efficient manner. That being said, I would not be surprised if someone here will smugly contradict me with instructions on how to do just that.

In the near term, if I were hosting such a site I'd be looking into slimming down every byte I could manage, using fingerprinting to serve slim pages to the bots and exploring alternative hosting/CDN options.


> The problem here isn't the crawlers,

One of the worst takes I've seen. Yes, that's expensive, but the individuals doing insane amounts of unnecessary scraping are the problem. Let's not act like this isn't the case.


To clarify the math: Netlify bills $50 for each 100GB over the Pro plan limit of 1TB, which is the barrel I'm looking down just this month before others get the same idea. So yes, I'm squeezed on both sides unless I put the work in to rehost.
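As a rough sketch of that billing math, using only the figures quoted here ($50 per 100GB past 1TB): the function name is hypothetical, 1TB is taken as 1000GB, and charging for each started 100GB block is an assumption rather than something confirmed in this thread.

```python
import math

INCLUDED_GB = 1000       # assumption: 1TB counted as 1000GB
BLOCK_GB = 100           # overage is sold in 100GB blocks
BLOCK_PRICE_USD = 50     # price per block, as quoted above


def overage_cost_usd(total_gb: float) -> int:
    """Estimated monthly overage, assuming every started block is billed in full."""
    extra_gb = max(0.0, total_gb - INCLUDED_GB)
    return math.ceil(extra_gb / BLOCK_GB) * BLOCK_PRICE_USD
```

At 30GB a day, 30 days of crawler traffic is 900GB. If the regular traffic already sits near the cap, that comes to nine or ten blocks, or $450 to $500, in line with the ~$500 estimate.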


I went to a Subway shop that charged $50 per lettuce strip past the first 20. As the worker sprinkled lettuce on my sandwich, I counted anxiously, biting my nails. 19, phew, I'm safe. I think I'll come back here tomorrow.

Tomorrow, someone in front of me asked for extra lettuce. The worker got confused and put it on my sandwich. I was charged $1000. Drat.


> The worker got confused and put it on my sandwich.

No, this is where you're completely and totally incorrect. There is no 'worker accidentally making a human mistake that costs you money' here. This is a 'multi-billion dollar company routinely runs scripts that they KNOW cost you money, but do it anyways because it generates profit for them'. To fix your example,

You RUN a Subway that sells sandwiches. Your lettuce provider charges you $1 per piece of lettuce. Your average customer is given $1 worth of lettuce in their sub. One customer keeps coming in, reaching over the counter, and grabbing handfuls of lettuce. You cannot ban this customer because they routinely put on disguises and ignore your signs saying 'NO EXTRA LETTUCE'. Eventually this bankrupts you, forces you to stop serving lettuce in your subs entirely, or you have to put up bars (eg, Cloudflare) over your lettuce bins.


I'm not sure what Netlify is doing, but the heaviest assets on your website are your javascript sources. Have you considered hosting those on GitHub pages, which has a generous free tier?

The images are from steamcdn-a.akamaihd.net, which I assume is already being hosted by a third party (Steam).


I'd rather not involve Microsoft but I recognize there are other options. It is additional work/complexity I'll probably have to take on.


Hey, I just wanted to say your site is amazing and has helped me SO much and I am incredibly grateful for all your hard work. I would become a patron if I could afford it, but I can barely make ends meet and don't have $447 a month to spare. :(


Do you have the ability to block ASNs? I help sysadmin a DIY building forum, and we cut 80% of the load from our server by blocking all Alibaba IPs in ASN 45102. Singapore was sending the most bot traffic.
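A minimal sketch of what blocking by network range looks like, using Python's standard `ipaddress` module. The prefixes below are placeholders from the reserved documentation ranges, not the real announcements for ASN 45102; an actual block list would have to be pulled from a routing registry or similar source. `BLOCKED_NETWORKS` and `is_blocked_ip` are hypothetical names.

```python
import ipaddress

# Placeholder prefixes (reserved documentation ranges), NOT real ASN 45102 data.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "2001:db8::/32")
]


def is_blocked_ip(addr: str) -> bool:
    """Return True when addr falls inside any blocked network.

    Raises ValueError if addr is not a valid IPv4 or IPv6 address.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_NETWORKS)
```

On a static host this kind of check needs somewhere to run, such as an edge function or a firewall rule in front of the site.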


Thank you for making ProtonDB! I use it a ton <3


Please use a default deny on the user agent. It can block a lot of accessibility tools and makes privacy difficult.


Did you mean to say don't use a default deny?


Yes


Go for Cloudflare pages.


Your mistake is openly suggesting on HN that you're going to use Cloudflare, increasing the centralization of the internet and contributing to their attestation schemes, while society forces you to be a victim of the tragedy of the commons.


Please believe me that it is not a step I want to take.


Another option that wouldn't contribute to more centralization might be neocities. They give you 3 TB for $5/month. That seems to be _the_ limit though. The dude runs his own CDN just for neocities, so it's not just reselling cloudflare or something.

P.S. Thank you for ProtonDB, it has been so incredibly helpful for getting some older games running.


You don't need to apologize - HN needs to get their heads out of the sand that not everything is a tragedy of the commons, there's a reason why centralization exists, and the decentralized internet as it is now comes with serious drawbacks. We're never going to overcome the popularity of big tech if we can't be honest with the problems they solve.

Also, sue me, the cathedral has defeated the bazaar. This was predictable, as the bazaar is a bunch of stonecutters competing with each other to sell the best stone for building the cathedral with. We reinvented the farmer's market, and thought that if all the farmers united, they could take down Walmart. It's never happening.


In this context, the farmers are trying to deal with rampant abuse that is inconceivable to handle on an individual level.

It's not clear to me what taking down Cloudflare/Walmart means in this context. Nor how banding together wouldn't just incur the very centralization that is presumably so bad it must be taken down.


ProtonDB started with a lunchtime reddit post and a public Google Sheets link I seeded with three of my own tests. Came back from work with a thousand rows in it. In a few weeks I built a frontend around it. The 30k row data migration was brutal but in hindsight was absolutely worth it. AMA.


Not who you responded to, but I thought you described your case elegantly neutral to the God question. I suspected you were coming from that perspective, but your argument stood on its own.

Now that said, if we did find alien life, how would your feelings about God change?


I don't base the existence of GOD on this. Simply put, GOD is the only thing that does not need anything to be based on, so it is possible to explain existence. I was trying to point out that evolution is statistically close to impossible (from my understanding) and GOD made it possible. Thus aliens are unlikely. And it seems like there is no sign of aliens in holly books. If there were aliens and that conflicted with holly books, it would mean that the holly books I believe in are fake. However, I am not saying that the existence of aliens would conflict with holly books either (I am not sure about that). The explanation of this existence to you is what GOD means to you. You need to define some element to explain this existence, and whatever you eventually define is what GOD means to you.


Every book, holy or otherwise, that was ever written was written by humans. What makes you think they are “real” in the first place?

There’s no mention of a ton of very significant things we now know to be real in any holy book. Why isn’t that stuff enough “proof” they’re “fake”?


I don't believe you have read your "holly" books. There are plenty of alien beings described in detail in that book.

I also reject your out-of-hand assertion that whatever one thinks about the world's existence has the same meaning to someone as your mythical being has to you.

How blown would your mind be if I posited to you that what you call "GOD" is the universe? Your holly book says he is everywhere and everything all at once, does it not? That "He is within all of us", and yet we are the universe observing itself. He came about out of nowhere with no trace of origin? We don't know what caused the big bang, it just happened.

If aliens exist, it's because the universe "allowed" it to happen (aka the statistical probability is high enough for it to occur or we would not be alive having this conversation) not some conscious divine being.


Which logically means your definition of GOD would include non-sentient physical processes that lack free will, etc. At that point why label it GOD rather than, say, space time or whatever?


The existence of the universe has nothing to do with something about the universe. Or at least it does not have to. You cannot explain the existence of space time with space time.


Why not? If GOD has always existed why can’t space and time?


That is the thing: if you are talking about space-time, it didn't exist before the Big Bang.


That’s one model, but we really don’t know what happened before the Big Bang.

Still labeling pre Big Bang as SpaceTimeTime or calling that Space Time and relabeling what we experience as local space time doesn’t really matter here. The point is our Big Bang could be happening infinite times in a multiverse without beginning or end. That’s one possibility where there isn’t anything conscious to point to as a GOD just the same kind of things which make up physical reality.


Tried your beta out and enjoyed it for a while but then got hit with the auth wall after ~10 messages. Kind of irrational but I felt insulted that even after I did so my conversation was nowhere in chat history.

People get attached to bots quick I guess! I recommend: 1) communicate to the provisional user how many more messages they get before having to log in and 2) persist those pre-auth chat histories.


I'd be curious to hear of your experience. Let me know how to reach you?


Fascinating that he would choose to retire at 39, in what sounds like a position of great influence in the space program. Anyone know the context around this? Was it mandated?


The entire crew of Apollo 7 were essentially blacklisted from ever flying again after the crew had some heated disagreements with ground control during the mission and ended up forcing changes to the schedule. The top brass saw it as mutiny. Cunningham knew he would never get to fly another mission.


Interesting I had no idea. This was also the first crewed mission following the Apollo 1 failure. There's some explanation of what they disagreed on here: https://en.wikipedia.org/wiki/Apollo_7#Conflict_and_splashdo...

The commander resisted the start of the first live TV broadcast in space for safety reasons. The mission had both technical and public relations goals, and it's fascinating to see these in understandable conflict.


That great innovation is at the root of the market's disregard for negative externalities such as climate change. I'm not in support of jailing shareholders, but say a tax on the dividends of shareholders who front the capital for companies that impose a burden on civilization doesn't sound outrageous to me. Especially since fining the company directly doesn't necessarily discourage bad behavior at the tiers of power that have the ability to take a different path.


Aside from the well-written article itself, it's remarkable how much thought and time was put into the commentary that follows. This reminds me how much the quality of commentary has declined in most places that host it, as well as how much effort I put in to contribute. Valuable discussion used to be scattered across various blogs, and HN is in some ways the last vestige of it.


A powerful engagement and influence was left by MySpace, but it's not put up in protective glass for ticketed visitors to take snapshots of. I appreciate the nuance of its impact on the world, but... we've moved on.


> A powerful engagement and influence was left by MySpace, but it's not put up in protective glass for ticketed visitors to take snapshots of

Maybe we should have. I'm sure a MySpace archive would have significant anthropological value a century from now. I don't really find this argument convincing to be honest.

I mean, you're basically arguing that we should tear down Ancient Greek and Roman buildings, tear down the Eiffel tower, and so on, "because we've moved on".


No one suggested destroying the Mona Lisa or other artifacts. The comment was regarding the hype surrounding the Mona Lisa hundreds of years later.


Irony bells are ringing with the New Yorker magazine chasing its own TikTok dreams, autoplaying an utterly unrelated video halfway down the article.

