Edit: See replies below. Cloudflare CEO says this use case is fine.
This is a cool project and something I will probably use for some hobby projects.
I would caution against it for anything more than a hobby project as it violates the Cloudflare TOS:
> 2.8 Limitation on Non-HTML Caching
> The Service is offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as a part of a Paid Service purchased by you, you agree to use the Service solely for the purpose of serving web pages as viewed through a web browser or other application and the Hypertext Markup Language (HTML) protocol or other equivalent technology. Use of the Service for the storage or caching of video (unless purchased separately as a Paid Service) or a disproportionate percentage of pictures, audio files, or other non-HTML content, is prohibited.
The main point of this article is to use a Cloudflare cache-everything rule and use that caching to create a free image host. From the article:
> I'd heavily recommend adding a page-rule to set the "cache level" to "everything", and "edge cache TTL" to a higher value like 7 days if your files aren't often changing.
I am not saying not to trust the word of the CEO, but this exact use case of using Cloudflare as an image host comes up a lot on HN.
The word on the street is that they will start throttling and contacting you once you hit several hundred TB per month. [1][2][3][4][5][6]
Of course this is still extremely generous, and the upgrade plans are usually still several orders of magnitude cheaper per GB than any cloud provider. But don't build a business or hobby project around Cloudflare providing unlimited free bandwidth forever.
Not only do things change, but CF has hundreds of employees who weren't CC'd on that informal permission, so there's still a high chance of being inconvenienced, and a decent chance the CEO won't be at your disposal should a problem occur.
Should Cloudflare later ban you for the practice, will the random support person you reach unpack the CEO's comments here, confirm nothing has changed internally that would prevent your continued use, and advocate for restoring your account?
It’s not so different from when your company provides a perk that you expect not to see again.
If the perk saves you money, you put that money in savings. Once your budget expands to depend on that perk you are trapped, and when it goes away the pain will be noteworthy.
In words that are more applicable to a business case: You have to have a strategy for when the Too Good To Be True situation ends, because it will, and you have less control over when than you think you do.
Backblaze is cheap, but if you're uploading millions of files, beware -- there is no way to just nuke/empty a bucket with the click of a button. If you're not keeping filename references in an external database, you are left to sequentially scan and remove files in batches of 1,000 in a single thread.
Support could not help, and it took me months to empty a bucket that way.
You need a DB of all of the dead entries that need to be deleted, and that’s a fine thing to have.
There are lots of problem spaces where deletion is expensive and so is time shifted not to align with peak system load. Some sort of reaper goes around tidying up as it can.
But I think by far my favorite variant is amortizing deletes across creates. Every call to create a new record pays the cost of deleting N records (if N are available). This keeps you from exhausting your resource, but also keeps read operations fast. And the average and minimum create time is more representative of the actual costs incurred.
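A minimal sketch of that amortization, assuming a store that can report records already marked dead (the interface and names here are illustrative, not any particular library):

```ts
// Each create also retires up to DELETE_BATCH records already marked dead,
// so cleanup cost is paid by writers instead of a separate spike of deletes.
const DELETE_BATCH = 4;

interface Store {
  insert(record: unknown): Promise<void>;
  findDead(limit: number): Promise<string[]>; // ids of records marked for deletion
  purge(ids: string[]): Promise<void>;
}

async function createWithAmortizedCleanup(store: Store, record: unknown): Promise<void> {
  await store.insert(record);
  const dead = await store.findDead(DELETE_BATCH);
  if (dead.length > 0) {
    await store.purge(dead); // every create pays for up to N deletes
  }
}
```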
My case was really simple. I was done with my ML pipeline and nuked the database, but pics in B2 remained with no quick way to get rid of them and/or to stop the recurring credit card charges.
IMO an "Empty" button should have been implemented by Backblaze.
A faster approach is a single pass: paginate through all entries in the bucket without deleting anything, just to build up an index of files, and then use that index to delete objects in parallel.
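Roughly like this, assuming the bucket is reachable through an S3-compatible endpoint (B2 exposes one; the same two-pass shape works against the native b2_list_file_names / b2_delete_file_version calls) and using the AWS SDK v3. The endpoint, region, and bucket name are placeholders, and credentials are picked up from the environment:

```ts
import {
  S3Client,
  ListObjectsV2Command,
  DeleteObjectsCommand,
} from "@aws-sdk/client-s3";

// Endpoint/region/bucket are placeholders; credentials come from the environment.
const s3 = new S3Client({
  region: "us-west-002",
  endpoint: "https://s3.us-west-002.backblazeb2.com",
});
const Bucket = "my-doomed-bucket";

// Pass 1: page through the whole bucket, only collecting keys.
async function listAllKeys(): Promise<string[]> {
  const keys: string[] = [];
  let ContinuationToken: string | undefined;
  do {
    const page = await s3.send(new ListObjectsV2Command({ Bucket, ContinuationToken }));
    for (const obj of page.Contents ?? []) keys.push(obj.Key!);
    ContinuationToken = page.NextContinuationToken;
  } while (ContinuationToken);
  return keys;
}

// Pass 2: delete in 1,000-key batches, several batches in flight at once.
async function deleteAll(keys: string[], concurrency = 8): Promise<void> {
  const batches: string[][] = [];
  for (let i = 0; i < keys.length; i += 1000) batches.push(keys.slice(i, i + 1000));
  while (batches.length > 0) {
    const inFlight = batches.splice(0, concurrency);
    await Promise.all(
      inFlight.map((batch) =>
        s3.send(new DeleteObjectsCommand({ Bucket, Delete: { Objects: batch.map((Key) => ({ Key })) } }))
      )
    );
  }
}

listAllKeys().then(deleteAll);
```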
Backblaze currently recommends you do this by writing a “Lifecycle rule” to hide/delete all files in the bucket, then let Backblaze empty the bucket for you on the server side in 24 hours: https://www.backblaze.com/b2/docs/lifecycle_rules.html
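For reference, the rule you attach to the bucket looks roughly like this; the field names are from the linked docs, and the one-day values are just the usual "hide everything, then delete it" setup:

```ts
// Roughly the lifecycle rule you pass to b2_update_bucket (field names per the
// linked docs; values illustrative): hide every file a day after upload, then
// delete hidden files a day later, so B2 drains the bucket server-side.
const lifecycleRules = [
  {
    fileNamePrefix: "",            // match every file in the bucket
    daysFromUploadingToHiding: 1,  // hide files one day after upload
    daysFromHidingToDeleting: 1,   // delete them one day after being hidden
  },
];
```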
It’s cheap, but it’s proving unacceptably slow for me - sometimes I see 2.5s TTFB for accessing tiny audio files in my region (Berlin, EU). Server uploads are also quite unreliable; I had to write a lot of custom retry logic to handle 503 errors (~30% probability when uploading in batch).
Great for its intended use (backups), but I’ll be switching to an S3-compatible alternative soon - eyeing DigitalOcean Spaces or Wasabi...
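The retry wrapper is nothing fancy; roughly this shape (a sketch, with arbitrary attempt counts and delays):

```ts
// Re-attempt the upload on 503s (and network errors) with exponential backoff.
// The attempt count and delays are arbitrary.
async function uploadWithRetry(doUpload: () => Promise<Response>, maxAttempts = 5): Promise<Response> {
  let delayMs = 500;
  for (let attempt = 1; ; attempt++) {
    try {
      const res = await doUpload();
      if (res.status !== 503) return res; // success, or an error retrying won't fix
    } catch {
      // network-level failure: fall through and retry
    }
    if (attempt >= maxAttempts) throw new Error(`upload still failing after ${maxAttempts} attempts`);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs *= 2;
  }
}
```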
B2 is:
- 0.5 cents/GB/mo ($0.005)
- 1GB/day free egress, 1 cent/GB after
- generous free API call allowances, cheap after that
Wasabi is:
- $0.0059/GB/mo (18% higher)
- all storage billed for at least 90 days
- minimum of $5.99 per month
- this doesn't include delete penalties
- all objects billed as at least 4 KB
- free egress as long as "reasonable"
- free API requests
- overwriting a file is a delete, ie, delete penalties
With HashBackup (I'm the author), an incremental database backup is uploaded after every user backup, and older database incrementals get deleted. Running simulations with S3 IA (30-day delete penalties), the charges were 19 cents/mo vs 7 cents/mo for regular S3, even though regular S3 is priced much higher per GB. So for backups to S3, HashBackup stores the db incrementals in the regular S3 storage class even if the backup data is in IA.
For Wasabi, there is no storage class that doesn't have delete penalties, and theirs are for 90 days instead of 30.
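The general shape of the penalty is easy to sketch: a short-lived object is billed for max(lifetime, minimum retention) days, so the effective cost scales with how fast you churn the data. The prices below are rough list prices, and the seven-day lifetime is just an assumed example, not HashBackup's actual numbers:

```ts
// $/month to keep 1 GB of data that you replace every `lifetimeDays` days,
// when the provider bills each object for at least `minDays` of storage.
function monthlyCostPerGB(pricePerGBMonth: number, lifetimeDays: number, minDays: number): number {
  const billedDays = Math.max(lifetimeDays, minDays);
  return pricePerGBMonth * (billedDays / lifetimeDays);
}

console.log(monthlyCostPerGB(0.023, 7, 0));   // S3 Standard, no minimum:  ~$0.023
console.log(monthlyCostPerGB(0.0125, 7, 30)); // S3 IA, 30-day minimum:    ~$0.054
console.log(monthlyCostPerGB(0.0059, 7, 90)); // Wasabi, 90-day minimum:   ~$0.076
```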
It used to be $0.0049 for the free egress plan so that's changed then. They do have lower storage pricing if you are on a paid-egress plan which is the same as Backblaze.
Either way, Wasabi is about simplicity and doesn't have any concept of storage classes. It's true that there's a 90-day min storage fee involved but that's only an issue if you're deleting constantly.
Those stats sound insane to me, and certainly don't reflect what I see.
I see 50ms or less TTFB, for images in the sub-200 KB range, and for videos in the 500 MB+ range, from Australia where the internet is still terrible.
I've only ever had a single server upload fail on me - and it occurred when an upload hit a major global infrastructure outage. In two years of regularly uploading 8 GB / 200 files a fortnight (at the least), I've never needed custom retry logic.
If you are seeing 50ms TTFB between B2’s only datacenter (in California) and Australia, there is something wrong with your methodology or you have discovered FTL communication.
I've been seeing pretty bad upload failures (probably around 30%) for uploading hundreds of 30-40 MB files per month to B2 from New Zealand since I started using B2 over a year ago.
And I'm not convinced it's connectivity issues, as I can SCP/FTP the same files to servers in the UK...
When I test using an actual software client (Cyberduck) to do the same thing to B2, I see pretty much the same behaviour: retries are needed, and the total upload size (due to the retries) is generally ~20% larger than the size of the files.
Interesting. I have a webm media website where I've migrated hundreds of thousands of videos about that size from S3 to B2, with thousands more added per month, with almost zero issues. I didn't even have/need retry logic until I was on horrible internet from a beach for a month, where long connections were regularly dropped locally.
Felt TTFB and download speed were great too, considering the massive price difference compared to S3. Though I also used Cloudflare Workers anyway to redirect my URLs to my B2 bucket with caching.
How well can you cache the worker responses on CF? Can you prevent spinning one up (and therefore incurring costs) after the first request for a given unique URL has been handled? Looking into now.sh for a similar use case (audio), but pondering how to handle caching in a bulletproof way, as I'm afraid of sudden exploding costs with "serverless" lambdas...
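For what it's worth, the pattern I've been looking at is roughly the sketch below, using the Workers Cache API (names are placeholders). One caveat: on a cache hit the Worker still executes and still counts as a request, so this avoids the repeated fetch to the bucket, not the Worker invocation itself.

```ts
// Types from @cloudflare/workers-types; the fetch to B2 is whatever you're
// already doing in your Worker.
addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(handle(event));
});

async function handle(event: FetchEvent): Promise<Response> {
  const cache = caches.default; // Cloudflare's per-datacenter edge cache
  const hit = await cache.match(event.request);
  if (hit) return hit; // the Worker still runs, but the bucket isn't hit again

  const response = await fetch(event.request); // or your rewritten B2 URL
  const toCache = new Response(response.body, response);
  toCache.headers.set("Cache-Control", "public, max-age=604800"); // cache 7 days

  event.waitUntil(cache.put(event.request, toCache.clone()));
  return toCache;
}
```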
You're very welcome - I'm glad it was helpful. B2 is significantly cheaper than S3, especially when paired with Cloudflare for free bandwidth. If you're interested, my company Nodecraft talked about a 23TB migration we did from S3 to B2 a little while ago: https://nodecraft.com/blog/development/migrating-23tb-from-s...
How does CloudFlare themselves afford to give bandwidth for free? I understand that I can pay $20/mo for pro account but they also have a $0/mo option with fewer bells and whistles. What gives them the advantage to charge nothing for bandwidth?
Because we’re peered with Backblaze (as well as AWS). There’s a fixed cost of setting up those peering arrangements, but, once in place, there’s no incremental cost. That’s why we have agreements similar to the Backblaze one in place with Google, Microsoft, IBM, Digital Ocean, etc. It’s pretty shameful, actually, that AWS has so far refused. When using Cloudflare, they don’t pay for the bandwidth, and we don’t pay for the bandwidth, so why are customers paying for the bandwidth? Amazon pretends to be customer-focused. This is a clear example where they’re not.
Thank you for clarifying. If I were to use a Google cloud service from a Cloudflare Worker would there be no bandwidth charges? That would change everything for us.
As AWS is the primary cash cow for Amazon, I doubt they would ever change that. Bandwidth fees are a key profit maker for them. On the other hand, AWS's crazy bandwidth pricing is probably pretty beneficial in driving customers towards you guys.
Their terms indicate it should be used for caching html primarily. So if they find costly abusers, they could use this clause to get them to upgrade to a paid tier.
> 2.8 Limitation on Non-HTML Caching
> The Service is offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as a part of a Paid Service purchased by you, you agree to use the Service solely for the purpose of serving web pages as viewed through a web browser or other application and the Hypertext Markup Language (HTML) protocol or other equivalent technology. Use of the Service for the storage or caching of video (unless purchased separately as a Paid Service) or a disproportionate percentage of pictures, audio files, or other non-HTML content, is prohibited.
According to their documentation, Wasabi will move you onto their $0.005/GB storage + $0.01/GB download plan (same price as B2, without API charges) from their $0.0059/GB storage free-egress plan if you download more than your total storage (e.g. with 100 TB stored, don't download more than 100 TB/month).
You get more features at a higher cost. Basically B2 gives you storage, but not detailed access management, an endpoint internal to your VPC, or other extras.
For backups, media and archival use-cases it looks really good for the price if you can live with it being in the US.
If you are doing any large data processing using S3, you get the advantage of data locality; with VPC endpoints you can also bypass NAT gateway data charges and get much higher bandwidth.
You get 100,000 requests per month and up to 1,000 requests in a 10-minute timeframe. So if you have a page with 10 images on it and 100 people visit that page within a 10-minute timeframe, you will use up all of your free tier and all new visitors will get a 1015 error.
For paid plans you must pay at least $5, which includes 10 million requests; additional requests are 50 cents per million.
You get 100,000 requests per day, not per month. The burst limits are definitely a concern for heavy traffic, but for just $5 you can remove the burst limits entirely, as you mention.
They rarely (or never?) go down at the same time for any reason, other than the standard Internet BGP drama that all providers are at risk of and have no control over.
That doesn't seem completely fair. Much like blaming Google for using normal commodity servers compared to AltaVista using high-end enterprise hardware. What matters is the reliability of the system, not some random part.
Backblaze is quite transparent about how they do things. They publish their drive reliability numbers (including brand/model numbers), storage pod design, and how their sharding/redundancy works.
Seems like most cloud storage vendors just say "We do object storage right handwave and we have lots of 9s". Backblaze says they shard your data into 20 pieces onto 20 servers and can recover with any 17 of those pieces. More details at https://www.backblaze.com/blog/reed-solomon/
Sure, that's not enough redundancy for some, but at least you know what to expect and can plan accordingly. I've not seen any other cloud vendor do that. Please post URLs for similar info from other companies.
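Knowing the 17-of-20 scheme also lets you do your own back-of-the-envelope durability math: data is lost only if more than 3 of the 20 shards fail before repair. The per-shard failure probability below is a made-up input, just to show the shape of the calculation:

```ts
// C(n, k) without factorials, to keep intermediate values small.
function binomial(n: number, k: number): number {
  let result = 1;
  for (let i = 0; i < k; i++) result = (result * (n - i)) / (i + 1);
  return result;
}

// P(more than `parity` of `shards` fail), i.e. the chance the object is lost,
// given an independent per-shard failure probability within the repair window.
function lossProbability(shards: number, parity: number, pShardFail: number): number {
  let survive = 0;
  for (let i = 0; i <= parity; i++) {
    survive += binomial(shards, i) * pShardFail ** i * (1 - pShardFail) ** (shards - i);
  }
  return 1 - survive;
}

console.log(lossProbability(20, 3, 0.01)); // ~3.9e-5 with a 1% per-shard loss chance
```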
o365 home is $99.99/yr (not $50/mo), and allows up to 5 users, each of whom gets their own 1TB OneDrive allotment, evergreen desktop and mobile office software, skype minutes, etc.
It's a much better deal than paying $80/year for 1TB of OneDrive if you have 2+ users.
Dropbox and Google Drive both removed HTML hosting over the past few years. With Drive you can't even get direct links to images etc. anymore. Not sure if public Dropbox files have the same limitation.
Workers are also used for basic CORS headers, and stripping some other unnecessary headers. They're definitely not required, but I don't believe you can do URL rewriting with page rules; redirects, sure, but not rewriting.
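Roughly the kind of thing I mean; a sketch where the bucket path and the stripped headers are just examples:

```ts
addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request: Request): Promise<Response> {
  const url = new URL(request.url);
  // URL rewrite: serve /foo.png from the bucket's /file/<bucket>/foo.png path
  const upstream = `https://f002.backblazeb2.com/file/my-bucket${url.pathname}`;
  const originResponse = await fetch(new Request(upstream, request));

  // Copy the response so headers are mutable, then add CORS and strip extras
  const response = new Response(originResponse.body, originResponse);
  response.headers.set("Access-Control-Allow-Origin", "*");
  response.headers.delete("x-bz-file-id");
  response.headers.delete("x-bz-upload-timestamp");
  return response;
}
```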
https://www.cloudflare.com/terms/
For something small, they won't care. If your images make the front page of reddit, you might get shut down.