I'm sort of amazed they haven't dropped prices more; I guess they can still charge a premium to anyone who wants their AVX512 (? I think that's it) performance to be as high as it can be. Otherwise, most of these processors have already had an epyc shit taken on them.
I was just again pricing processors and if that 7302 isn't the real killer in the Epyc lineup I dunno what is. It craps on pretty much anything and everything Intel has, and is only about $1100 right now (pricing is inflated, but meh).
It's pretty crazy how cheap hardware is compared to the cloud these days; sure, it's not in a data center or if it is that's a headache of its own... but it is really fucking cheap. I personally think a lot of mid-sized orgs could benefit from moving a lot of their non-production environments to an on-prem server. The VPN is probably already locking people out, causing connection headaches anyway, so it's not like you're gonna have less connectivity than you did ;-)
Public cloud is a giant scam to make sure 80% of VC money trickles back up to the big three. Most of these startups in Soma could run their product on a single threaded C++ server. But that's not cool and trendy. You're supposed to pretend that you could suddenly become Google overnight, so you need The Cloud(TM), microservices, and all kinds of redundant garbage.
Great. Show me how fast you go to market with a web application and mobile backend written as a single threaded C++ application. Also please tell me how long it took to secure and set up and maintain the server.
Startups do, and always should, focus on getting to market faster so they can get feedback from real customers.
I will take going to market faster over saving a few dollars on server costs every time I'm given the choice.
Mature products' and companies' use cases are more nuanced.
>Show me how fast you go to market with a web application and mobile backend written as a single threaded C++ application. Also please tell me how long it took to secure and set up and maintain the server.
Let's not go crazy, Apache on metal is a very simple setup and at least as secure as S3 out of the box. Platform spread and simplicity are just lost concepts, that's all.
I'm not sure it'd even be possible to run Apache on bare metal; it has too many dependencies on OS facilities.
I'm curious if people are running bare metal web servers. I'd think there's enough lookup, modules etc that it wouldn't be worth it (small embedded applications like IoT frobs excepted)
> OVHcloud Best Value is a great way for you to experience the advantages of bare metal over virtual servers at an unbeatable price. Our Best Value bare metal servers feature the most stable environment, making it perfect for processing large volumes of data.
The terminology is "bare metal" when buying up dedicated servers these days. Some places still use the term "dedicated", but since the advent of "the cloud", the term "bare metal" has come up, since most instances are ASSUMED to be VPS.
That doesn’t even make sense. Wouldn’t that be “non-virtualized”?
“Bare metal” has meant right on the bare hardware (silicon) since the 1960s. I don’t think I’ve ever heard anyone use that expression in any other way.
Not really a cloud developer, so I looked up a couple of links and I haven't been able to find any uses of the term other than referring to a single-tenant machine that you have complete control over, as opposed to virtualized solutions.
That’s both correct and irrelevant because this entire thread is about running Apache, which is clearly about the web, considering Apache is a web server.
I disagree with the parent that the cloud is a scam. But I also disagree that getting to market faster is opposed to the operational simplicity that he's talking about.
Startups are very likely to fail before they need to scale. So that single-threaded C++ server, or whatever it is that the existing engineers are fastest at creating, is fine to get out there and get early market feedback.
The way I always balance this is to make the product manager put scaling and redundancy stories in the queue with feature stories. We'll build with scaling and redundancy in mind, but we don't do any extra work until the business decides those things are a higher priority than testing new hypotheses. And then I have them buy it in increments, so we can add automated load tests as we go. Then if we were to get into failwhale territory, it'd be a decision we all made together.
Strange, somehow that is exactly what we were doing in 1999, using Apache modules on UNIX.
We had a business within one year and were acquired by a major Portuguese company. Luckily, the company survived the crash and exists nowadays as Altitude Software.
Some of the founders of that startup left after a couple of years and founded Outsystems.
No cloud, no scripting languages with kubernetes scaling to fix lack of performance, no containers, no internet scale document databases.
Just pure C and C++, with a bit of TCL thrown into the mix, and Apache.
Says who? Or is the only metric of success in your book the publishing of buzzword-filled "engineering" blog posts, parroting the endless nonsense coming from similar "start" "ups"?
A lost bet. Their user growth has stalled in recent years; they could have moved at least some parts off Google Cloud without losing scalability.
On the other hand, it could have gone the other way around, and they could actually have been in need of all the scalability cloud services offer. From what I have heard, they messed up their interface some time after their IPO and that really hit them in the guts.
I’d be interested to see the startup that’s spending 80% of its budget on cloud. In my experience most of the startup budget goes to payroll, which seems like the sort of thing an anti-large-business person should appreciate.
> Startups do, and always should, focus on getting to market faster so they can get feedback from real customers.
Absolutely. But are you telling me that a $5 p/m LAMP stack from a basic web hosting company (I'm a Microsoft dev so not sure what the FoM in Linux-land is) isn't worth it as a first step? That you have to use a bunch of bells and whistles from AWS or Azure or something?
And, if you're in the MS camp, surely a $10 p/m IIS/SQL hosted option is much easier than anything the big three have...
I'm not seeing the value proposition of cloud here...
You don't even need C++. Even with Python, Node or PHP, a basic MySQL and Redis installation on a simple Debian server with a little tweaking, you can serve huge amounts of dynamic traffic with a $50 dedicated server from Hetzner.
It'll certainly take less time to manage than Kubernetes, that's for sure.
The last thing you want is to architect your backend for a single Hetzner server and then find out that it's not enough and have to rewrite everything. If your entire goal as a VC-funded startup is to "get big or die", then optimizing for a single server is a waste of resources.
Totally disagree. When you're at the point where a $50 server won't be enough, you still have a long way to scale, just by adding more hardware.
> optimizing for a single server is a waste of resources
No, I think you should spend your time wisely and actually building your product. I don't think you should spend a huge amount of time to "optimize for a single server" (whatever that means), just that a single server can be enough for a large amount of traffic.
Many startups go all-in on complex architecture, microservices and Kubernetes from the get-go (or start splitting their monolith or whatever before it's necessary) and lose a huge amount of time setting all this up (when only Netflix-sized companies really need it) at a time where they should've focused on building their product.
The point here is that you can easily scale most apps in small steps without having to indebt yourself with complex architecture from the get go, which requires having to spend huge sums of cash on sysadmins and AWS bills that don't benefit your users.
The C++ example is indeed a bit too much.
But setting up a Debian instance with nginx, Python, Node.js and a bit of security would probably take max one evening.
For startups I personally believe in a more hybrid solution. I would use a cloud provider and manage the VMs myself. For 100 dollars a month you can run 4 VMs and a load balancer. Perfectly fine for even most scaleups.
AWS, GCP and also Azure are money pits. These companies to me resemble the Oracle kind of company. The same, btw, goes for all these tooling companies like HubSpot, Salesforce etc. Attractive in the beginning, but bloodsuckers after you become bigger.
Marketing is supposedly a large part of spend too, or should be, from what I've heard. Doesn't matter what you're building if no one has ever heard of it, and I've heard some companies spend huge sums on marketing alone.
But but but ... running real servers requires that grouchy dude with scruffy facial hair and mountain boots. And he expects to be paid an absurd amount of money. I avoid these problems if I just put it in the cloud. </sarcasm>
This is just like when they got rid of secretaries. The work didn't go away, it just got moved to all the peons.
> Most of these startups in Soma could run their product on a single threaded C++ server.
Most could be run on a single threaded Python server.
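If it helps to make that concrete, here's a minimal sketch using only the Python standard library (the /health route and the port are made up for illustration, and you'd obviously stick nginx or similar in front before exposing it):

    # Minimal single-threaded HTTP backend, standard library only.
    # HTTPServer handles one request at a time -- literally single threaded.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/health":  # made-up example endpoint
                body = json.dumps({"status": "ok"}).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()

One process, one thread, and it will serve a surprising amount of traffic before you need anything fancier.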
I’m on a very tight time budget for my multiplayer game. If I had to set up servers myself on bare metal, I would be making a single player game instead just due to time. The cloud is nice for some situations
The cloud is not just hardware though. It's prebuilt infra that does all the boring parts for you. Yes, hardware is one component, but I would argue it's less significant than the software.
Using the current inflated AMD price, you could get a 20-core Intel Xeon Gold 5218R for $1273. Even with the official pricing of the 7302 closer to $900, you are paying only ~$370 for Intel's 4 more cores and higher turbos. Not to mention you still get overall higher IPC on Intel's part.
And once you factor in the price of the whole system with ECC RAM, SSDs, network adaptors, etc. (which you get a discount on from Intel), the whole package cost isn't so much in favour of AMD.
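For what it's worth, the per-core arithmetic from those two price points (a rough sketch; the figures are just the ones quoted above and will drift with street pricing):

    # Rough per-core price comparison using the figures quoted above.
    intel_5218r = {"price": 1273, "cores": 20}  # street price mentioned above
    amd_7302 = {"price": 900, "cores": 16}      # approximate list price mentioned above

    for name, cpu in [("Xeon Gold 5218R", intel_5218r), ("EPYC 7302", amd_7302)]:
        print(f"{name}: ${cpu['price'] / cpu['cores']:.0f} per core")

    delta = intel_5218r["price"] - amd_7302["price"]
    extra = intel_5218r["cores"] - amd_7302["cores"]
    print(f"Premium for Intel's {extra} extra cores: ~${delta}")
    # -> roughly $64/core vs $56/core, and ~$373 for the 4 extra cores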
So I don't see what the real killer is here. Intel (and arguably AMD) are still selling as many as they can make. EPYC 2 was announced mid-2019, it has been 6+ months since launch, and Intel is still making record datacenter revenue, with EPYC making minimal gains on an already dismally small base number. I.e., even if they had a 100% increase in shipments from a base market share of 1%, it would still only represent 2% of the market.
With the launch of a competitive notebook APU (notebooks represent 70% of today's PC market shipments), the assumption of more EPYC orders from the four hyperscalers, the launch of two new consoles, ray-tracing GPUs, and new GPGPUs, AMD is forecasting 30% YoY revenue growth. Different people may have different perspectives on that number, but I really don't see any "real killer" here.
And that is speaking as someone who really wants to see AMD grow a lot more, but the data suggests otherwise.
A single Xeon has only 48 PCIe lanes, which normally is not enough. So you should really be comparing the price of a dual-Xeon system with the equivalent single-socket AMD solution.
> It's pretty crazy how cheap hardware is compared to the cloud these days
By “the cloud” I’m assuming you mean aws/gcp/azure, and comparing them simply with on-prem is a false dichotomy. There are plenty of other cloud and bare metal hosting providers who actually pass on the savings as hardware value improves.
Not GP, but OVH Cloud is really cheap. Some people have had issues with OVH reliability, but I've only had success. I use AWS for prod, and a mix of Digital Ocean and OVH for non-prod.
OVH bare metal is what we use when we need cheap raw horsepower. Mentioning them is usually controversial, which is why I refrained in my parent comment, but we’ve had nothing but success.
They could benefit if they looked at infrastructure as a profit center instead of a cost center, and paid/outfitted their IT staff appropriately. But most won't, and AWS/Azure/Gcloud/etc is a way of offshoring infrastructure to those that do view it as a profit center. In some ways it's positive, as non-technical leadership is coming around to the fact that they don't do technology that well, and can still look modern/ahead of the curve at conferences by using "the cloud".
$1,000 in CPU is nothing when in one server you are spending $32,000 on RAM and even more on disks.
Plus, if I have 16 more cores I'm just going to buy even more of those two (RAM and disks); sometimes I can't fit any more in the server, so I couldn't increase my density even if I wanted to.
I'm not sure why you need 3 Terabytes of RAM in every server. Even if you were to replace every single application on your OS with a JVM based one (including things like bash) you wouldn't meaningfully hit that limit. If you have an application that does indeed need many Terabytes in the same address space then you probably have a very small number of servers. No one runs a 1000 node cluster where each node has 3TB RAM.
Isn’t the Intel 6208U a strong competitor to the AMD 7302? At the same price and TDP it has higher clock speeds and a unified memory domain, compared to the AMD 4-way NUMA architecture. It seems like you can make a case for either, depending on your workload.
The AMD Rome chips (including the 7302) behave as one NUMA node, I thought (and can find online). You also get quite a lot of PCIe 4.0 as a bonus and a higher all-core base frequency. Though your mileage may vary depending on workload, as you already stated.
Cache is shared by the cores, but may be temporarily "assigned" to a core that recently wrote to it. Is the latency(x,y) the number of cycles it takes to reassign to x a cache page owned by y?
Not really. All three levels of cache are split on Rome. L1 and L2 are per-core, and L3 is per-CCX (4 cores). If you have 1 thread with a working set larger than the 16MB L3 slice that each CCX gets, then you'll be hitting DRAM rather than spill over into the L3 of another CCX. But if you have cores on separate CCXs that are using the same region of memory, then the usual cache coherency semantics for separate chips applies.
The next version of AMD's Zen architecture is expected to increase the CCX size to 8 cores, so all 32MB of L3 on an 8-core chiplet will be unified and shared between all 8 cores, rather than being partitioned into two 16MB per-CCX chunks. I don't think it's practical for them to unify the L3 cache across multiple chiplets given the performance of their inter-die connections, and I don't think they have the die space on the central IO die for a fully unified L4 cache. (Shrinking the IO die to 7nm may make it possible to have some L4, but probably not enough to really help many workloads.)
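If you want to see how the slices are carved up on a particular box, Linux exposes the cache topology in sysfs. A rough sketch (Linux-only; values obviously depend on the CPU, but on Rome you'd expect per-core L1/L2 and a 16MB L3 whose shared_cpu_list covers one CCX):

    # Dump the cache hierarchy Linux reports for CPU 0 (standard sysfs paths).
    from pathlib import Path

    for idx in sorted(Path("/sys/devices/system/cpu/cpu0/cache").glob("index*")):
        level = (idx / "level").read_text().strip()
        ctype = (idx / "type").read_text().strip()
        size = (idx / "size").read_text().strip()
        shared = (idx / "shared_cpu_list").read_text().strip()
        print(f"L{level} {ctype}: {size}, shared by CPUs {shared}")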
It's more complicated than that. There are still die-local memory controllers, but the penalty for remote access is vastly lower than earlier Epyc models — so much so that you plausibly could run your workload with naive UMA memory access and be just fine. AMD's ad copy says it's UMA, but really that's just marketing spin on improved remote memory perf.
You're either talking about cache latency, or still talking about first-gen EPYC/Threadripper rather than the current generation codenamed Rome. On a cache miss, all chiplets on a single-socket Rome system have roughly equal latency for a DRAM fetch, regardless of which physical address is being fetched. Any differences are insignificant compared to inter-socket memory access or fetching from a different chiplet's DRAM on first-gen EPYC. And even if you wanted to treat each chiplet as a separate NUMA node, 4 isn't the right number for Rome.
"And even if you wanted to treat each chiplet as a separate NUMA node, 4 isn't the right number for Rome."
You can configure Rome systems with 1, 2, or 4 NUMA domains per socket (NPS1, NPS2, or NPS4, where NPS == "NUMA per socket".) Memory bandwidth is higher if you configure as NPS4, but it exposes different latencies to memory based on its location.
It's really impressive that you can get uniform latency to memory for 64 cores on the 7702 chips (when configured as NPS1).
The underlying hardware reality is that the IO die is organized into quadrants instead of being a full crossbar switch between 8 CCXs and an 8-channel DRAM controller. Whether to enumerate it as 1, 2 or 4 NUMA domains per socket depends very much on what kind of software you plan to run.
Saying that memory bandwidth is higher when configured as NPS4 probably isn't universally true, because that setting will constrain the bandwidth a single core can use to just effectively dual-channel. For a benchmark with the appropriate thread count and sufficiently low core-to-core communication, NPS4 probably makes it easiest to maximize aggregate memory bandwidth utilization (this seems to be what Dell's STREAM Triad results show, with NPS4 and 1 thread per CCX as optimal settings for that benchmark).
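If anyone wants to check what their NPS setting actually exposes to the OS, the kernel's view is easy to read back out of sysfs. A rough sketch (Linux-only): under NPS1 a single-socket Rome box should enumerate one node, under NPS4 four nodes with an asymmetric distance matrix.

    # List the NUMA nodes and SLIT distances the firmware exposed to Linux.
    from pathlib import Path

    nodes = sorted(Path("/sys/devices/system/node").glob("node[0-9]*"),
                   key=lambda p: int(p.name[4:]))
    print(f"{len(nodes)} NUMA node(s) enumerated")
    for node in nodes:
        cpus = (node / "cpulist").read_text().strip()
        distances = (node / "distance").read_text().split()
        print(f"{node.name}: cpus {cpus}, distances {distances}")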
I was responding to your claim that "And even if you wanted to treat each chiplet as a separate NUMA node, 4 isn't the right number for Rome", which was incorrect. 4 is one of the three possible options for the number of NUMA domains.
Your comments about Rome are completely incorrect. There are four main memory controllers in this architecture and some of them are further from some CCDs than others. In the worst case, accessing the furthest-away controller adds 25ns to main memory latency.
You can put this part in "NPS1" mode which interleaves all channels into an apparently uniform memory region, however it is still the case that 1/4 of memory takes an extra 25ns to access and 1/2 of it takes an extra 10ns, compared to the remainder. Putting the part into NPS1 mode just zeroes out the SRAT tables so the OS isn't aware of the difference.
But don't take it from me. AMD's developer docs clearly state, and I am quoting, "The EPYC 7002 Series processors use a Non-Uniform Memory Access (NUMA) Microarchitecture."
> AMD's developer docs clearly state, and I am quoting,
Please quote something that's unambiguously supporting your claims. What you've quoted is insufficient.
What I said about a single-socket Rome processor is not "completely incorrect" under any reasonable interpretation. The latency and bandwidth limitations in moving data from one side of the IO die to another are much smaller than the inter-socket connections that were traditionally implied by NUMA, or the inter-chiplet communication in first-gen EPYC/Threadripper.
If you want to insist that NUMA apply to even the slightest measurable memory performance asymmetry between cores, please say so, so that we may know ahead of time whether the discussion is also going to lead to esoteric details like the ring vs mesh interconnects on Intel's processors.
If you're not sensitive to main memory latency, just say that. Don't try to tell me that 25ns is not relevant. It's ~100 CPU cycles and it's also about 25% swing from fastest to slowest.
Intel's server/workstation CPUs have had 2 memory controllers during the last several generations, so even if the whole CPU is seen as a single NUMA node by the software, the actual memory access latency has always been different from core to core, depending on the core position on the intercommunication mesh or ring.
So what ???
The initial posting was about the CPU being seen as a single or multiple NUMA nodes by the software, not about having an equal memory access latency for all cores, which hasn't been true for any server/workstation CPU, from any vendor, for many, many years.
It would be for me, but it's a single socket only processor. I like the 7302 specifically for the non-P variant. If I was going to stick to just one socket, I'd probably spend a bit more and go with the entry level Threadripper 3960x...
It's a nice looking processor though and probably the only one worth a damn in that line up.
Launch-day reviews are pretty uncommon for server processors, especially mid-cycle refreshes that don't introduce any fundamentally new tech. And retail stock the same week as the announcement is also not how this market segment usually works.
It doesn't because it's not necessary when you can get 128 physical cores with two sockets. Do you really need 256 cores on a single board? If you do, wait a year or so and there will be 128-core packages available.
They also support ECC. What's not "well" about EPYC?
Also sort of interested in this comment. It can be difficult to make ECC useful. There's chipkill vs SECDED, for starters. On paper, EPYC Rome has chipkill. More important than paper features is integration with the board firmware and the OS kernel ... Linux RAS features are quite useless if the kernel fails to notice the errors. Whether this stuff is well-integrated depends a lot on your vendors.
An occasional 1 bit correction is very common compared to chipkill, so there is a huge benefit to ECC without chipkill. In fact, with 1000s of servers, I've never had chipkill give me any benefit. I guess I'm too small to see the effect from chipkill. But yes, I do see 1-bit corrections.
Yeah, not advocating for chipkill, but the OS has to know how to interpret the machine check syndromes, is all I was getting at. This has been a problem for me on Skylake-SP with Linux, to name one.
I've always had to go out of my way to find single-bit correction numbers in Linux. I suspect that once you find that, noticing a chipkill event is pretty easy. But I've never seen a chipkill event, despite having a lot of DRAM for a long time.
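For what it's worth, once the right EDAC driver is loaded the counters are just sitting in sysfs; a rough sketch of where I look (Linux-only, and the mc* directories only show up if EDAC recognized your memory controller):

    # Read corrected (CE) and uncorrected (UE) error counts from Linux EDAC sysfs.
    from pathlib import Path

    for mc in sorted(Path("/sys/devices/system/edac/mc").glob("mc[0-9]*")):
        ce = (mc / "ce_count").read_text().strip()
        ue = (mc / "ue_count").read_text().strip()
        print(f"{mc.name}: corrected={ce} uncorrected={ue}")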