I'm sort of amazed they haven't dropped prices more; I guess they can still charge a premium to anyone who wants their AVX512 (? I think that's it) performance to be as high as it can be. Otherwise, most of these processors have already had an epyc shit taken on them.
I was just again pricing processors and if that 7302 isn't the real killer in the Epyc lineup I dunno what is. It craps on pretty much anything and everything Intel has, and is only about $1100 right now (pricing is inflated, but meh).
It's pretty crazy how cheap hardware is compared to the cloud these days; sure, it's not in a data center or if it is that's a headache of its own... but it is really fucking cheap. I personally think a lot of mid-sized orgs could benefit from moving a lot of their non-production environments to an on-prem server. The VPN is probably already locking people out, causing connection headaches anyway, so it's not like you're gonna have less connectivity than you did ;-)
Public cloud is a giant scam to make sure 80% of VC money trickles back up to the big three. Most of these startups in Soma could run their product on a single threaded C++ server. But that's not cool and trendy. You're supposed to pretend that you could suddenly become Google overnight, so you need The Cloud(TM), microservices, and all kinds of redundant garbage.
Great. Show me how fast you go to market with a web application and mobile backend written as a single threaded C++ application. Also please tell me how long it took to secure and set up and maintain the server.
Startups do, and always should, focus on getting to market faster so they can get feedback from real customers.
I will take going to market faster over saving a few dollars on server costs every time I'm given the choice.
Mature products' and companies' use cases are more nuanced.
>Show me how fast you go to market with a web application and mobile backend written as a single threaded C++ application. Also please tell me how long it took to secure and set up and maintain the server.
Let's not go crazy, Apache on metal is a very simple setup and at least as secure as S3 out of the box. Platform spread and simplicity are just lost concepts, that's all.
I'm not sure it'd even be possible to run Apache on bare metal; it has too many dependencies on OS facilities.
I'm curious if people are running bare metal web servers. I'd think there's enough lookup, modules etc that it wouldn't be worth it (small embedded applications like IoT frobs excepted)
> OVHcloud Best Value is a great way for you to experience the advantages of bare metal over virtual servers at an unbeatable price. Our Best Value bare metal servers feature the most stable environment, making it perfect for processing large volumes of data.
The terminology is "bare metal" when buying up dedicated servers these days. Some places still use the term "dedicated", but since the advent of "the cloud", the term "bare metal" has come up, since most instances are ASSUMED to be VPS.
That doesn’t even make sense. Wouldn’t that be “non-virtualized”?
“Bare metal” has meant right on the bare hardware (silicon) since the 1960s. I don’t think I’ve ever heard anyone use that expression in any other way.
Not really a cloud developer, so I looked up a couple of links and I haven't been able to find any uses of the term other than referring to a single-tenant machine that you have complete control over, as opposed to virtualized solutions.
That’s both correct and irrelevant because this entire thread is about running Apache, which is clearly about the web, considering Apache is a web server.
I disagree with the parent that the cloud is a scam. But I also disagree that getting to market faster is opposed to the operational simplicity that he's talking about.
Startups are very likely to fail before they need to scale. So that single-threaded C++ server, or whatever it is that the existing engineers are fastest at creating, is fine to get out there and get early market feedback.
The way I always balance this is to make the product manager put scaling and redundancy stories in the queue with feature stories. We'll build with scaling and redundancy in mind, but we don't do any extra work until the business decides those things are a higher priority than testing new hypotheses. And then I have them buy it in increments, so we can add automated load tests as we go. Then if we were to get into failwhale territory, it'd be a decision we all made together.
Strange, somehow that is exactly what we were doing in 1999, using Apache modules on UNIX.
We had a business within one year and were acquired by a major Portuguese company. Luckily, the company survived the crash and exists nowadays as Altitude Software.
Some of the founders of that startup left after a couple of years and founded Outsystems.
No cloud, no scripting languages with kubernetes scaling to fix lack of performance, no containers, no internet scale document databases.
Just pure C and C++, with a bit of TCL thrown into the mix, and Apache.
Says who? Or is the only metric of success in your book the publishing of buzzword-filled "engineering" blog posts, parroting the endless nonsense coming from similar "start" "ups"?
A lost bet. Their user growth has stalled in recent years; they could have moved at least some parts off Google Cloud without losing scalability.
On the other hand, it could have gone the other way around, and they could actually have been in need of all the scalability cloud services offer. From what I have heard, they messed up their interface some time after their IPO and that really hit them in the guts.
I’d be interested to see the startup that’s spending 80% of its budget on cloud. In my experience most of the startup budget goes to payroll, which seems like the sort of thing an anti-large-business person should appreciate.
> Startups do, and always should, focus on getting to market faster so they can get feedback from real customers.
Absolutely. But are you telling me that a $5 p/m LAMP stack from a basic web hosting company (I'm a Microsoft dev so not sure what the FoM in Linux-land is) isn't worth it as a first step? That you have to use a bunch of bells and whistles from AWS or Azure or something?
And, if you're in the MS camp, surely a $10 p/m IIS/SQL hosted option is much easier than anything the big three have...
I'm not seeing the value proposition of cloud here...
You don't even need C++. Even with Python, Node or PHP, a basic MySQL and Redis installation on a simple Debian server with a little tweaking, you can serve huge amounts of dynamic traffic with a $50 dedicated server from Hetzner.
It'll certainly take less time to manage than Kubernetes, that's for sure.
The last thing you want is to architect your backend for a single Hetzner server and then find out that it's not enough and have to rewrite everything. If your entire goal as a VC-funded startup is to "get big or die", then optimizing for a single server is a waste of resources.
Totally disagree. When you're at the point where a $50 server won't be enough, you still have a long way to scale, just by adding more hardware.
> optimizing for a single server is a waste of resources
No, I think you should spend your time wisely and actually building your product. I don't think you should spend a huge amount of time to "optimize for a single server" (whatever that means), just that a single server can be enough for a large amount of traffic.
Many startups go all-in on complex architecture, microservices and Kubernetes from the get-go (or start splitting their monolith or whatever before it's necessary) and lose a huge amount of time setting all this up (when only Netflix-sized companies really need it) at a time where they should've focused on building their product.
The point here is that you can easily scale most apps in small steps without having to indebt yourself with complex architecture from the get go, which requires having to spend huge sums of cash on sysadmins and AWS bills that don't benefit your users.
The C++ example is indeed a bit too much.
But setting up a Debian instance with nginx, Python, Node.js and a bit of security would probably take max one evening.
For startups I personally believe in a more hybrid solution. I would use a cloud provider and manage the VMs myself. For 100 dollars a month you can run 4 VMs and a load balancer. Perfectly fine for even most scaleups.
AWS, GCP and also Azure are money pits. These companies to me resemble the Oracle kind of company. The same, btw, goes for all these tooling companies like HubSpot, Salesforce etc. Attractive in the beginning, but bloodsuckers after you become bigger.
Marketing is supposedly a large part of spend too, or should be, from what I've heard. Doesn't matter what you're building if no one has ever heard of it, and I've heard some companies spend huge sums on marketing alone.
But but but ... running real servers requires that grouchy dude with scruffy facial hair and mountain boots. And he expects to be paid an absurd amount of money. I avoid these problems if I just put it in the cloud. </sarcasm>
This is just like when they got rid of secretaries. The work didn't go away, it just got moved to all the peons.
> Most of these startups in Soma could run their product on a single threaded C++ server.
Most could be run on a single threaded Python server.
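If it helps to make that concrete, here's a minimal sketch using only the Python standard library (the /health route and the port are made up for illustration, and you'd obviously stick nginx or similar in front before exposing it):

    # Minimal single-threaded HTTP backend, standard library only.
    # HTTPServer handles one request at a time -- literally single threaded.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/health":  # made-up example endpoint
                body = json.dumps({"status": "ok"}).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()

One process, one thread, and it will serve a surprising amount of traffic before you need anything fancier.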
I’m on a very tight time budget for my multiplayer game. If I had to set up servers myself on bare metal, I would be making a single player game instead just due to time. The cloud is nice for some situations
The cloud is not just hardware though. It's prebuilt infra that does all the boring parts for you. Yes, hardware is one component, but I would argue it's less significant than the software.
Using the current inflated AMD price, you could get a 20-core Intel Xeon Gold 5218R for $1273. Even with the official pricing of the 7302 closer to $900, you are paying only ~$370 for Intel's 4 more cores and higher turbos. Not to mention you still get overall higher IPC on Intel's part.
And once you factor in the price of the whole system with ECC RAM, SSDs, network adaptors, etc. (which you get a discount on from Intel), the whole package cost isn't so much in favour of AMD.
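For what it's worth, the per-core arithmetic from those two price points (a rough sketch; the figures are just the ones quoted above and will drift with street pricing):

    # Rough per-core price comparison using the figures quoted above.
    intel_5218r = {"price": 1273, "cores": 20}  # street price mentioned above
    amd_7302 = {"price": 900, "cores": 16}      # approximate list price mentioned above

    for name, cpu in [("Xeon Gold 5218R", intel_5218r), ("EPYC 7302", amd_7302)]:
        print(f"{name}: ${cpu['price'] / cpu['cores']:.0f} per core")

    delta = intel_5218r["price"] - amd_7302["price"]
    extra = intel_5218r["cores"] - amd_7302["cores"]
    print(f"Premium for Intel's {extra} extra cores: ~${delta}")
    # -> roughly $64/core vs $56/core, and ~$373 for the 4 extra cores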
So I don't see what the real killer is here. Intel (and arguably AMD) are still selling as many as they can make. EPYC 2 was announced mid-2019, it has been 6+ months since launch, and Intel is still making record datacenter revenue, with EPYC making minimal gains on an already dismally small base number. I.e., even if they had a 100% increase in shipments from a base market share of 1%, it would still only represent 2% of the market.
With the launch of a competitive notebook APU (notebooks represent 70% of today's PC market shipments), the assumption of more EPYC orders from the four hyperscalers, the launch of two new consoles, ray-tracing GPUs, and new GPGPUs, AMD is forecasting 30% YoY revenue growth. Different people may have different perspectives on that number, but I really don't see any "real killer" here.
And that is speaking as someone who really wants to see AMD grow a lot more, but the data suggests otherwise.
A single Xeon has only 48 PCIe lanes, which normally is not enough. So you should really be comparing the price of a dual-Xeon system with the equivalent single-socket AMD solution.
> It's pretty crazy how cheap hardware is compared to the cloud these days
By “the cloud” I’m assuming you mean aws/gcp/azure, and comparing them simply with on-prem is a false dichotomy. There are plenty of other cloud and bare metal hosting providers who actually pass on the savings as hardware value improves.
Not GP, but OVH Cloud is really cheap. Some people have had issues with OVH reliability, but I've only had success. I use AWS for prod, and a mix of Digital Ocean and OVH for non-prod.
OVH bare metal is what we use when we need cheap raw horsepower. Mentioning them is usually controversial, which is why I refrained in my parent comment, but we’ve had nothing but success.
They could benefit if they looked at infrastructure as a profit center instead of a cost center, and paid/outfitted their IT staff appropriately. But most won't, and AWS/Azure/Gcloud/etc is a way of offshoring infrastructure to those that do view it as a profit center. In some ways it's positive, as non-technical leadership is coming around to the fact that they don't do technology that well, and can still look modern/ahead of the curve at conferences by using "the cloud".
$1,000 in CPU is nothing when in one server you are spending $32,000 on RAM and even more on disks.
Plus, if I have 16 more cores I'm just going to buy even more of those two (RAM and disks); sometimes I can't fit any more in the server, so I couldn't increase my density even if I wanted to.
I'm not sure why you need 3 Terabytes of RAM in every server. Even if you were to replace every single application on your OS with a JVM based one (including things like bash) you wouldn't meaningfully hit that limit. If you have an application that does indeed need many Terabytes in the same address space then you probably have a very small number of servers. No one runs a 1000 node cluster where each node has 3TB RAM.
Isn’t the Intel 6208U a strong competitor to the AMD 7302? At the same price and TDP it has higher clock speeds and a unified memory domain, compared to the AMD 4-way NUMA architecture. It seems like you can make a case for either, depending on your workload.
The AMD Rome chips (including the 7302) behave as one NUMA node, I thought (and can find online). You also get quite a lot of PCIe 4.0 as a bonus and a higher all-core base frequency. Though your mileage may vary depending on workload, as you already stated.
Cache is shared by the cores, but may be temporarily "assigned" to a core that recently wrote to it. Is the latency(x,y) the number of cycles it takes to reassign to x a cache page owned by y?
Not really. All three levels of cache are split on Rome. L1 and L2 are per-core, and L3 is per-CCX (4 cores). If you have 1 thread with a working set larger than the 16MB L3 slice that each CCX gets, then you'll be hitting DRAM rather than spill over into the L3 of another CCX. But if you have cores on separate CCXs that are using the same region of memory, then the usual cache coherency semantics for separate chips applies.
The next version of AMD's Zen architecture is expected to increase the CCX size to 8 cores, so all 32MB of L3 on an 8-core chiplet will be unified and shared between all 8 cores, rather than being partitioned into two 16MB per-CCX chunks. I don't think it's practical for them to unify the L3 cache across multiple chiplets given the performance of their inter-die connections, and I don't think they have the die space on the central IO die for a fully unified L4 cache. (Shrinking the IO die to 7nm may make it possible to have some L4, but probably not enough to really help many workloads.)
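If you want to see how the slices are carved up on a particular box, Linux exposes the cache topology in sysfs. A rough sketch (Linux-only; values obviously depend on the CPU, but on Rome you'd expect per-core L1/L2 and a 16MB L3 whose shared_cpu_list covers one CCX):

    # Dump the cache hierarchy Linux reports for CPU 0 (standard sysfs paths).
    from pathlib import Path

    for idx in sorted(Path("/sys/devices/system/cpu/cpu0/cache").glob("index*")):
        level = (idx / "level").read_text().strip()
        ctype = (idx / "type").read_text().strip()
        size = (idx / "size").read_text().strip()
        shared = (idx / "shared_cpu_list").read_text().strip()
        print(f"L{level} {ctype}: {size}, shared by CPUs {shared}")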
It's more complicated than that. There are still die-local memory controllers, but the penalty for remote access is vastly lower than earlier Epyc models — so much so that you plausibly could run your workload with naive UMA memory access and be just fine. AMD's ad copy says it's UMA, but really that's just marketing spin on improved remote memory perf.
You're either talking about cache latency, or still talking about first-gen EPYC/Threadripper rather than the current generation codenamed Rome. On a cache miss, all chiplets on a single-socket Rome system have roughly equal latency for a DRAM fetch, regardless of which physical address is being fetched. Any differences are insignificant compared to inter-socket memory access or fetching from a different chiplet's DRAM on first-gen EPYC. And even if you wanted to treat each chiplet as a separate NUMA node, 4 isn't the right number for Rome.
"And even if you wanted to treat each chiplet as a separate NUMA node, 4 isn't the right number for Rome."
You can configure Rome systems with 1, 2, or 4 NUMA domains per socket (NPS1, NPS2, or NPS4, where NPS == "NUMA per socket".) Memory bandwidth is higher if you configure as NPS4, but it exposes different latencies to memory based on its location.
It's really impressive that you can get uniform latency to memory for 64 cores on the 7702 chips (when configured as NPS1).
The underlying hardware reality is that the IO die is organized into quadrants instead of being a full crossbar switch between 8 CCXs and an 8-channel DRAM controller. Whether to enumerate it as 1, 2 or 4 NUMA domains per socket depends very much on what kind of software you plan to run.
Saying that memory bandwidth is higher when configured as NPS4 probably isn't universally true, because that setting will constrain the bandwidth a single core can use to just effectively dual-channel. For a benchmark with the appropriate thread count and sufficiently low core-to-core communication, NPS4 probably makes it easiest to maximize aggregate memory bandwidth utilization (this seems to be what Dell's STREAM Triad results show, with NPS4 and 1 thread per CCX as optimal settings for that benchmark).
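If anyone wants to check what their NPS setting actually exposes to the OS, the kernel's view is easy to read back out of sysfs. A rough sketch (Linux-only): under NPS1 a single-socket Rome box should enumerate one node, under NPS4 four nodes with an asymmetric distance matrix.

    # List the NUMA nodes and SLIT distances the firmware exposed to Linux.
    from pathlib import Path

    nodes = sorted(Path("/sys/devices/system/node").glob("node[0-9]*"),
                   key=lambda p: int(p.name[4:]))
    print(f"{len(nodes)} NUMA node(s) enumerated")
    for node in nodes:
        cpus = (node / "cpulist").read_text().strip()
        distances = (node / "distance").read_text().split()
        print(f"{node.name}: cpus {cpus}, distances {distances}")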
I was responding to your claim that "And even if you wanted to treat each chiplet as a separate NUMA node, 4 isn't the right number for Rome", which was incorrect. 4 is one of the three possible options for the number of NUMA domains.
Your comments about Rome are completely incorrect. There are four main memory controllers in this architecture and some of them are further from some CCDs than others. In the worst case, accessing the furthest-away controller adds 25ns to main memory latency.
You can put this part in "NPS1" mode which interleaves all channels into an apparently uniform memory region, however it is still the case that 1/4 of memory takes an extra 25ns to access and 1/2 of it takes an extra 10ns, compared to the remainder. Putting the part into NPS1 mode just zeroes out the SRAT tables so the OS isn't aware of the difference.
But don't take it from me. AMD's developer docs clearly state, and I am quoting, "The EPYC 7002 Series processors use a Non-Uniform Memory Access (NUMA) Microarchitecture."
> AMD's developer docs clearly state, and I am quoting,
Please quote something that's unambiguously supporting your claims. What you've quoted is insufficient.
What I said about a single-socket Rome processor is not "completely incorrect" under any reasonable interpretation. The latency and bandwidth limitations in moving data from one side of the IO die to another are much smaller than the inter-socket connections that were traditionally implied by NUMA, or the inter-chiplet communication in first-gen EPYC/Threadripper.
If you want to insist that NUMA apply to even the slightest measurable memory performance asymmetry between cores, please say so, so that we may know ahead of time whether the discussion is also going to lead to esoteric details like the ring vs mesh interconnects on Intel's processors.
If you're not sensitive to main memory latency, just say that. Don't try to tell me that 25ns is not relevant. It's ~100 CPU cycles and it's also about 25% swing from fastest to slowest.
Intel's server/workstation CPUs have had 2 memory controllers during the last several generations, so even if the whole CPU is seen as a single NUMA node by the software, the actual memory access latency has always been different from core to core, depending on the core position on the intercommunication mesh or ring.
So what ???
The initial posting was about the CPU being seen as a single or multiple NUMA nodes by the software, not about having an equal memory access latency for all cores, which hasn't been true for any server/workstation CPU, from any vendor, for many, many years.
It would be for me, but it's a single socket only processor. I like the 7302 specifically for the non-P variant. If I was going to stick to just one socket, I'd probably spend a bit more and go with the entry level Threadripper 3960x...
It's a nice looking processor though and probably the only one worth a damn in that line up.
Launch-day reviews are pretty uncommon for server processors, especially mid-cycle refreshes that don't introduce any fundamentally new tech. And retail stock the same week as the announcement is also not how this market segment usually works.
It doesn't because it's not necessary when you can get 128 physical cores with two sockets. Do you really need 256 cores on a single board? If you do, wait a year or so and there will be 128-core packages available.
They also support ECC. What's not "well" about EPYC?
Also sort of interested in this comment. It can be difficult to make ECC useful. There's chipkill vs SECDED, for starters. On paper, EPYC Rome has chipkill. More important than paper features is integration with the board firmware and the OS kernel ... Linux RAS features are quite useless if the kernel fails to notice the errors. Whether this stuff is well-integrated depends a lot on your vendors.
An occasional 1 bit correction is very common compared to chipkill, so there is a huge benefit to ECC without chipkill. In fact, with 1000s of servers, I've never had chipkill give me any benefit. I guess I'm too small to see the effect from chipkill. But yes, I do see 1-bit corrections.
Yeah, not advocating for chipkill, but the OS has to know how to interpret the machine check syndromes, is all I was getting at. This has been a problem for me on Skylake-SP with Linux, to name one.
I've always had to go out of my way to find single-bit correction numbers in Linux. I suspect that once you find that, noticing a chipkill event is pretty easy. But I've never seen a chipkill event, despite having a lot of DRAM for a long time.
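For what it's worth, once the right EDAC driver is loaded the counters are just sitting in sysfs; a rough sketch of where I look (Linux-only, and the mc* directories only show up if EDAC recognized your memory controller):

    # Read corrected (CE) and uncorrected (UE) error counts from Linux EDAC sysfs.
    from pathlib import Path

    for mc in sorted(Path("/sys/devices/system/edac/mc").glob("mc[0-9]*")):
        ce = (mc / "ce_count").read_text().strip()
        ue = (mc / "ue_count").read_text().strip()
        print(f"{mc.name}: corrected={ce} uncorrected={ue}")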