The hard part about using any AI chips other than NVIDIA's has been the software.
ROCm is finally at the point where it can train and deploy LLMs like Llama 2 in production.
If you want to try this out, one big issue is that software support is hugely different on Instinct vs Radeon. I think AMD will fix this eventually, but today you need to use Instinct.
We will post more information explaining how this works in the next few weeks.
The middle section of this blog post covers some of the details, including GEMM/memcpy performance and some of the software layers we needed to write to run on AMD.
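If you want to sanity-check GEMM and memcpy numbers on your own hardware, a rough micro-benchmark like the one below is enough. PyTorch's ROCm builds expose the torch.cuda namespace via HIP, so the same script runs on Instinct; the matrix size and dtype are just placeholders.

    # Rough GEMM / host-to-device memcpy micro-benchmark (illustrative sketch,
    # not our internal tooling). torch.cuda maps to HIP on ROCm builds of PyTorch.
    import time
    import torch

    def bench(fn, iters=50):
        for _ in range(5):          # warm up
            fn()
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            fn()
        torch.cuda.synchronize()    # wait for all kernels before stopping the clock
        return (time.time() - start) / iters

    n = 8192                        # placeholder size; pick something that fills the GPU
    a = torch.randn(n, n, dtype=torch.float16, device="cuda")
    b = torch.randn(n, n, dtype=torch.float16, device="cuda")
    host = torch.randn(n, n, dtype=torch.float16).pin_memory()

    gemm_s = bench(lambda: a @ b)
    copy_s = bench(lambda: host.to("cuda", non_blocking=True))

    print(f"GEMM:       {2 * n**3 / gemm_s / 1e12:.1f} TFLOP/s")
    print(f"memcpy H2D: {host.numel() * host.element_size() / copy_s / 1e9:.1f} GB/s")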
We can run on any machine that can run Docker in dev mode. It won't be very fast for very big models, but you can test all of the functionality.
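Once the dev-mode container is up, a smoke test is just an HTTP request against it. The port, route, and payload below are placeholders rather than the actual API, so swap in whatever the container exposes:

    # Minimal smoke test against a locally running dev-mode container.
    # The port, route, and JSON schema here are hypothetical -- use the real ones from the docs.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/completions",   # placeholder endpoint
        json={"model": "meta-llama/Llama-2-7b-chat-hf",
              "prompt": "Hello from the dev box",
              "max_tokens": 32},
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json())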
Many customers start by allocating a cloud node, e.g. on Azure/AWS; we do an install onto it, and they develop applications on top of it.
Then, for scale or to run larger models, we provision more powerful AMD GPU servers. We can host them (no lead time) or ship them to any datacenter (typically 4 weeks for assembly/shipping).
Put a load balancer in front of it and it will scale to as many GPUs as you can get.
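Conceptually the load balancer is just spreading identical requests across identical replicas, something like this sketch with hypothetical hostnames (in practice it lives in an off-the-shelf L7 load balancer, not in client code):

    # Conceptual round-robin over identical inference replicas (hypothetical hostnames).
    import itertools
    import requests

    BACKENDS = itertools.cycle([
        "http://gpu-node-0:8000",
        "http://gpu-node-1:8000",
        "http://gpu-node-2:8000",
    ])

    def complete(prompt):
        backend = next(BACKENDS)
        r = requests.post(f"{backend}/v1/completions",
                          json={"prompt": prompt, "max_tokens": 32},
                          timeout=120)
        r.raise_for_status()
        return r.json()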
For training we build on SLURM. We containerized SLURM, so you just need to provision GPU servers and launch the SLURMd containers on the training nodes. SLURM scales to 10,000s of servers.
We typically launch SLURMd containers on bare metal, but some of our customers manage them with Kubernetes, etc.
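The containerized slurmd nodes behave like a normal SLURM cluster, so submitting a multi-node training job looks like any other sbatch run. The partition name, node/GPU counts, and train.py entry point below are placeholders for your own setup:

    # Submit a multi-node fine-tuning job to the containerized SLURM cluster (sketch).
    import subprocess
    import textwrap

    batch_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=llama2-finetune
        #SBATCH --partition=amd-gpu
        #SBATCH --nodes=4
        #SBATCH --ntasks-per-node=1
        #SBATCH --gres=gpu:8
        MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n1)
        srun torchrun --nnodes=$SLURM_NNODES --nproc_per_node=8 \\
            --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29500 train.py
        """)

    # sbatch accepts the job script on stdin
    subprocess.run(["sbatch"], input=batch_script, text=True, check=True)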
Here's the blog post with more details: https://www.lamini.ai/blog/lamini-amd-paving-the-road-to-gpu...