The hard part about using any AI chips other than NVIDIA's has been the software.
ROCm is finally at the point where it can train and deploy LLMs like Llama 2 in production.
If you want to try this out, one big issue is that software support is hugely different on Instinct vs Radeon. I think AMD will fix this eventually, but today you need to use Instinct.
We will post more information explaining how this works in the next few weeks.
The middle section of this blog post covers some of the details, including GEMM/memcpy performance and some of the software layers we needed to write to run on AMD.
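If you want to sanity-check GEMM and memcpy numbers on your own hardware, a rough micro-benchmark like the one below is enough. PyTorch's ROCm builds expose the torch.cuda namespace via HIP, so the same script runs on Instinct; the matrix size and dtype are just placeholders.

    # Rough GEMM / host-to-device memcpy micro-benchmark (illustrative sketch,
    # not our internal tooling). torch.cuda maps to HIP on ROCm builds of PyTorch.
    import time
    import torch

    def bench(fn, iters=50):
        for _ in range(5):          # warm up
            fn()
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            fn()
        torch.cuda.synchronize()    # wait for all kernels before stopping the clock
        return (time.time() - start) / iters

    n = 8192                        # placeholder size; pick something that fills the GPU
    a = torch.randn(n, n, dtype=torch.float16, device="cuda")
    b = torch.randn(n, n, dtype=torch.float16, device="cuda")
    host = torch.randn(n, n, dtype=torch.float16).pin_memory()

    gemm_s = bench(lambda: a @ b)
    copy_s = bench(lambda: host.to("cuda", non_blocking=True))

    print(f"GEMM:       {2 * n**3 / gemm_s / 1e12:.1f} TFLOP/s")
    print(f"memcpy H2D: {host.numel() * host.element_size() / copy_s / 1e9:.1f} GB/s")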
We can run on any machine that can run Docker in dev mode. It won't be very fast for very big models, but you can test all of the functionality.
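Once the dev-mode container is up, a smoke test is just an HTTP request against it. The port, route, and payload below are placeholders rather than the actual API, so swap in whatever the container exposes:

    # Minimal smoke test against a locally running dev-mode container.
    # The port, route, and JSON schema here are hypothetical -- use the real ones from the docs.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/completions",   # placeholder endpoint
        json={"model": "meta-llama/Llama-2-7b-chat-hf",
              "prompt": "Hello from the dev box",
              "max_tokens": 32},
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json())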
Many customers start by allocating a cloud node, e.g. on Azure/AWS; we do an install onto it, and they develop applications on top of it.
Then, for scale or to run larger models, we provision more powerful AMD GPU servers. We can host them (no lead time) or ship them to any datacenter (typically 4 weeks for assembly/shipping).
Put a load balancer in front of it and it will scale to as many GPUs as you can get.
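Conceptually the load balancer is just spreading identical requests across identical replicas, something like this sketch with hypothetical hostnames (in practice it lives in an off-the-shelf L7 load balancer, not in client code):

    # Conceptual round-robin over identical inference replicas (hypothetical hostnames).
    import itertools
    import requests

    BACKENDS = itertools.cycle([
        "http://gpu-node-0:8000",
        "http://gpu-node-1:8000",
        "http://gpu-node-2:8000",
    ])

    def complete(prompt):
        backend = next(BACKENDS)
        r = requests.post(f"{backend}/v1/completions",
                          json={"prompt": prompt, "max_tokens": 32},
                          timeout=120)
        r.raise_for_status()
        return r.json()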
For training we build on SLURM. We containerized SLURM, so you just need to provision GPU servers and launch the SLURMd containers on the training nodes. SLURM scales to 10,000s of servers.
We typically launch SLURMd containers on bare metal, but some of our customers manage them with Kubernetes, etc.
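The containerized slurmd nodes behave like a normal SLURM cluster, so submitting a multi-node training job looks like any other sbatch run. The partition name, node/GPU counts, and train.py entry point below are placeholders for your own setup:

    # Submit a multi-node fine-tuning job to the containerized SLURM cluster (sketch).
    import subprocess
    import textwrap

    batch_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=llama2-finetune
        #SBATCH --partition=amd-gpu
        #SBATCH --nodes=4
        #SBATCH --ntasks-per-node=1
        #SBATCH --gres=gpu:8
        MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n1)
        srun torchrun --nnodes=$SLURM_NNODES --nproc_per_node=8 \\
            --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29500 train.py
        """)

    # sbatch accepts the job script on stdin
    subprocess.run(["sbatch"], input=batch_script, text=True, check=True)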
Here's the blog post with more details: https://www.lamini.ai/blog/lamini-amd-paving-the-road-to-gpu...