Nope. The way it works is that FasterTransformer splits the model across the two GPUs and runs both halves in parallel. It periodically has to sync the partial results from each half, so it will go faster if you have a high-bandwidth link between the GPUs like NVLink, but it will still work fine if they have to communicate over PCIe peer-to-peer, or even through the CPU.
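
To make the "split, compute, sync" idea concrete, here's a toy single-process sketch (not FasterTransformer's actual code): a linear layer's weight matrix is split along the contraction dimension between two pretend GPUs, each computes a partial product, and the sync step (an all-reduce over NVLink/PCIe in the real thing) just sums the partials.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))   # activations
W = rng.standard_normal((8, 4))   # weight matrix of one linear layer

# Each pretend "GPU" holds half the rows of W (and sees the matching
# half of x), and computes its own partial product independently.
partial_gpu0 = x[:, :4] @ W[:4, :]
partial_gpu1 = x[:, 4:] @ W[4:, :]

# The periodic sync: summing the partials (an all-reduce in practice)
# recovers exactly what a single GPU holding the full W would compute.
y = partial_gpu0 + partial_gpu1

assert np.allclose(y, x @ W)
```

The sync only moves the small partial outputs, not the weights, which is why it works even over a slow link — the link speed just bounds how often you can afford to sync.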