Nope. The way it works is that FasterTransformer splits the model across the two GPUs and runs both halves in parallel. It periodically has to sync the partial results from each half, so it will go faster if you have a high-bandwidth link between the GPUs like NVLink, but it will still work fine if they have to communicate over PCIe peer-to-peer, or even through the CPU.
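
To make the "split, compute, sync" idea concrete, here's a toy single-process sketch (not FasterTransformer's actual code): a linear layer's weight matrix is split along the contraction dimension between two pretend GPUs, each computes a partial product, and the sync step (an all-reduce over NVLink/PCIe in the real thing) just sums the partials.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))   # activations
W = rng.standard_normal((8, 4))   # weight matrix of one linear layer

# Each pretend "GPU" holds half the rows of W (and sees the matching
# half of x), and computes its own partial product independently.
partial_gpu0 = x[:, :4] @ W[:4, :]
partial_gpu1 = x[:, 4:] @ W[4:, :]

# The periodic sync: summing the partials (an all-reduce in practice)
# recovers exactly what a single GPU holding the full W would compute.
y = partial_gpu0 + partial_gpu1

assert np.allclose(y, x @ W)
```

The sync only moves the small partial outputs, not the weights, which is why it works even over a slow link — the link speed just bounds how often you can afford to sync.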