Multiply the number of A100s as needed to hit your target throughput.
Here, you don't really need the RAM. If you can accept fewer tokens/second, you could do it much more cheaply with consumer graphics cards.
Even with an A100, the sweet spot in batching is not going to give you 1k tokens/second per process. Of course, you could go up to an H100...
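
To make the "multiply as necessary" arithmetic concrete, here is a rough sketch. The function name and the throughput figures are hypothetical placeholders, not measurements; plug in whatever you actually benchmark per card at your chosen batch size.

```python
import math

def gpus_needed(target_tokens_per_s: float, per_gpu_tokens_per_s: float) -> int:
    """Number of cards needed to reach an aggregate throughput target.

    Assumes throughput scales linearly with the number of cards, which is
    only roughly true for independent processes (one model copy per card).
    """
    if per_gpu_tokens_per_s <= 0:
        raise ValueError("per-GPU throughput must be positive")
    return math.ceil(target_tokens_per_s / per_gpu_tokens_per_s)

# Hypothetical numbers: you want 1000 tokens/s in total and measure
# 400 tokens/s per card at your batching sweet spot.
print(gpus_needed(1000, 400))
```

The same calculation works for consumer cards: a lower per-card figure just means a larger count, so it comes down to price per card times the count.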