Is there a benchmark available to see expected load support on different GPUs for the TabbyML server?
I loaded the 7B model in our K8s cluster for a POC, and we are running it on an NC6s v3 instance with a V100 GPU. Is there a benchmark showing the expected load different GPUs can support with the TabbyML server? This would help us model our costs if we self-host.
Antoine Lemieux
Asked on Feb 09, 2024
We don't have the numbers for the V100, but you can find a benchmark that provides coverage of a wide variety of models, GPUs, and throughputs at this link: benchmark.
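Since the V100 is not covered by the published numbers, one option is to measure your own deployment directly. Below is a minimal load-test sketch that fires concurrent completion requests at a self-hosted Tabby server and summarizes latency and throughput. The endpoint URL, port, and request payload shape are assumptions based on a typical Tabby setup; adjust them to match your deployment.

```python
# Minimal load-test sketch for a self-hosted Tabby server.
# NOTE: TABBY_URL and the request payload are assumptions; check your
# server's API docs and adjust before running against your cluster.
import json
import math
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TABBY_URL = "http://localhost:8080/v1/completions"  # assumed default port


def one_request() -> float:
    """Send a single completion request and return its latency in seconds."""
    payload = json.dumps({
        "language": "python",
        "segments": {"prefix": "def fib(n):\n    ", "suffix": ""},
    }).encode()
    req = urllib.request.Request(
        TABBY_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=30):
        pass
    return time.perf_counter() - start


def summarize(latencies: list[float], wall_seconds: float) -> dict[str, float]:
    """Aggregate raw latencies into the numbers used for cost modeling."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "mean_s": statistics.mean(ordered),
        "p95_s": ordered[p95_index],
        "requests_per_s": len(ordered) / wall_seconds,
    }


def run(concurrency: int = 8, total: int = 64) -> dict[str, float]:
    """Run `total` requests with `concurrency` workers and report stats."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: one_request(), range(total)))
    return summarize(latencies, time.perf_counter() - start)
```

Repeating `run()` at increasing concurrency levels on each candidate GPU gives a requests-per-second curve you can plug into a per-instance cost model.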
Feb 09, 2024