
Is there a benchmark showing the expected load that different GPUs can support when running the TabbyML server?

I loaded the 7B model in our K8s cluster for a POC, and we are running it on an NC6s v3 instance with a V100 GPU. Is there a benchmark showing the expected load that different GPUs can support when running the TabbyML server? This would help us model our costs if we self-host the server.


Antoine Lemieux

Asked on Feb 09, 2024

We don't have numbers for the V100, but a benchmark covering a wide variety of models, GPUs, and throughput figures is available at this link: benchmark.

Answered on Feb 09, 2024
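In the meantime, one rough way to get V100-specific numbers is to load-test the deployed server directly. Below is a minimal sketch (Python, standard library only) that fires concurrent completion requests and reports throughput and latency percentiles. The server URL, payload, concurrency, and request count are placeholders for your own deployment, and the /v1/completions request shape should be verified against the API docs for the Tabby version you are running.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholders; adjust for your deployment.
TABBY_URL = "http://localhost:8080/v1/completions"  # assumed completion endpoint
CONCURRENCY = 8        # simulated number of parallel users
TOTAL_REQUESTS = 100   # total completion requests to send

# Minimal completion payload; check the schema against your Tabby version's docs.
PAYLOAD = json.dumps({
    "language": "python",
    "segments": {"prefix": "def fib(n):\n    ", "suffix": "\n"},
}).encode("utf-8")


def one_request() -> float:
    """Send a single completion request and return its latency in seconds."""
    req = urllib.request.Request(
        TABBY_URL, data=PAYLOAD, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=60) as resp:
        resp.read()
    return time.perf_counter() - start


def main() -> None:
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = list(pool.map(lambda _: one_request(), range(TOTAL_REQUESTS)))
    wall = time.perf_counter() - wall_start

    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    p95 = latencies[int(len(latencies) * 0.95)]
    print(f"requests/s: {TOTAL_REQUESTS / wall:.2f}")
    print(f"p50 latency: {p50 * 1000:.0f} ms, p95 latency: {p95 * 1000:.0f} ms")


if __name__ == "__main__":
    main()
```

Dividing the hourly price of the NC6s v3 instance by the measured requests per second then gives a first-order cost per completion for the capacity model.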