Is using multiple GPUs / sharding for a single model a possibility now? Or hosting multiple models behind the same endpoint for increased performance?

Simon Linnebjerg

Asked on Aug 30, 2023

We have not yet implemented support for tensor parallelism (multi-GPU). It is something we will likely address in the future, but it is not a high priority right now. Our current focus is on smaller models, such as the ~3 billion parameter model, for completion use cases.
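For readers unfamiliar with the term: tensor parallelism splits the weight matrices of a single model across devices, with each device computing a partial result that is then gathered. Below is a minimal CPU-only sketch of column-wise tensor parallelism for one linear layer, simulated with NumPy; the "devices" are just list entries, and all names are illustrative rather than any real framework's API.

```python
import numpy as np

# Toy sketch of column-wise tensor parallelism for one linear layer,
# simulated on CPU. The weight matrix is split column-wise across two
# simulated "devices"; each computes a partial output against the same
# input, and the partial outputs are concatenated (an all-gather).

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of input activations
W = rng.standard_normal((8, 16))   # full weight matrix

# Shard W column-wise across 2 simulated devices -> two (8, 8) shards
shards = np.split(W, 2, axis=1)

# Each "device" multiplies the shared input by its own shard
partials = [x @ shard for shard in shards]

# Gather step: concatenating the partials reconstructs the full output
y_parallel = np.concatenate(partials, axis=1)

# The sharded computation matches the single-device result
assert np.allclose(y_parallel, x @ W)
```

In a real multi-GPU setup each shard would live on a different GPU and the concatenation would be a collective communication op, but the arithmetic is exactly this.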
