Is using multiple GPUs / sharding for a single model a possibility now? Or hosting multiple models behind the same endpoint for increased performance?

Simon Linnebjerg

Asked on Aug 30, 2023

We have not yet implemented support for tensor parallelism (multi-GPU). It is something we will likely address in the future, but it is not a high priority right now. Our current focus is on smaller models, such as the ~3 billion parameter model, for completion use cases.
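For readers unfamiliar with the term: tensor parallelism splits the weight matrices of a single model across devices, with each device computing a partial result that is then gathered. Below is a minimal CPU-only sketch of column-wise tensor parallelism for one linear layer, simulated with NumPy; the "devices" are just list entries, and all names are illustrative rather than any real framework's API.

```python
import numpy as np

# Toy sketch of column-wise tensor parallelism for one linear layer,
# simulated on CPU. The weight matrix is split column-wise across two
# simulated "devices"; each computes a partial output against the same
# input, and the partial outputs are concatenated (an all-gather).

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of input activations
W = rng.standard_normal((8, 16))   # full weight matrix

# Shard W column-wise across 2 simulated devices -> two (8, 8) shards
shards = np.split(W, 2, axis=1)

# Each "device" multiplies the shared input by its own shard
partials = [x @ shard for shard in shards]

# Gather step: concatenating the partials reconstructs the full output
y_parallel = np.concatenate(partials, axis=1)

# The sharded computation matches the single-device result
assert np.allclose(y_parallel, x @ W)
```

In a real multi-GPU setup each shard would live on a different GPU and the concatenation would be a collective communication op, but the arithmetic is exactly this.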
