Should we make http-binding a first-class citizen in Tabby server?

The current Tabby server relies heavily on ctranslate2 for CUDA serving, but there are concerns about the lack of continued support for ctranslate2. Meanwhile, other serving platforms such as vllm, tensorrt-llm, and deepspeed have made great progress. Should we prioritize making http-binding a first-class citizen in Tabby server?

Lei Wen

Asked on Nov 06, 2023

The concern about the lack of continued support for ctranslate2 in Tabby server is no longer relevant: the 0.5.0 release of Tabby server addresses this issue. Additionally, Tabby has previously served more than 100 teams with ctranslate2. The scalability of the server is a tradeoff between computing resources, latency, and monitoring. If you have a case for serving more than 100 teams with Tabby, feel free to reach out for further discussion.

Nov 06, 2023