Is increasing the --parallelism flag helpful?
Sam Tenenholz
Asked on Dec 19, 2023
Increasing the --parallelism flag can increase Tabby's throughput when several requests arrive at the same time, but it will not reduce latency in a single-user setup, since there is usually only one request in flight and the extra capacity goes unused. To reduce latency, try a smaller model or faster hardware such as a GPU.
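To illustrate the throughput versus latency distinction, here is a toy queueing sketch in Python. It does not call Tabby at all: simulate, service_time, and the numbers used are hypothetical, and it assumes every request takes the same fixed time regardless of how many run in parallel, which real hardware may not honor.

```python
import heapq

def simulate(num_requests, parallelism, service_time=1.0):
    """Toy model: all requests arrive at t=0 and each occupies one
    of `parallelism` slots for `service_time` seconds (assumed fixed)."""
    # Each entry is the time at which that slot next becomes free.
    slots = [0.0] * parallelism
    heapq.heapify(slots)
    finish_times = []
    for _ in range(num_requests):
        start = heapq.heappop(slots)   # earliest free slot
        end = start + service_time
        finish_times.append(end)
        heapq.heappush(slots, end)
    makespan = max(finish_times)
    return {
        "throughput": num_requests / makespan,            # requests per second
        "mean_latency": sum(finish_times) / num_requests  # seconds per request
    }

# Single user, one request at a time: extra slots change nothing.
single_1 = simulate(num_requests=1, parallelism=1)
single_4 = simulate(num_requests=1, parallelism=4)

# Eight concurrent requests: more slots raise throughput.
multi_1 = simulate(num_requests=8, parallelism=1)
multi_4 = simulate(num_requests=8, parallelism=4)

print(single_1, single_4)
print(multi_1, multi_4)
```

In this model the lone request finishes in the same time at either setting, while the eight concurrent requests complete sooner overall with more slots. The per-request time itself only drops if service_time drops, which corresponds to the smaller model or GPU suggestion.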