Tabby Community - Is there any reason to choose q8 models as the default instead of ones with q4/q5?

q8 is most likely the default because of a trade-off between model accuracy and resource cost (memory use and inference speed).
q8 models store each weight with roughly 8 bits, so they stay closer to the original model's accuracy than q4/q5 models, but they need more memory and computational resources.
q4/q5 models are quantized more aggressively (roughly 4 or 5 bits per weight), which lowers memory usage and can speed up inference, at the cost of some accuracy.
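To put rough numbers on the memory side of this, here is a minimal sketch. It assumes the names q8, q5, and q4 mean about 8, 5, and 4 bits per weight, and it uses a hypothetical 7B-parameter model. The helper `weight_gib` is made up for illustration and is not part of Tabby.

```python
# Rough estimate of weight storage at different quantization levels.
# Assumption: "q8", "q5", "q4" mean about 8, 5, and 4 bits per weight.
# Real formats also store scaling data alongside the weights, so actual
# files are somewhat larger than these figures.

NOMINAL_BITS = {"f16": 16, "q8": 8, "q5": 5, "q4": 4}


def weight_gib(n_params: int, bits_per_weight: float) -> float:
    """Return the approximate weight storage in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n_params = 7_000_000_000  # hypothetical 7B-parameter model
    for name, bits in NOMINAL_BITS.items():
        print(f"{name}: ~{weight_gib(n_params, bits):.1f} GiB")
```

Under these assumptions the weights alone take about 6.5 GiB at 8 bits and about 3.3 GiB at 4 bits, which is why q4/q5 can fit on hardware where q8 does not.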
A q8 default suggests the target use case puts accuracy ahead of memory efficiency and inference speed.
Experimenting with q4/q5 models is worthwhile when memory or inference speed is the limiting factor, since they use fewer resources while keeping accuracy at an acceptable level.
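One way to frame that experiment is to start from the memory you actually have and pick the most precise quantization that fits. The sketch below is only an illustration: `pick_quant`, the nominal bit widths, and the fixed `overhead_gib` reserved for context cache and runtime buffers are all assumptions, not anything Tabby provides.

```python
from typing import Optional

# Candidates ordered from most to least precise.
# Assumption: nominal bits per weight; real formats add some overhead.
CANDIDATES = [("q8", 8), ("q5", 5), ("q4", 4)]


def pick_quant(n_params: int, budget_gib: float,
               overhead_gib: float = 1.0) -> Optional[str]:
    """Return the most precise candidate that fits the budget, or None.

    overhead_gib is a hypothetical fixed allowance for everything that is
    not model weights (context cache, runtime buffers).
    """
    for name, bits in CANDIDATES:
        weights_gib = n_params * bits / 8 / 2**30
        if weights_gib + overhead_gib <= budget_gib:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical 7B-parameter model on machines with different free memory.
    for budget in (8.0, 6.0, 4.5, 2.0):
        print(f"{budget} GiB budget -> {pick_quant(7_000_000_000, budget)}")
```

This only answers whether a model fits. Whether the accuracy of the chosen level is good enough still has to be checked on your own completions.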