Is there any reason to choose q8 models as the default instead of ones with q4/q5?

RiQuY asks why q8 models are the default rather than q4/q5 models, in the context of running large models on GPUs with limited RAM.



Asked on Apr 10, 2024

  • Defaulting to q8 rather than q4/q5 likely reflects a trade-off between model accuracy and resource cost (memory use and inference speed).

  • q8 models store weights at roughly 8 bits each, so they generally stay closer to the original model's accuracy than q4/q5 models, but they need more memory and memory bandwidth.

  • q4/q5 models are quantized more aggressively (roughly 4 or 5 bits per weight), which lowers memory usage and can speed up inference, at the cost of some accuracy.

  • A q8 default makes sense when the target use case prioritizes accuracy over memory efficiency or inference speed.

  • Trying q4/q5 models is worthwhile when memory is tight or inference speed is critical, since they balance resource efficiency against an acceptable level of accuracy.
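
To make the memory side of this trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes nominal bit widths (8, 5 and 4 bits per weight) and a hypothetical 7-billion-parameter model; neither figure comes from the thread. Real quantized files carry some extra overhead for scaling factors, and inference also needs memory for the context cache and activations, which the assumed `headroom_gib` parameter only crudely stands in for.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits(n_params: float, bits_per_weight: float,
         vram_gib: float, headroom_gib: float = 1.0) -> bool:
    """Crude check: do the weights plus some headroom fit in GPU memory?

    headroom_gib is an assumed allowance for everything that is not
    weights (context cache, activations); tune it for your setup.
    """
    return weight_memory_gib(n_params, bits_per_weight) + headroom_gib <= vram_gib


# Hypothetical model size and nominal bit widths (assumptions, see above).
N_PARAMS = 7e9
NOMINAL_BITS = {"q8": 8, "q5": 5, "q4": 4}

for name, bits in NOMINAL_BITS.items():
    size = weight_memory_gib(N_PARAMS, bits)
    verdict = "fits" if fits(N_PARAMS, bits, vram_gib=6.0) else "does not fit"
    print(f"{name}: ~{size:.1f} GiB of weights, {verdict} in 6 GiB")
```

Under these assumptions the q4 weights take about half the memory of the q8 weights, which is often the difference between a model fitting on a small GPU or not. It says nothing about the accuracy cost, which has to be measured on the task at hand.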

Apr 10, 2024