RiQuY asks why q8 models are the default choice over q4/q5 models, and what the reasons might be, when running large models on GPUs with limited RAM.
RiQuY
Asked on Apr 10, 2024
Defaulting to q8 models over q4/q5 models likely reflects a trade-off between model accuracy on one side and memory use and speed on the other.
q8 models keep roughly 8 bits per weight, so they stay closer to the original model's accuracy than q4/q5 models, but they require more memory and computational resources.
q4/q5 models are quantized more aggressively (roughly 4 to 5 bits per weight), which lowers memory usage and can speed up inference, at the cost of reduced model accuracy.
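To make the memory side of this trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes the weights take exactly the nominal number of bits (8, 5, or 4) and ignores the small per-block metadata real quantization formats add, as well as activations and context cache. The function name and the 7-billion-parameter figure are illustrative assumptions, not something from the original question.

```python
def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (nominal bits only)."""
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30


if __name__ == "__main__":
    n_params = 7e9  # hypothetical 7-billion-parameter model
    for name, bits in [("q8", 8), ("q5", 5), ("q4", 4)]:
        print(f"{name}: ~{estimate_weight_gib(n_params, bits):.1f} GiB")
```

Under these assumptions a q8 model needs about twice the weight memory of a q4 model of the same parameter count, which is usually what decides whether it fits on a small GPU.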
A q8 default therefore makes sense when the target use case prioritizes accuracy over memory efficiency or inference speed.
Experimenting with q4/q5 models is worthwhile when memory is tight or inference speed is critical, since they trade some accuracy for a noticeably smaller footprint while often keeping accuracy at an acceptable level.
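One way to act on this advice is to start from the highest-precision option and step down until the model fits. The sketch below does that under stated assumptions: nominal bit widths of 8, 5, and 4, and a fixed overhead allowance (a guess of 1 GiB by default) for activations and cache. The helper `pick_quant` and its parameters are hypothetical and should be tuned to the actual runtime.

```python
from typing import Optional

# Nominal bits per weight (assumption; real formats add some metadata).
NOMINAL_BITS = {"q8": 8, "q5": 5, "q4": 4}


def pick_quant(n_params: float, budget_gib: float,
               overhead_gib: float = 1.0) -> Optional[str]:
    """Return the highest-precision level whose weights fit the budget."""
    # Try levels from most bits to fewest, so accuracy is preferred.
    for name, bits in sorted(NOMINAL_BITS.items(), key=lambda kv: -kv[1]):
        weights_gib = n_params * bits / 8 / 2**30
        if weights_gib + overhead_gib <= budget_gib:
            return name
    return None  # nothing fits; a smaller model or offloading is needed


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model on GPUs of different sizes.
    for budget in (8.0, 6.0, 4.5, 3.0):
        print(f"{budget} GiB budget -> {pick_quant(7e9, budget)}")
```

This prefers accuracy first and only falls back to q5 or q4 when memory forces it, mirroring the reasoning behind a q8 default.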