How can I run models at half precision with --compute-type deprecated?

I see that --compute-type has been deprecated. Is there another way to run models at half precision?

Asked on Apr 30, 2024

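  • Check the MODEL_SPEC.md file in the TabbyML repository, which describes the expected layout of a model directory.
  • Put a GGUF file quantized to the precision you want in that model directory. The precision then comes from the GGUF file itself rather than from a command-line flag.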
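
The two steps above could be scripted roughly as follows. This is only a sketch, not Tabby's own tooling: the helper name stage_gguf_model is made up for illustration, and the ggml/ subdirectory, the model.gguf file name, and the tabby.json metadata file are assumptions about the layout that have changed between Tabby versions, so confirm them against MODEL_SPEC.md for the version you run.

```python
import json
import shutil
from pathlib import Path


def stage_gguf_model(gguf_file, model_dir, gguf_name="model.gguf", metadata=None):
    """Copy a GGUF file into a Tabby-style model directory.

    The directory layout used here is an assumption; MODEL_SPEC.md is the
    authority on the real file and directory names.
    """
    model_dir = Path(model_dir)
    ggml_dir = model_dir / "ggml"  # assumed subdirectory name
    ggml_dir.mkdir(parents=True, exist_ok=True)

    # Assumed target file name; pass gguf_name to match your Tabby version.
    target = ggml_dir / gguf_name
    shutil.copyfile(gguf_file, target)

    # Assumed metadata file; the valid keys are listed in MODEL_SPEC.md.
    (model_dir / "tabby.json").write_text(json.dumps(metadata or {}, indent=2))
    return target
```

For example, stage_gguf_model("my-model-f16.gguf", "models/my-model") would copy a hypothetical half-precision GGUF file to models/my-model/ggml/model.gguf and write an empty tabby.json next to the ggml directory. How Tabby is then pointed at that directory is not covered here.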
Apr 30, 2024