Tabby Community - How can I restrict the offloading of some layers to the GPU to decrease memory usage in DeepseekCoder-6.7B?

One way to decrease GPU memory usage with DeepseekCoder-6.7B is to switch to a lower-precision (quantized) variant of the model, that is, one that stores each weight in fewer bits. Note that this shrinks the whole model rather than limiting how many layers are offloaded to the GPU. Here's a general approach to changing the model variant:

1. Identify the configuration file or launch script where the model to load is specified.
2. Look for options related to model precision or bit variants.
3. Change the configuration to use a variant with fewer bits, such as a 4-bit variant.
4. Save the changes and restart or reload the model with the updated configuration.

A lower-bit variant needs proportionally less GPU memory for its weights: a 4-bit variant takes roughly a quarter of the space of a 16-bit one. That reduction may be enough to resolve memory allocation failures during execution, usually at some cost in output quality.
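
To get a feel for the savings, here is a rough back-of-the-envelope sketch in Python. It assumes about 6.7e9 parameters (taken from the "6.7B" in the model name) and counts only the weights; the KV cache, activations, and the per-block metadata that quantized formats carry are ignored, so real usage will be higher. The function names and the candidate bit widths are made up for illustration and are not part of Tabby.

```python
# Rough estimate of GPU memory needed for model weights at different bit widths.
# Assumption: ~6.7e9 parameters, inferred from the "6.7B" in the model name.
# Only weights are counted; KV cache, activations and quantization metadata are not.

PARAMS = 6.7e9


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def widest_variant_that_fits(params: float, budget_gb: float, candidates=(16, 8, 4)):
    """Return the largest candidate bit width whose weights fit in budget_gb, or None."""
    for bits in sorted(candidates, reverse=True):
        if weight_memory_gb(params, bits) <= budget_gb:
            return bits
    return None


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: ~{weight_memory_gb(PARAMS, bits):.2f} GB")
    # Hypothetical budget: 8 GB of GPU memory available for weights.
    print("Widest variant for 8 GB:", widest_variant_that_fits(PARAMS, 8.0))
```

Under these assumptions the weights alone come to about 13.4 GB at 16 bits, 6.7 GB at 8 bits, and 3.35 GB at 4 bits. Leave headroom on top of these numbers when deciding which variant to try.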