
How can I restrict the offloading of some layers to the GPU to decrease memory usage in DeepseekCoder-6.7B?

Ali Sayyah is running into GPU memory allocation errors while running DeepseekCoder-6.7B. Meng Zhang suggested using a variant of the model with fewer bits per weight, such as its 4-bit variant. Ali is now looking for guidance on how to switch to a different variant of the model.


Ali Sayyah

Asked on Mar 29, 2024

Limiting how many layers are offloaded to the GPU and lowering the model's precision are two separate ways to decrease GPU memory usage. The suggestion in this thread is the second one: run a variant of DeepseekCoder-6.7B that stores its weights with fewer bits, such as a 4-bit variant. Here's a general approach to switching variants:

  1. Identify the configuration file or launch script where the model is selected.
  2. Look for options related to model precision, quantization, or bit variants.
  3. Change the configuration to use a variant with fewer bits, such as a 4-bit variant.
  4. Save the changes and restart so that the model is reloaded with the updated configuration.

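Weight memory scales roughly linearly with the number of bits per weight, so a 4-bit variant needs about a quarter of the GPU memory that a 16-bit variant needs for its weights. That reduction may be enough to resolve the memory allocation issues, although activations and the context cache still add to the total at run time.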
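To get a feel for how much each option might save, here is a rough back-of-the-envelope sketch in Python. It is an estimate, not a measurement: it counts weights only, takes the 6.7 billion parameter count from the model's name, and assumes the weights are spread evenly across layers. The layer count of 32 and both function names are assumptions made for illustration, so check the model's own configuration for the real numbers.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def offloaded_weight_gib(n_params: float, bits_per_weight: float,
                         gpu_layers: int, total_layers: int) -> float:
    """Approximate GPU weight memory when only some layers are offloaded.

    Assumes weights are split evenly across layers, which ignores
    embeddings and other non-layer tensors.
    """
    if not 0 <= gpu_layers <= total_layers:
        raise ValueError("gpu_layers must be between 0 and total_layers")
    return weight_memory_gib(n_params, bits_per_weight) * gpu_layers / total_layers


N_PARAMS = 6.7e9     # nominal parameter count implied by the model name
TOTAL_LAYERS = 32    # assumed layer count, for illustration only

# Effect of the bit width with every layer on the GPU
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")

# Effect of offloading only half of the layers at 4-bit
half = offloaded_weight_gib(N_PARAMS, 4, TOTAL_LAYERS // 2, TOTAL_LAYERS)
print(f"4-bit, {TOTAL_LAYERS // 2}/{TOTAL_LAYERS} layers on GPU: ~{half:.1f} GiB")
```

Under these assumptions the weights take roughly 12.5 GiB at 16-bit and about 3.1 GiB at 4-bit. Actual usage will be higher once activations, the context cache, and runtime overhead are included.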

Mar 29, 2024