I'm using the Qwen2-72B-Instruct model in Tabby, and I've encountered an issue where, after outputting a segment of normal content, the model ends up producing a large amount of meaningless "66666" content. This doesn't happen in the official Hugging Face playground, so I'm not sure whether the problem is with the model itself, llama.cpp, or Tabby. I've tried upgrading llama.cpp, adjusting `presence_penalty`, trying different quantizations of the model, and adding the `-fa` startup parameter, but none of these resolved the issue. How can I locate and fix this bug?
moqi
Asked on Jun 24, 2024
You should be able to locate the events causing this issue in `~/.tabby/events`. Share them as a gist so we can help take a look. Additionally, try sending the same prompt to `llama-server` directly through its `/v1/chat/completions` interface to see whether it generates the correct output, and compare that with the Tabby chat output. The llama-server documentation explains how to call it directly.
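For reference, here is a minimal sketch of such a direct request against llama-server's OpenAI-compatible chat endpoint. It assumes llama-server is running locally on its default port 8080 with your Qwen2 GGUF loaded; the URL, prompt, and sampling parameters are placeholders to adjust to your setup.

```python
# Minimal sketch: send one chat request straight to llama-server's
# OpenAI-compatible /v1/chat/completions endpoint and print the reply.
# Assumption: llama-server is listening locally on the default port 8080.
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # adjust to where llama-server runs

payload = {
    "messages": [
        {"role": "user", "content": "Write a short Python function that reverses a string."},
    ],
    "temperature": 0.7,   # example sampling settings; match Tabby's if possible
    "max_tokens": 512,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Print the assistant's reply from the OpenAI-style response structure.
print(body["choices"][0]["message"]["content"])
```

If the output here is clean but the same prompt degenerates into "66666" through Tabby, the issue is more likely in Tabby's prompt construction or request parameters than in llama.cpp or the model itself; if it degenerates here too, the problem sits in the llama.cpp serving stack or the quantized model.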