How can I diagnose a model outputting repeated, meaningless content in Tabby?

I'm using the Qwen2-72B-instruct model in Tabby, and I've hit an issue where, after producing a segment of normal output, the model emits a long run of meaningless "66666" content. This doesn't happen in the official Hugging Face playground, so I'm not sure whether the problem is in the model itself, in llama.cpp, or in Tabby. I've tried upgrading llama.cpp, adjusting presence_penalty, trying different quantizations of the model, and adding the -fa startup parameter, but none of these resolved the issue. How can I locate and fix this bug?



Asked on Jun 24, 2024

You should be able to find the events that triggered this issue in ~/.tabby/events. Share them as a gist so we can take a look. Additionally, send the same prompt directly to llama-server via its /v1/chat/completions endpoint and compare that output with Tabby's chat output; this isolates whether the problem lies in llama.cpp or in Tabby. See the llama-server documentation for how to call the endpoint directly.
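The two steps above can be sketched roughly as follows. This assumes a local llama-server listening on port 8080 and the default Tabby data directory; the port, paths, and sampling parameters are placeholders you should adjust to your setup:

```shell
# 1. Inspect Tabby's event log for the request that produced the bad output.
#    Files in ~/.tabby/events are named by date; look at the most recent one.
ls -t ~/.tabby/events | head -n 1
tail -n 5 ~/.tabby/events/"$(ls -t ~/.tabby/events | head -n 1)"

# 2. Replay the same prompt straight against llama-server, bypassing Tabby.
#    /v1/chat/completions is llama-server's OpenAI-compatible chat endpoint.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "<paste the prompt from the event log here>"}
        ],
        "temperature": 0.7
      }'
```

If llama-server alone reproduces the "66666" repetition, the bug is in llama.cpp (or the quantized model file); if it only appears through Tabby, compare the prompt recorded in the event log with what you expect Tabby to send.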

Jun 26, 2024