
Is there a way to make the model predict one line at a time to reduce latency?

I notice that the model tries to predict too much at once, resulting in high latency. Can I modify the code to make it predict one line at a time?

Sam Tenenholz

Asked on Dec 19, 2023

Yes, you can modify the code so the model predicts one line at a time. Add a stop condition on the newline character '\n'. One detail to handle: if '\n' is the very first generated character, skip it rather than stopping, so the loop never returns an empty line. Here is an example of how you can modify the code:

# Add a stop condition for predicting one line at a time.
# model.predict and update_input stand for the model's single-character
# prediction step and its context-update step.
current_output = ""
stop_condition = False
while not stop_condition:
    # Generate the next character
    next_char = model.predict(current_input)
    if next_char == '\n':
        # Stop only once some output exists; a leading newline is skipped
        if len(current_output) > 0:
            stop_condition = True
    else:
        # Append the generated character to the output
        current_output += next_char
        # Update the input for the next prediction
        current_input = update_input(current_input, next_char)
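The loop above can be packaged as a small helper. Below is a minimal, self-contained sketch: generate_line and its predict parameter are hypothetical names (not part of any specific library), and the predictor is assumed to return one character per call. A max_chars cap is added as a safety bound in case the model never emits a newline.

```python
def generate_line(predict, context, max_chars=200):
    """Generate characters until the first newline, then stop.

    predict(context) is assumed to return the next character as a
    one-character string. A leading '\n' is skipped rather than
    treated as a stop, so an empty line is never returned.
    """
    output = ""
    for _ in range(max_chars):
        next_char = predict(context)
        if next_char == "\n":
            if output:       # stop only once we have some content
                break
            continue         # skip a leading newline
        output += next_char
        context += next_char  # feed the character back in as context
    return output
```

For example, with a toy predictor that plays back a fixed string, generate_line returns only the first line of it and leaves the rest for a later call.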