
Is there a way to make the model predict one line at a time to reduce latency?

I notice that the model tries to predict too much at once, resulting in high latency. Can I modify the code to make it predict one line at a time?

Sam Tenenholz

Asked on Dec 19, 2023

Yes, you can modify the code so the model predicts one line at a time. Add a stop condition on the newline character '\n'. One detail to handle: if '\n' is the very first generated character, skip it rather than stopping, so the loop never returns an empty line. Here is an example of how you can modify the code:

# Add a stop condition for predicting one line at a time.
# model.predict and update_input stand for the model's single-character
# prediction step and its context-update step.
current_output = ""
stop_condition = False
while not stop_condition:
    # Generate the next character
    next_char = model.predict(current_input)
    if next_char == '\n':
        # Stop only once some output exists; a leading newline is skipped
        if len(current_output) > 0:
            stop_condition = True
    else:
        # Append the generated character to the output
        current_output += next_char
        # Update the input for the next prediction
        current_input = update_input(current_input, next_char)
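The loop above can be packaged as a small helper. Below is a minimal, self-contained sketch: generate_line and its predict parameter are hypothetical names (not part of any specific library), and the predictor is assumed to return one character per call. A max_chars cap is added as a safety bound in case the model never emits a newline.

```python
def generate_line(predict, context, max_chars=200):
    """Generate characters until the first newline, then stop.

    predict(context) is assumed to return the next character as a
    one-character string. A leading '\n' is skipped rather than
    treated as a stop, so an empty line is never returned.
    """
    output = ""
    for _ in range(max_chars):
        next_char = predict(context)
        if next_char == "\n":
            if output:       # stop only once we have some content
                break
            continue         # skip a leading newline
        output += next_char
        context += next_char  # feed the character back in as context
    return output
```

For example, with a toy predictor that plays back a fixed string, generate_line returns only the first line of it and leaves the rest for a later call.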