I noticed that the RAG uses different tokenizers, and I'm wondering why. Also, are the empty strings in the token list meaningful when searching Tantivy?
Ryan Y
Asked on Jan 29, 2024
The RAG uses different tokenizers to handle different types of input data. For example, one tokenizer may be used for text data and another for numeric data. The empty strings in the token list are not meaningful for searching Tantivy, so they can be removed to make debugging easier.
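As a rough illustration of removing the empty strings, here is a minimal Python sketch. It assumes the token list is a plain list of strings; the helper name `drop_empty_tokens` and the sample tokens are hypothetical and are not part of the RAG code or the Tantivy API.

```python
def drop_empty_tokens(tokens):
    """Return the tokens with empty or whitespace-only entries removed."""
    # Hypothetical helper: keeps the original order of the remaining tokens.
    return [t for t in tokens if t and t.strip()]


# Sample token list (made up for illustration) containing empty entries.
tokens = ["different", "", "tokenizers", " ", "rag", ""]
cleaned = drop_empty_tokens(tokens)
print(cleaned)
```

Filtering just before the tokens are logged or inspected keeps the debug output short without changing how the tokens themselves are produced.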