Add HuggingFace's Tokenizer#271
Conversation
philpax
left a comment
There was a problem hiding this comment.
Looks good! Just a few nitpicks around error handling/naming
danforbes
left a comment
There was a problem hiding this comment.
Outstanding work 💪🏻 Unbelievable first contribution 🚀
|
I've updated this PR and made a few code quality changes, but the no-space issue means it can't be made the default option. I've asked for help with Tokenizers to determine why that's happening. |
|
We just need to implement this huggingface/tokenizers#1141 (comment) and we should be able to take this across the line. In addition to this, we should update to the latest non-HF tokenizer from llama.cpp (seeing as we're supporting both), but that can be done in a separate PR. |
|
Great work, @RedBoxing, and thanks for all of the help testing, everyone! Glad to have this in :) |
Closes #35 and #212