It would be very cool if the performance improvements from https://github.com/ggerganov/llama.cpp/pull/613 could be backported to this repo. I couldn't find an existing issue for this; if there is one, I'm happy to close this one.