Often, when a new model is released, the officially supported methods of running it are:
- vllm
- sglang
- ktransformers
Rarely is llama.cpp among the recommended ways to run the model.
It would be awesome if these organizations would also ensure that their models work well when served as GGUF via llama.cpp. I appreciate that this would require liaising with the companies so that new features and templates are supported.
Otherwise, what we're left with is a lot of post-release fixing by the community, or guessing at how to enable the features a model supports.
I'm thinking of some of the recent discussions around Qwen models, where people were trying to figure out the right syntax to enable or disable preserved thinking or reasoning.
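To illustrate the guesswork involved: below is a rough sketch of how I understand the Qwen3-style thinking toggle currently works against llama-server's OpenAI-compatible endpoint. The `chat_template_kwargs` passthrough and the `enable_thinking` kwarg are my assumptions based on community threads, not anything documented on the model card, which is exactly the problem.

```python
# Hedged sketch: disabling Qwen3-style "thinking" through llama-server.
# Assumes a llama-server instance on localhost:8080 whose loaded Jinja
# chat template honors an `enable_thinking` kwarg (as Qwen3's reportedly
# does). Field names and behavior may differ across models and builds.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "qwen3",  # placeholder; llama-server serves one loaded model
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        # Forwarded into the chat template; suppresses the <think> block
        # if (and only if) the template supports this kwarg.
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```

Whether this is the right mechanism for a given model, and whether it survives a GGUF re-conversion, is precisely the kind of thing only the model's authors could state definitively at release time.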
Personally, I find that I have to cross-reference the original model card, Unsloth's model info, GitHub issues, and PRs to make sure I am running a new model optimally (or, in some cases, re-download a model several times as fixes to its GGUFs land).
What is the appetite of the core llama.cpp team for supporting such an initiative? I am sure that, between us, the community can pull together connections and contacts at the model-developing organizations so that llama.cpp becomes a core part of their release priorities.