Often, when a new model is released, the officially supported methods of running it are:
- vllm
- sglang
- ktransformers
Rarely is llama.cpp among the recommended ways to run the model.
It would be awesome if these organizations would also ensure that their models work well when served as GGUF via llama.cpp. I appreciate that this would require liaising with the companies so that new features and templates are supported.
Otherwise, what we're left with is a lot of post-release fixing by the community, or guessing at how to enable the features a model supports.
I'm thinking of some of the recent discussions around Qwen models, where people were trying to figure out the right syntax to enable or disable preserved thinking or reasoning.
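To illustrate the guesswork involved: below is a rough sketch of how I understand the Qwen3-style thinking toggle currently works against llama-server's OpenAI-compatible endpoint. The `chat_template_kwargs` passthrough and the `enable_thinking` kwarg are my assumptions based on community threads, not anything documented on the model card, which is exactly the problem.

```python
# Hedged sketch: disabling Qwen3-style "thinking" through llama-server.
# Assumes a llama-server instance on localhost:8080 whose loaded Jinja
# chat template honors an `enable_thinking` kwarg (as Qwen3's reportedly
# does). Field names and behavior may differ across models and builds.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "qwen3",  # placeholder; llama-server serves one loaded model
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        # Forwarded into the chat template; suppresses the <think> block
        # if (and only if) the template supports this kwarg.
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```

Whether this is the right mechanism for a given model, and whether it survives a GGUF re-conversion, is precisely the kind of thing only the model's authors could state definitively at release time.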
Personally, I find that I have to cross-reference the original model card, Unsloth's model info, GitHub issues, and PRs to make sure I am running a new model optimally (or, in some cases, re-download a model several times as fixes to its GGUFs land).
What is the appetite of the core llama.cpp team for supporting such an initiative? I am sure that, between us, the community can pull together connections and contacts at the model-developing organizations so that llama.cpp becomes a core part of their release priorities.