Replies: 1 comment
You can compile with |
If you're running llama.cpp with HIP/ROCm on AMD GPUs and using Flash Attention with quantized KV cache, check whether your K and V cache types match.
-ctk q4_0 -ctv q4_0 (symmetric) → fused FA kernel
-ctk q4_0 -ctv f16 (asymmetric) → non-fused fallback
The fused path is significantly faster for token generation. The non-fused fallback exists for correctness when the K/V types differ, but it is not optimized to the same degree.
This is in the source: the fused FA kernels only support matching K/V quantization types. If they don't match, llama.cpp silently falls back to the slower non-fused implementation. There is no warning and no log message; you just get worse performance and don't know why.
Tested on RX 7900 XTX (gfx1100) with b8642, models ranging from 14B to 27B dense. Symmetric Q4_0 KV consistently used the fast path. Asymmetric Q4_0/F16 did not.
If you've been running -ctk q4_0 -ctv f16 on the assumption that F16 values give better quality at the cost of some VRAM, that quality tradeoff may still be worth it for your use case, but be aware that you're also paying a speed penalty from missing the fused kernel.
Q8_0/Q8_0 also works as a symmetric config if you want higher KV precision without losing the fused path.
Relevant source: ggml-cuda/fattn*.cu; check the type-matching guards.