Mllama kv scale fix by gshtras · Pull Request #335 · ROCm/vllm

gshtras · 2024-12-18T16:41:34Z

mllama.py calls the caching function directly, so this change passes tensors instead of floats for the KV scales there too.
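A minimal sketch of the pattern described above, assuming the cache-write function expects its key/value scales as tensors rather than Python floats. The names `as_scale_tensor` and `write_to_cache` are hypothetical stand-ins for illustration, not the actual vLLM functions, and the one-element float32 shape is an assumption rather than something this PR states.

```python
import torch


def as_scale_tensor(scale, device="cpu"):
    """Return `scale` as a float32 tensor (hypothetical helper).

    Floats are wrapped in a one-element tensor; existing tensors are
    only moved/cast, so callers can pass either form.
    """
    if isinstance(scale, torch.Tensor):
        return scale.to(device=device, dtype=torch.float32)
    return torch.tensor([float(scale)], dtype=torch.float32, device=device)


def write_to_cache(key, value, k_scale, v_scale):
    """Stand-in for a cache-write op that only accepts tensor scales."""
    for name, s in (("k_scale", k_scale), ("v_scale", v_scale)):
        if not isinstance(s, torch.Tensor):
            raise TypeError(f"{name} must be a torch.Tensor, got {type(s).__name__}")
    # Illustrative only: scale the inputs the way a quantizing cache might.
    return key / k_scale, value / v_scale


# A direct caller converts its float scales before calling the op.
k_scale = as_scale_tensor(2.0)
v_scale = as_scale_tensor(4.0)
scaled_k, scaled_v = write_to_cache(torch.ones(2, 4), torch.ones(2, 4), k_scale, v_scale)
```

Converting at the call site keeps a direct caller consistent with code paths that already pass tensor scales.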

* Using tensors in the explicit cache function calls from mllama implementation
* Properly creating the tensor

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

gshtras added 2 commits December 18, 2024 16:36

Using tensors in the explicit cache function calls from mllama implementation

216e382

Properly creating the tensor

04e6424

shajrawi approved these changes Dec 18, 2024

gshtras merged commit fa1ff83 into main Dec 18, 2024

gshtras deleted the mllama_kv_Scale_fix branch December 18, 2024 16:53

gshtras added a commit that referenced this pull request Jan 7, 2025

Mllama kv scale fix (#335)

ef181a9

