Repurpose Scheduler Spec Dec metric for testing correctness #1
ekagra-ranjan wants to merge 2 commits into main
num_draft_tokens = scheduler_stats.spec_decoding_stats.num_draft_tokens
num_accepted_tokens = scheduler_stats.spec_decoding_stats.num_accepted_tokens
num_spec_proposal = num_draft_tokens / args.num_spec_tokens
mean_accepted_tokens = 1 + num_accepted_tokens / num_spec_proposal
num_spec_proposal is the number of times the SD call was made. Each call generates 1 token plus however many draft tokens were accepted, so:
mean_accepted_tokens = (sum of generated tokens over num_spec_proposal) / num_spec_proposal
= (num_spec_proposal + sum of accepted tokens over num_spec_proposal) / num_spec_proposal
= 1 + num_accepted_tokens / num_spec_proposal
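A minimal sketch of the computation above, with made-up numbers to check the arithmetic. The function name mean_acceptance_length is hypothetical and not part of vLLM; it assumes every SD call drafts exactly num_spec_tokens tokens, as the division in the diff implies.

```python
def mean_acceptance_length(
    num_draft_tokens: int,
    num_accepted_tokens: int,
    num_spec_tokens: int,
) -> float:
    """Hypothetical helper mirroring the formula in the diff above.

    Assumes each SD call proposes exactly num_spec_tokens draft tokens.
    """
    if num_spec_tokens <= 0:
        raise ValueError("num_spec_tokens must be positive")
    # Number of SD calls, recovered from the total number of drafted tokens.
    num_spec_proposal = num_draft_tokens / num_spec_tokens
    if num_spec_proposal == 0:
        raise ValueError("no speculative proposals were recorded")
    # Each call yields 1 token plus the accepted draft tokens.
    return 1 + num_accepted_tokens / num_spec_proposal


# Illustrative numbers only: 100 SD calls with 3 draft tokens each,
# of which 150 draft tokens were accepted in total.
al = mean_acceptance_length(
    num_draft_tokens=300, num_accepted_tokens=150, num_spec_tokens=3
)
print(al)
```

With these illustrative inputs there are 100 proposals and 1.5 accepted tokens per proposal on average, so the helper returns 2.5.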
# spec_decoding_stats: Optional[SpecDecodingStats] = None
spec_decoding_stats = self.spec_decoding_stats
Cache spec_decoding_stats so that it keeps a running metric instead of being re-initialized on every engine step.
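The difference between the two behaviours can be sketched with a toy counter. RunningSpecStats and its observe signature are hypothetical stand-ins for illustration, not vLLM's actual SpecDecodingStats, and the per-step numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class RunningSpecStats:
    """Hypothetical stand-in for a stats object; not vLLM's class."""

    num_draft_tokens: int = 0
    num_accepted_tokens: int = 0

    def observe(self, num_draft_tokens: int, num_accepted_tokens: int) -> None:
        # Accumulate counts across calls.
        self.num_draft_tokens += num_draft_tokens
        self.num_accepted_tokens += num_accepted_tokens


# Made-up (drafted, accepted) counts for three engine steps.
steps = [(3, 2), (3, 0), (3, 3)]

# Re-initialized every step: each object sees exactly one observe call,
# so only the last step's numbers survive.
reinit = RunningSpecStats()
for drafted, accepted in steps:
    reinit = RunningSpecStats()
    reinit.observe(drafted, accepted)

# Cached once and reused: counts aggregate across all steps.
cached = RunningSpecStats()
for drafted, accepted in steps:
    cached.observe(drafted, accepted)

print(reinit)
print(cached)
```

Only the cached object holds totals over all steps, which is what a run-level AL needs.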
model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
eagle_dir = "abhigoyal/EAGLE-LLaMA3-Instruct-8B-vllm"
# eagle_dir = "yuhuili/EAGLE-LLaMA3-Instruct-8B"
eagle_dir = "lmsys/sglang-EAGLE-LLaMA3-Instruct-8B"
Using the SGLang model so that the previous SGL benchmark is comparable: https://docs.google.com/document/d/18ETJLsnxR88Qq3VDk5Mq-Hb7vuE9o3VNZ-hhz-OqAXk/edit?usp=sharing
Hmm, we discussed this on Slack shortly after you submitted this PR. A new SpecDecodingStats should only be created once per [...]. Your response, for reference:
Hi @markmc - yup, we are good. I am still using this hacky PR whenever I want to quickly find the AL for my evals since vllm-project#16367 is still not merged

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!
I was looking into SD metrics in V1 and found that spec_decoding_stats is re-initialized every time we do an engine step, while we use an observe function which, from its name, is supposed to aggregate over multiple observe calls. However, since it is re-initialized every time, there is always exactly one observe call and no aggregation.

To enable AL computation for checking correctness, this PR aggregates the metrics across steps in EngineCoreOutputs.scheduler_stats.