Turn WildClawBench into a Skill, Service + Ollama & others #4
Open
Description
Hi there, this is really awesome.
A few things I have in mind:
Turn this into a skill that can check the current state of the benchmark online, or even run and publish the benchmark agentically and seamlessly.
Keep updating the benchmark as time progresses, and create an agent skill that can check https://internlm.github.io/WildClawBench/ or a preferred domain.
That makes it possible to always stay on top of the best agentic models to use.
Humans or agents can ask for the latest information and possibly update to newer models as they are released.
One very big thing the benchmark is missing is a local models section: Ollama, fine-tuned models from Hugging Face that are small but pre-trained on big models with agentic behavior, and other popular model pipelines.
+ Easy/auto publishing of results from forks, so it's easy to crowdsource results from everybody running the benchmark on their own machines (Apple Metal, NVIDIA, etc.).
Keep track of benchmark results as states, and see the graph change over time.
Make predictions of when a local model will be able to reach 50%, as Opus 4.6 currently does.
Should be easy as pie for claw to do.
Let me know what you think.
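To make the "states over time" and prediction ideas a bit more concrete, here is a rough sketch of what I mean. It assumes each state is just a (date, best local model score in percent) pair and fits a straight line through them to estimate when the 50% mark would be crossed. The function names and the snapshot values are made up for illustration, they are not part of WildClawBench, and a linear trend is only the simplest possible guess.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need snapshots from at least two different dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_crossing(snapshots, target=50.0):
    """Estimate the date a linear trend through the snapshots reaches `target`.

    `snapshots` is a list of (date, score_percent) pairs (hypothetical format).
    Returns None when the trend is flat or falling.
    """
    ordered = sorted(snapshots)
    origin = ordered[0][0]
    days = [(d - origin).days for d, _ in ordered]
    scores = [s for _, s in ordered]
    slope, intercept = fit_line(days, scores)
    if slope <= 0:
        return None
    days_needed = (target - intercept) / slope
    return origin + timedelta(days=round(days_needed))


# Made-up example states, not real benchmark results.
example_states = [
    (date(2026, 1, 1), 20.0),
    (date(2026, 1, 31), 23.0),
    (date(2026, 3, 2), 26.0),
]
print(predict_crossing(example_states))
```

Each new benchmark run would append one more state, so the estimate updates itself as the graph changes.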
Thanks a lot and all the best