Turn WildClawBench into a Skill, Service + Ollama & others #4

Hi there, this is really awesome!
I have a few things in mind:

### Turn this into a skill that can **check the current state of the benchmark online**, or even run and publish the benchmark agentically and seamlessly

### Keep updating the benchmark as time progresses, and create an agent skill that can check https://internlm.github.io/WildClawBench/ or a preferred domain
That way it's always possible to stay on top of the best agentic models to use

Humans or agents can ask it for the latest information and possibly switch to newer models as they are released
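
To make the "check the leaderboard" idea concrete, here is a rough sketch of what such a skill could do. It assumes the page serves its results as a plain HTML table with the model name in the first column and the score in the second; I have not verified that, and if the page is rendered by JavaScript or laid out differently this would need another data source. `parse_rows`, `best_model` and `fetch_best_model` are made-up names, not part of WildClawBench.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

LEADERBOARD_URL = "https://internlm.github.io/WildClawBench/"


class TableRows(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return all table rows found in the HTML as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def best_model(rows, name_col=0, score_col=1):
    """Return (name, score) for the highest score; non-numeric rows are skipped."""
    scored = []
    for row in rows:
        try:
            scored.append((row[name_col], float(row[score_col].rstrip("%"))))
        except (IndexError, ValueError):
            continue  # header rows or malformed rows
    return max(scored, key=lambda r: r[1]) if scored else None


def fetch_best_model(url=LEADERBOARD_URL):
    """Download the page and return the top entry (assumes the table layout above)."""
    with urlopen(url, timeout=30) as resp:
        return best_model(parse_rows(resp.read().decode("utf-8", "replace")))
```

An agent skill could call `fetch_best_model()` on a schedule and flag when the top model changes.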

### **One very big thing the bench is missing is a local models section, like Ollama!** Plus small fine-tuned models from Hugging Face that are pre-trained on big models with agentic behavior, and other popular model pipelines
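
For the Ollama part, a minimal sketch of how a benchmark runner might talk to a local model. It uses Ollama's local REST endpoint `/api/generate` on the default port 11434 with streaming turned off. How WildClawBench actually calls models is not something I know, so `build_request` and `ask_local_model` are hypothetical helpers, and the model name is whatever has been pulled locally.

```python
import json
from urllib.request import Request, urlopen

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model, prompt, url=OLLAMA_URL, timeout=300):
    """Send the prompt and return the generated text (needs Ollama running)."""
    with urlopen(build_request(model, prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the request builder separate from the network call makes it easy to swap in another local backend later.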

### + Easy/auto publishing of results from forks, so it's easy to crowdsource results from everybody who is running the benchmark on their own machines (Apple Metal, Nvidia, etc.)
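
As a sketch of what a crowdsourced submission could carry, each fork might emit a small JSON record that includes the machine it ran on. The field names and `make_result_record` are my own invention, not an existing WildClawBench format; the `backend` value (for example "metal" or "cuda") is supplied by the caller rather than detected.

```python
import json
import platform
from datetime import datetime, timezone


def make_result_record(model, score, backend):
    """Bundle one benchmark score with basic info about the machine that produced it."""
    return {
        "model": model,
        "score": score,
        "backend": backend,                  # e.g. "metal" or "cuda", given by the caller
        "os": platform.system(),             # e.g. "Darwin", "Linux"
        "arch": platform.machine(),          # e.g. "arm64", "x86_64"
        "python": platform.python_version(),
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }


def to_json(record):
    """Serialize a record so it can be committed or attached to a pull request."""
    return json.dumps(record, indent=2, sort_keys=True)
```

A fork could write `to_json(make_result_record(...))` to a results file and open a pull request with it automatically.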

Keep track of benchmark results as states, and see the graph change over time

Make predictions of when a local model will be able to reach 50%, like Opus 4.6 currently does
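
The prediction could start out very simple. Below is a sketch that fits a straight line through dated scores and extrapolates to the date the 50% mark would be crossed. A linear trend is purely my assumption (real progress may well not be linear), and `predict_crossing` is a hypothetical helper.

```python
from datetime import date, timedelta


def predict_crossing(history, target=50.0):
    """Least-squares line through (date, score) points.

    Returns the date the fitted line reaches `target`, or None when there are
    fewer than two distinct dates or the trend is flat or declining.
    """
    if len(history) < 2:
        return None
    origin = history[0][0]
    xs = [(d - origin).days for d, _ in history]
    ys = [s for _, s in history]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return None  # all snapshots share the same date
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    if slope <= 0:
        return None  # no upward trend to extrapolate
    intercept = mean_y - slope * mean_x
    days = (target - intercept) / slope
    return origin + timedelta(days=round(days))
```

If the fitted line is already above the target, the returned date lies in the past, which a caller can treat as "already reached".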


Should be easy as pie for claw to do


Let me know what you think
Thanks a lot and all the best
