Skip to content

Turn WildClawBench to a Skill , Service + Ollama & others #4

@fire17

Description

@fire17

Hi there really awesome
A few things in mind

Turn this to a skill, that can check current state of benchmark online, or even run and publish this benchmark agentically, seamlessly

Keep updating the benchmark as time progresses and create an agent skill that can check https://internlm.github.io/WildClawBench/ or preferred domain

To be able to be always on top of best agentic models to use

Humans are agents can ask to get latest information and possibly update to newer models as they are released

One very big thing that bench is missing is a local models section like Ollama! & fine-tune models from huggingface that are small but are pre-trained on big models with agenetic behavior and other popular model pipelines

+ Easy/Auto publish results from forks so it's easy to crowdsource results from everybody was running The benchmark on their unique machines (Apple metal Nvidia etc )

Keep track of of benchmarks as states, and over time see the graph change

Make predictions when local model will be able to reach 50% like current opus 4.6 does currently

Should be easy as a pie for claw to do

Last me know what you think
Thanks a lot and all the best

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions