feat: add a post-task self-evolution loop based on real task outcomes #221

@huangrichao2020

Description

Scenario

This is a must-have.

A long-running agent should not only execute tasks; it should also improve its own working method based on real successes and failures. For example, after a task it may learn that a certain API must be dry-run first, that a certain restart command can kill the agent itself, or that a class of tools should always be preceded by a status check. If this experience is not retained, the agent will make the same mistake again next time.

Current Pain Points

  • There is no systematic post-task reflection mechanism.
  • Failures, detours, and user corrections do not enter reusable skills.
  • Skill updates depend on the user manually reminding the agent.
  • Actual task outcomes are not fed back into methodology or workflow updates.

Suggested Direction

Add a self-evolution loop, but with clear boundaries:

  • Run a lightweight reflection after task completion: is there reusable experience from this task?
  • Update skills only when a trigger condition is met: a complex task, repeated trial and error, an explicit user correction, or a newly discovered stable workflow.
  • Skill updates should be versioned patches, not silent edits to core runtime code.
  • Automatically record the trigger reason, applicable scope, counterexamples/limits, and validation method.
  • High-risk experience should require user confirmation before being written.

One safer direction is to limit self-evolution to low-risk knowledge layers such as skills/, memory/self_evolution_sop.md, and task journals, rather than letting the agent freely edit runtime code. This allows evolution without making the system fragile.
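One way to enforce that boundary is a path allowlist checked before any self-evolution write. This is a sketch under assumptions: `skills/` and `memory/self_evolution_sop.md` come from the text above, while `journals/` is a hypothetical stand-in for wherever task journals live, and paths are assumed to be relative to the agent's repository root.

```python
from pathlib import PurePosixPath

# Knowledge layers named in the issue, plus a hypothetical journals/ directory.
ALLOWED_DIRS = ("skills", "journals")
ALLOWED_FILES = ("memory/self_evolution_sop.md",)


def is_allowed_target(path: str) -> bool:
    """Return True only if a self-evolution write stays inside a low-risk knowledge layer."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False  # reject absolute paths and escapes from the repository root
    if str(p) in ALLOWED_FILES:
        return True
    # Must be a file *inside* an allowed directory, not the directory itself.
    return len(p.parts) > 1 and p.parts[0] in ALLOWED_DIRS
```

Anything outside the allowlist, such as runtime code, is rejected outright rather than routed to confirmation, which keeps the runtime out of reach of the loop.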

Acceptance Criteria

  • After a complex task, the agent can propose or generate a candidate skill update.
  • The skill update includes scenario, steps, verification, and caveats.
  • Problems corrected by the user can be recalled in similar future tasks.
  • Self-evolution actions have logs and reversible file diffs.
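A minimal sketch of how a skill update could satisfy the last two criteria, assuming skills are Markdown files and using Python's standard `difflib`. The section headings and the log-entry fields are hypothetical choices. The update is rejected unless it contains all four required sections; otherwise it is stored as a log entry holding a human-readable unified diff plus an `ndiff` delta, from which `difflib.restore` can recover either version exactly.

```python
import difflib
import json
from datetime import datetime, timezone

# Hypothetical headings; the criteria only require that these four parts exist.
REQUIRED_SECTIONS = ("## Scenario", "## Steps", "## Verification", "## Caveats")


def missing_sections(skill_text: str) -> list[str]:
    return [s for s in REQUIRED_SECTIONS if s not in skill_text]


def make_patch(path: str, old: str, new: str, reason: str) -> dict:
    """Build a log entry for a skill update instead of silently overwriting the file."""
    missing = missing_sections(new)
    if missing:
        raise ValueError(f"skill update is missing sections: {missing}")
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    return {
        "path": path,
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Unified diff for humans reviewing the log.
        "diff": "".join(difflib.unified_diff(old_lines, new_lines,
                                             fromfile=f"a/{path}", tofile=f"b/{path}")),
        # ndiff delta lets difflib.restore rebuild either side exactly.
        "delta": list(difflib.ndiff(old_lines, new_lines)),
    }


def revert(entry: dict) -> str:
    return "".join(difflib.restore(entry["delta"], 1))  # 1 = original version


def reapply(entry: dict) -> str:
    return "".join(difflib.restore(entry["delta"], 2))  # 2 = updated version


def to_log_line(entry: dict) -> str:
    # One JSON object per line, suitable for an append-only evolution log.
    return json.dumps(entry)
```

Storing the delta rather than only the final file means a bad update can be rolled back without relying on version control, although keeping `skills/` under git would give the same reversibility.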

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests