Scenario
This is a must-have.
A long-running agent should not only execute tasks; it should improve its own working methods based on real successes and failures. For example, after one task it may learn that a certain API must be dry-run first, that a certain restart command can kill the agent itself, or that a class of tools should always be preceded by a status check. If this experience is not retained, the agent will make the same mistakes on the next task.
Current Pain Points
- There is no systematic post-task reflection mechanism.
- Failures, detours, and user corrections are not distilled into reusable skills.
- Skill updates depend on the user manually reminding the agent.
- Actual task outcomes are not fed back into methodology or workflow updates.
Suggested Direction
Add a self-evolution loop, but with clear boundaries:
- Run a lightweight reflection after task completion: is there reusable experience from this task?
- Update skills only when conditions are met: complex task, repeated trial-and-error, explicit user correction, or a stable workflow discovered.
- Skill updates should be versioned patches, not silent edits to core runtime code.
- Automatically record trigger reason, applicable scope, counterexamples/limits, and validation method.
- High-risk experience should require user confirmation before being written.
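The gating conditions above can be sketched as a lightweight post-task check. This is only an illustration: `TaskOutcome`, its field names, and the thresholds are all assumptions, not an existing schema.

```python
from dataclasses import dataclass

@dataclass
class TaskOutcome:
    """Hypothetical summary of a finished task; every field is illustrative."""
    step_count: int        # rough proxy for task complexity
    retry_count: int       # repeated trial-and-error within the task
    user_corrections: int  # explicit corrections from the user
    stable_workflow: bool  # the same step sequence succeeded more than once
    high_risk: bool        # touches destructive commands, credentials, etc.

def should_propose_skill_update(t: TaskOutcome) -> tuple[bool, list[str]]:
    """Lightweight reflection gate: return (propose?, trigger reasons).

    Mirrors the conditions listed above: complex task, repeated
    trial-and-error, explicit user correction, stable workflow.
    """
    reasons = []
    if t.step_count >= 10:
        reasons.append("complex task")
    if t.retry_count >= 2:
        reasons.append("repeated trial-and-error")
    if t.user_corrections > 0:
        reasons.append("explicit user correction")
    if t.stable_workflow:
        reasons.append("stable workflow discovered")
    return (bool(reasons), reasons)

def requires_user_confirmation(t: TaskOutcome) -> bool:
    """High-risk experience must be confirmed by the user before being written."""
    return t.high_risk
```

Returning the trigger reasons alongside the decision means the candidate update can record *why* it was proposed, which feeds directly into the audit log.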
One safer direction is to limit self-evolution to low-risk knowledge layers such as skills/, memory/self_evolution_sop.md, and task journals, rather than letting the agent freely edit runtime code. This allows evolution without making the system fragile.
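Restricting evolution to low-risk layers can be enforced with a simple write-path allowlist. A minimal sketch, assuming the layers named above plus a hypothetical `journal/` directory; the actual set of writable prefixes is a design decision:

```python
from pathlib import PurePosixPath

# Illustrative allowlist of low-risk knowledge layers the agent may edit.
WRITABLE_PREFIXES = ("skills/", "memory/", "journal/")

def is_evolution_writable(path: str) -> bool:
    """Allow self-evolution writes only inside low-risk knowledge layers.

    Rejects absolute paths and path traversal so the agent cannot
    escape the allowlist into runtime code.
    """
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False
    return str(p).startswith(WRITABLE_PREFIXES)
```

Any write the agent attempts outside these prefixes would be refused, so runtime code can only change through the normal, human-reviewed path.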
Acceptance Criteria
- After a complex task, the agent can propose or generate a candidate skill update.
- The skill update includes scenario, steps, verification, and caveats.
- Problems corrected by the user can be recalled in similar future tasks.
- Self-evolution actions have logs and reversible file diffs.
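One way to satisfy the criteria above is to treat every candidate skill update as a versioned patch record that refuses incomplete submissions and carries a content hash for reversible diffs. The schema below is a hypothetical sketch, not a fixed format:

```python
import hashlib
import json
from datetime import datetime, timezone

# The acceptance criteria require scenario, steps, verification, and caveats.
REQUIRED_FIELDS = ("scenario", "steps", "verification", "caveats")

def make_skill_patch(skill_name: str, update: dict, trigger: str) -> dict:
    """Wrap a candidate skill update as a versioned, loggable patch record.

    Raises ValueError if any required field is missing, so incomplete
    updates never reach the skill store silently.
    """
    missing = [f for f in REQUIRED_FIELDS if f not in update]
    if missing:
        raise ValueError(f"incomplete skill update, missing: {missing}")
    body = json.dumps(update, sort_keys=True)
    return {
        "skill": skill_name,
        "trigger": trigger,  # why this update was proposed
        "created_at": datetime.now(timezone.utc).isoformat(),
        # Hash of the update body supports audit logs and reversible diffs.
        "content_hash": hashlib.sha256(body.encode()).hexdigest(),
        "update": update,
    }
```

Appending these records to a log (rather than editing skills in place) gives both the audit trail and the ability to revert any single evolution step.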