Add continuous-eval observe ref for Foundry agent Monitoring #1733
jugonzales wants to merge 5 commits into microsoft:main
Updated from `aa1cdb5` to `be05bf7`
- New continuous-eval skill doc with entry points, behavioral rules, operations, and response format
- Register continuous-eval in the SKILL.md sub-skills table and lifecycle table
- Add a continuous-eval cross-reference to observe.md Related Skills
- Trim the SKILL.md description to fit the 1024-character limit
- Update trigger snapshots for new keywords

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
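Because the commit trims the SKILL.md description to stay under a 1024-character limit, and a later review flags a routing phrase that was dropped in the process, a pre-commit check on the description can catch both problems. The sketch below is hypothetical: `checkDescription`, the phrase list, and the sample description are illustrative and not part of this repo. It counts UTF-16 code units, which may differ from how the real validator measures length.

```typescript
// Hypothetical helper: checks a skill description against the 1024-character
// limit from the commit message and a caller-supplied list of routing phrases.
const MAX_DESCRIPTION_CHARS = 1024;

interface DescriptionCheck {
  length: number;
  withinLimit: boolean;
  missingPhrases: string[];
}

function checkDescription(description: string, requiredPhrases: string[]): DescriptionCheck {
  const lower = description.toLowerCase();
  return {
    // String length in UTF-16 code units.
    length: description.length,
    withinLimit: description.length <= MAX_DESCRIPTION_CHARS,
    // Case-insensitive substring match; the real trigger tests may tokenize differently.
    missingPhrases: requiredPhrases.filter((p) => lower.indexOf(p.toLowerCase()) === -1),
  };
}

// Illustrative description text, not the actual SKILL.md description.
const result = checkDescription(
  "Evaluate agent quality, run batch evals, prompt optimizer, continuous evaluation.",
  ["continuous evaluation", "prompt optimizer workflows"],
);
console.log(result);
```

A check like this only flags the phrases you list, so the phrase list would need to track whatever keywords the trigger snapshots depend on.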
Updated from `be05bf7` to `474a51d`
Pull request overview
Adds documentation and routing keywords for Foundry “continuous evaluation” monitoring, integrating it into the existing observe workflow and updating trigger keyword snapshots accordingly.
Changes:
- Added a new continuous evaluation reference doc covering `continuous_eval_create`/`get`/`delete` and how to act on monitoring results.
- Updated `observe` step-6 monitoring guidance and expanded `observe` entry points/keywords to include production monitoring scenarios.
- Bumped the `microsoft-foundry` skill version and refreshed trigger keyword snapshots to reflect the updated description/keywords.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| plugin/skills/microsoft-foundry/SKILL.md | Updates skill description/keywords, bumps version, and clarifies observe lifecycle entry for continuous monitoring. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md | Adds continuous monitoring intents, tools, and entry point to the observe skill. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/cicd-monitoring.md | Refocuses Step 6 on CI/CD eval gates plus continuous production monitoring and links to continuous eval reference. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/continuous-eval.md | New reference describing continuous evaluation configuration, operations, and remediation loop. |
| tests/microsoft-foundry/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords (continuous/enable/disable, etc.). |
| tests/microsoft-foundry/resource/create/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/models/deploy/deploy-model/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/models/deploy/deploy-model-optimal-region/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/models/deploy/customize-deployment/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/models/deploy/capacity/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/create/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/deploy/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/invoke/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/observe/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/trace/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/troubleshoot/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/eval-datasets/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
Comments suppressed due to low confidence (1)
plugin/skills/microsoft-foundry/SKILL.md:28
- The PR description says a new doc was added at `foundry-agent/continuous-eval/continuous-eval.md`, but the change actually adds `foundry-agent/observe/references/continuous-eval.md`, and no new `foundry-agent/continuous-eval/` sub-skill entry appears in the Sub-Skills table. Either update the PR description to match, or add and register the intended standalone sub-skill to avoid confusing maintainers.
| Sub-Skill | When to Use | Reference |
|-----------|-------------|-----------|
| **deploy** | Containerize, build, push to ACR, create/update/start/stop/clone agent deployments | [deploy](foundry-agent/deploy/deploy.md) |
| **invoke** | Send messages to an agent, single or multi-turn conversations | [invoke](foundry-agent/invoke/invoke.md) |
| **observe** | Evaluate agent quality, run batch evals, analyze failures, optimize prompts, improve agent instructions, compare versions, set up CI/CD monitoring, and enable continuous production evaluation | [observe](foundry-agent/observe/observe.md) |
| **trace** | Query traces, analyze latency/failures, correlate eval results to specific responses via App Insights `customEvents` | [trace](foundry-agent/trace/trace.md) |
| **troubleshoot** | View container logs, query telemetry, diagnose failures | [troubleshoot](foundry-agent/troubleshoot/troubleshoot.md) |
| **create** | Create new hosted agent applications. Supports Microsoft Agent Framework, LangGraph, or custom frameworks in Python or C#. Downloads starter samples from foundry-samples repo. | [create](foundry-agent/create/create.md) |
| **eval-datasets** | Harvest production traces into evaluation datasets, manage dataset versions and splits, track evaluation metrics over time, detect regressions, and maintain full lineage from trace to deployment. Use for: create dataset from traces, dataset versioning, evaluation trending, regression detection, dataset comparison, eval lineage. | [eval-datasets](foundry-agent/eval-datasets/eval-datasets.md) |
Resolved review threads:
- plugin/skills/microsoft-foundry/foundry-agent/observe/references/continuous-eval.md (outdated, resolved)
- plugin/skills/microsoft-foundry/foundry-agent/observe/references/continuous-eval.md (resolved)
- plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md (outdated, resolved)
- plugin/skills/microsoft-foundry/foundry-agent/observe/references/cicd-monitoring.md (outdated, resolved)
jongio left a comment
The new continuous-eval.md is well-structured - clear entry points, behavioral rules, and operations with proper cross-references to trace/deploy/observe. Separating pre-deploy (CI/CD pipeline) from post-deploy (continuous monitoring) in cicd-monitoring.md is a solid design choice. Version bump and snapshot updates look correct.
Two things to address before merging:
1. `cicd-monitoring.md` duplicates roughly 70% of `continuous-eval.md`'s remediation content (score reading, triage steps, routing table, verification), and the two docs already show small wording drift. Since `cicd-monitoring.md` already links to `continuous-eval.md` for setup, it should also defer to it for the "acting on results" workflow instead of duplicating it inline.
2. The `observe.md` Quick Reference lists `continuous_eval_create` and `continuous_eval_get` but not `continuous_eval_delete`. The linked `continuous-eval.md` documents delete as a full operation with its own entry point, so the parent should surface it too.
3. **Enable** — call `continuous_eval_create` with the selected evaluators. The tool auto-detects agent kind and configures the appropriate backend (real-time for prompt agents, scheduled for hosted agents).
4. **Confirm** — present the returned configuration to the user.

### Acting on Monitoring Results
This entire section (reading scores, triage, remediation routing, verification) is nearly identical to continuous-eval.md's "Acting on Results" - already with small wording drift (e.g., "and timestamps" here but not there, "Route To" vs "Action" column names). Since you're already linking to continuous-eval.md for the setup workflow, consider replacing this with:
For how to read evaluation scores, triage regressions, and verify fixes, see Acting on Results.
That keeps continuous-eval.md as the single source of truth for the remediation loop.
Diff context (Quick Reference table):

| Property | Value |
|----------|-------|
| MCP server | `azure` |
| Key MCP tools (before) | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update` |
| Key MCP tools (after) | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update`, `continuous_eval_create`, `continuous_eval_get` |
continuous_eval_delete is missing from this list but is documented as a full operation in the linked continuous-eval.md reference (with its own entry point for "Delete continuous eval"). Consider adding it here for consistency.
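Drift of this kind (a tool documented in a reference but missing from the parent Quick Reference) can be caught mechanically. The sketch below is a hypothetical lint step, not something in this repo: `extractToolNames` and `missingFromQuickReference` are illustrative names, and it assumes tool names appear as lowercase inline code spans in the table row.

```typescript
// Hypothetical consistency check: reports tools that a reference doc documents
// but the parent skill's "Key MCP tools" Quick Reference row omits.
function extractToolNames(markdownRow: string): string[] {
  // Tool names are assumed to be inline code spans such as `continuous_eval_get`.
  const pattern = /`([a-z0-9_]+)`/g; // fresh regex per call, so lastIndex starts at 0
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdownRow)) !== null) {
    names.push(match[1]);
  }
  return names;
}

function missingFromQuickReference(quickRefRow: string, documentedTools: string[]): string[] {
  const listed = extractToolNames(quickRefRow);
  return documentedTools.filter((tool) => listed.indexOf(tool) === -1);
}

// Row copied from the diff above; tool list taken from continuous-eval.md's operations.
const quickRefRow =
  "| Key MCP tools | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update`, `continuous_eval_create`, `continuous_eval_get` |";

const missing = missingFromQuickReference(quickRefRow, [
  "continuous_eval_create",
  "continuous_eval_get",
  "continuous_eval_delete",
]);
console.log(missing); // logs the single missing tool, continuous_eval_delete
```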
Resolved review thread:
- plugin/skills/microsoft-foundry/foundry-agent/observe/references/cicd-monitoring.md (resolved)

jongio left a comment
The new continuous-eval.md is solid - entry points, behavioral rules, and operations are well-structured with proper cross-references. Four additional items beyond what's already flagged:
- The Disable operation likely destroys the evaluator config if the tool upserts (see inline comment).
- `evaluation_get` isn't in the Quick Reference but is used in the remediation workflow.
- The `observe.md` "DO NOT manually call" guardrail doesn't cover `continuous_eval_create`.
- Evaluator examples differ across three files with no explanation of why.
Resolved review threads:
- plugin/skills/microsoft-foundry/foundry-agent/observe/references/continuous-eval.md (resolved)
- plugin/skills/microsoft-foundry/foundry-agent/observe/references/continuous-eval.md (outdated, resolved)
- plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md (outdated, resolved)
- plugin/skills/microsoft-foundry/foundry-agent/observe/references/continuous-eval.md (outdated, resolved)
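The Disable concern above is a read-modify-write problem: if `continuous_eval_create` replaces the whole configuration, sending only an "off" toggle would discard the evaluator selection. The sketch below illustrates the difference under loudly stated assumptions: the `ContinuousEvalConfig` shape, the `enabled` field, and the evaluator names are all hypothetical, not the tools' documented payloads.

```typescript
// Hypothetical config shape; the real continuous_eval_get / continuous_eval_create
// payloads may use different fields.
interface ContinuousEvalConfig {
  id: string;
  enabled: boolean;
  evaluators: string[];
}

// Naive disable: sends only the toggle. If the create tool upserts (replaces the
// whole config), the evaluator list would be lost.
function naiveDisablePayload(id: string): Partial<ContinuousEvalConfig> {
  return { id, enabled: false };
}

// Read-modify-write disable: start from the config returned by a prior get call
// and change only the toggle, so an upsert keeps the evaluator selection.
function preservingDisablePayload(current: ContinuousEvalConfig): ContinuousEvalConfig {
  return { ...current, evaluators: current.evaluators.slice(), enabled: false };
}

// Illustrative values only.
const current: ContinuousEvalConfig = { id: "cfg-1", enabled: true, evaluators: ["coherence", "relevance"] };
const payload = preservingDisablePayload(current);
console.log(payload);
```

If the tool instead patches only the fields it receives, the naive payload would be safe; the sketch only shows why the doc should state which behavior applies.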
jongio left a comment
Three additional items beyond what's already flagged:
1. The SKILL.md description dropped the "prompt optimizer workflows" keyword that was in the previous version. The new wording has "prompt optimizer" but not "prompt optimizer workflows", which could regress routing for that phrase. Either restore it or confirm the removal was intentional.
2. The `scenario` parameter in `continuous-eval.md`'s optional parameters table lists `standard` and `business` values but doesn't explain what each mode does or when to choose one. An agent can't make a useful recommendation without this context.
3. `continuous_eval_get` returns a list (per the Response Format section), but the Disable and Delete workflows assume a single config. If multiple configs exist, there's no guidance on which to target; it's worth a note on expected cardinality or how to disambiguate.
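One way the Disable and Delete workflows could handle a list result is to narrow it and fall back to asking the user when more than one config still matches. This is a hypothetical sketch: the `ContinuousEvalSummary` fields, the `agentName` filter, and `selectTargetConfig` are illustrative assumptions, not the tool's actual response format.

```typescript
// Hypothetical item shape for the list returned by continuous_eval_get.
interface ContinuousEvalSummary {
  id: string;
  agentName: string;
}

type Selection =
  | { kind: "none" }
  | { kind: "single"; config: ContinuousEvalSummary }
  | { kind: "ambiguous"; candidates: ContinuousEvalSummary[] };

// Picks the config to disable/delete, optionally filtered by agent name.
function selectTargetConfig(configs: ContinuousEvalSummary[], agentName?: string): Selection {
  const candidates =
    agentName === undefined ? configs : configs.filter((c) => c.agentName === agentName);
  if (candidates.length === 0) return { kind: "none" };
  if (candidates.length === 1) return { kind: "single", config: candidates[0] };
  // More than one match: surface the candidates to the user instead of guessing.
  return { kind: "ambiguous", candidates };
}

// Illustrative data only.
const exampleConfigs: ContinuousEvalSummary[] = [
  { id: "a", agentName: "support-bot" },
  { id: "b", agentName: "billing-bot" },
];
const choice = selectTargetConfig(exampleConfigs, "billing-bot");
console.log(choice);
```

Returning an explicit `ambiguous` result keeps the destructive step (delete) from acting on an arbitrary first element of the list.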
Description
Adds a new continuous-eval sub-skill that documents the continuous_eval_create, continuous_eval_get, and
continuous_eval_delete MCP tools. These tools enable ongoing evaluation of agent responses — auto-detecting
agent kind and routing to the appropriate backend (evaluation rules for prompt/workflow agents, scheduled
evaluations for hosted agents).
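The tools perform the agent-kind detection themselves; the sketch below only restates the routing described above (evaluation rules for prompt and workflow agents, scheduled evaluations for hosted agents) as a small lookup, which can help when documenting or testing expected behavior. The `AgentKind` and `EvalBackend` labels are hypothetical names, not values returned by the MCP tools.

```typescript
// Illustrative only: mirrors the routing described in the PR text,
// not the tool's actual implementation or identifiers.
type AgentKind = "prompt" | "workflow" | "hosted";
type EvalBackend = "evaluation-rule" | "scheduled-evaluation";

function backendFor(kind: AgentKind): EvalBackend {
  switch (kind) {
    case "prompt":
    case "workflow":
      return "evaluation-rule"; // responses evaluated as they occur (real-time)
    case "hosted":
      return "scheduled-evaluation"; // responses evaluated on a schedule
    default: {
      // Compile-time exhaustiveness check if a new agent kind is added.
      const unreachable: never = kind;
      throw new Error(`Unknown agent kind: ${String(unreachable)}`);
    }
  }
}

console.log(backendFor("hosted"));
```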
Changes:
operations, response format, and evaluator guidance
- SKILL.md — registered continuous-eval sub-skill in the sub-skills table and lifecycle table
- observe/observe.md — added cross-reference to continuous-eval in Related Skills
- Updated trigger snapshots to reflect the new keywords
Checklist
- … (`cd tests && npm test`)
- … (`npm run test:skills:integration -- <skill>`)
- … `USE FOR`/`DO NOT USE FOR`/`PREFER OVER` clauses: confirmed no routing regressions for competing skills