Add continuous-eval observe ref for Foundry agent Monitoring by jugonzales · Pull Request #1733 · microsoft/GitHub-Copilot-for-Azure

jugonzales · 2026-04-06T18:42:01Z

Description

Adds a new continuous-eval sub-skill that documents the continuous_eval_create, continuous_eval_get, and
continuous_eval_delete MCP tools. These tools enable ongoing evaluation of agent responses — auto-detecting
agent kind and routing to the appropriate backend (evaluation rules for prompt/workflow agents, scheduled
evaluations for hosted agents).
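
The routing described above can be sketched as follows. This is an illustration only, not the tool's implementation: the `AgentKind` enum, the backend label strings, and `select_backend` are hypothetical names chosen for the example. Only the mapping itself (prompt/workflow agents to evaluation rules, hosted agents to scheduled evaluations) comes from the description.

```python
from enum import Enum


class AgentKind(Enum):
    # Agent kinds named in the PR description.
    PROMPT = "prompt"
    WORKFLOW = "workflow"
    HOSTED = "hosted"


# Hypothetical backend labels; the real tool's internal names are not given in this PR.
EVALUATION_RULE = "evaluation_rule"
SCHEDULED_EVALUATION = "scheduled_evaluation"


def select_backend(kind: AgentKind) -> str:
    """Map an agent kind to the backend the description says it is routed to."""
    if kind in (AgentKind.PROMPT, AgentKind.WORKFLOW):
        return EVALUATION_RULE
    if kind is AgentKind.HOSTED:
        return SCHEDULED_EVALUATION
    raise ValueError(f"unsupported agent kind: {kind!r}")
```

With this mapping, `select_backend(AgentKind.HOSTED)` returns the scheduled-evaluation label, and both prompt and workflow agents share the evaluation-rule label.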

Changes:

- New foundry-agent/continuous-eval/continuous-eval.md: skill doc with entry points, behavioral rules, operations, response format, and evaluator guidance
- SKILL.md — registered continuous-eval sub-skill in the sub-skills table and lifecycle table
- observe/observe.md — added cross-reference to continuous-eval in Related Skills
- Updates trigger snapshots to reflect the new keywords

Checklist

- Tests pass locally (cd tests && npm test)
- If modifying skill descriptions: verified routing correctness with integration tests (npm run test:skills:integration -- <skill>)
- If modifying skill USE FOR / DO NOT USE FOR / PREFER OVER clauses: confirmed no routing regressions for competing skills
- Version bumped in skill frontmatter (if skill files changed)

- New continuous-eval skill doc with entry points, behavioral rules, operations, and response format
- Register continuous-eval in SKILL.md sub-skills table and lifecycle table
- Add continuous-eval cross-reference to observe.md Related Skills
- Trim SKILL.md description to fit 1024 char limit
- Update trigger snapshots for new keywords

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
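
A pre-commit check for the description limit and trigger keywords mentioned above might look like the sketch below. It assumes the 1024 limit counts characters (the commit message does not say whether it is characters or bytes), and `check_description` and the `required_phrases` argument are hypothetical names for illustration, not part of the repository's test suite.

```python
from typing import List

MAX_DESCRIPTION_CHARS = 1024  # limit cited in the commit message; unit assumed to be characters


def check_description(description: str, required_phrases: List[str]) -> List[str]:
    """Return a list of problems found in a skill description (empty if none)."""
    problems = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description is {len(description)} characters; limit is {MAX_DESCRIPTION_CHARS}"
        )
    lowered = description.lower()
    for phrase in required_phrases:
        # Case-insensitive substring match; real routing may tokenize differently.
        if phrase.lower() not in lowered:
            problems.append(f"missing phrase: {phrase}")
    return problems
```

Listing the phrases that must survive a trim makes it harder for a shortening pass to drop a routing keyword unnoticed.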

Copilot

Pull request overview

Adds documentation and routing keywords for Foundry “continuous evaluation” monitoring, integrating it into the existing observe workflow and updating trigger keyword snapshots accordingly.

Changes:

- Added a new continuous evaluation reference doc covering continuous_eval_create/get/delete and how to act on monitoring results.
- Updated observe step-6 monitoring guidance and expanded observe entry points/keywords to include production monitoring scenarios.
- Bumped microsoft-foundry skill version and refreshed trigger keyword snapshots to reflect the updated description/keywords.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.


| File | Description |
|------|-------------|
| plugin/skills/microsoft-foundry/SKILL.md | Updates skill description/keywords, bumps version, and clarifies observe lifecycle entry for continuous monitoring. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md | Adds continuous monitoring intents, tools, and entry point to the observe skill. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/cicd-monitoring.md | Refocuses Step 6 on CI/CD eval gates plus continuous production monitoring and links to continuous eval reference. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/continuous-eval.md | New reference describing continuous evaluation configuration, operations, and remediation loop. |
| tests/microsoft-foundry/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords (continuous/enable/disable, etc.). |
| tests/microsoft-foundry/resource/create/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/models/deploy/deploy-model/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/models/deploy/deploy-model-optimal-region/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/models/deploy/customize-deployment/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/models/deploy/capacity/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/create/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/deploy/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/invoke/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/observe/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/trace/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/troubleshoot/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |
| tests/microsoft-foundry/foundry-agent/eval-datasets/snapshots/triggers.test.ts.snap | Snapshot update for changed skill description keywords. |

Comments suppressed due to low confidence (1)

plugin/skills/microsoft-foundry/SKILL.md:28

PR description says a new doc was added at foundry-agent/continuous-eval/continuous-eval.md, but the change actually adds foundry-agent/observe/references/continuous-eval.md and no new foundry-agent/continuous-eval/ sub-skill entry appears in the Sub-Skills table. Either update the PR description to match, or add/register the intended standalone sub-skill to avoid confusion for maintainers.

| Sub-Skill | When to Use | Reference |
|-----------|-------------|-----------|
| **deploy** | Containerize, build, push to ACR, create/update/start/stop/clone agent deployments | [deploy](foundry-agent/deploy/deploy.md) |
| **invoke** | Send messages to an agent, single or multi-turn conversations | [invoke](foundry-agent/invoke/invoke.md) |
| **observe** | Evaluate agent quality, run batch evals, analyze failures, optimize prompts, improve agent instructions, compare versions, set up CI/CD monitoring, and enable continuous production evaluation | [observe](foundry-agent/observe/observe.md) |
| **trace** | Query traces, analyze latency/failures, correlate eval results to specific responses via App Insights `customEvents` | [trace](foundry-agent/trace/trace.md) |
| **troubleshoot** | View container logs, query telemetry, diagnose failures | [troubleshoot](foundry-agent/troubleshoot/troubleshoot.md) |
| **create** | Create new hosted agent applications. Supports Microsoft Agent Framework, LangGraph, or custom frameworks in Python or C#. Downloads starter samples from foundry-samples repo. | [create](foundry-agent/create/create.md) |
| **eval-datasets** | Harvest production traces into evaluation datasets, manage dataset versions and splits, track evaluation metrics over time, detect regressions, and maintain full lineage from trace to deployment. Use for: create dataset from traces, dataset versioning, evaluation trending, regression detection, dataset comparison, eval lineage. | [eval-datasets](foundry-agent/eval-datasets/eval-datasets.md) |


jongio

The new continuous-eval.md is well-structured - clear entry points, behavioral rules, and operations with proper cross-references to trace/deploy/observe. Separating pre-deploy (CI/CD pipeline) from post-deploy (continuous monitoring) in cicd-monitoring.md is a solid design choice. Version bump and snapshot updates look correct.

Two things to address before merging:

1. cicd-monitoring.md duplicates roughly 70% of continuous-eval.md's remediation content (score reading, triage steps, routing table, verification). The two docs already have small wording drift between them. Since cicd-monitoring.md already links to continuous-eval.md for setup, it should also defer to it for the "acting on results" workflow instead of duplicating it inline.
2. observe.md Quick Reference lists continuous_eval_create and continuous_eval_get but not continuous_eval_delete. The linked continuous-eval.md documents delete as a full operation with its own entry point, so the parent should surface it too.

jongio · 2026-04-09T22:15:27Z

plugin/skills/microsoft-foundry/foundry-agent/observe/references/cicd-monitoring.md

+3. **Enable** — call `continuous_eval_create` with the selected evaluators. The tool auto-detects agent kind and configures the appropriate backend (real-time for prompt agents, scheduled for hosted agents).
+4. **Confirm** — present the returned configuration to the user.
+
+### Acting on Monitoring Results


This entire section (reading scores, triage, remediation routing, verification) is nearly identical to continuous-eval.md's "Acting on Results" - already with small wording drift (e.g., "and timestamps" here but not there, "Route To" vs "Action" column names). Since you're already linking to continuous-eval.md for the setup workflow, consider replacing this with:
For how to read evaluation scores, triage regressions, and verify fixes, see Acting on Results.
That keeps continuous-eval.md as the single source of truth for the remediation loop.

jongio · 2026-04-09T22:15:27Z

plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md

 |----------|-------|
 | MCP server | `azure` |
-| Key MCP tools | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update` |
+| Key MCP tools | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update`, `continuous_eval_create`, `continuous_eval_get` |


continuous_eval_delete is missing from this list but is documented as a full operation in the linked continuous-eval.md reference (with its own entry point for "Delete continuous eval"). Consider adding it here for consistency.
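
One way to catch this kind of omission mechanically is to compare the backticked tool names in the two docs. The sketch below is an assumption-laden illustration: `missing_tools` is a hypothetical helper, and it only recognizes names that start with `continuous_eval_` and are wrapped in backticks.

```python
import re
from typing import List

# Matches backticked tool names such as `continuous_eval_create`.
TOOL_PATTERN = re.compile(r"`(continuous_eval_[a-z_]+)`")


def missing_tools(parent_doc: str, reference_doc: str) -> List[str]:
    """Return tools named in the reference doc that the parent doc never mentions."""
    in_reference = set(TOOL_PATTERN.findall(reference_doc))
    in_parent = set(TOOL_PATTERN.findall(parent_doc))
    return sorted(in_reference - in_parent)
```

Run against the observe.md row quoted above and a reference that documents all three operations, the helper would report `continuous_eval_delete` as the only gap.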


jongio

The new continuous-eval.md is solid - entry points, behavioral rules, and operations are well-structured with proper cross-references. Four additional items beyond what's already flagged:

1. The Disable operation likely destroys the evaluator config if the tool upserts (see inline comment).
2. evaluation_get isn't in the Quick Reference but is used in the remediation workflow.
3. The "DO NOT manually call" guardrail in observe.md doesn't cover continuous_eval_create.
4. Evaluator examples differ across three files with no explanation of why.
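
The upsert concern in the first item can be illustrated with a toy model. Everything here is hypothetical: `FakeContinuousEvalStore` is an in-memory stand-in, not the real `continuous_eval_create` backend, and the evaluator names are placeholders. It only shows why a full-replace upsert loses fields the caller omits, and how reading the current config first avoids that.

```python
from typing import Dict, List, Optional


class FakeContinuousEvalStore:
    """In-memory stand-in for a backend whose create call fully replaces the record."""

    def __init__(self) -> None:
        self._configs: Dict[str, dict] = {}

    def create(self, agent: str, enabled: bool,
               evaluators: Optional[List[str]] = None) -> dict:
        # Full-replace upsert: any field the caller omits is reset, not preserved.
        self._configs[agent] = {"enabled": enabled, "evaluators": list(evaluators or [])}
        return dict(self._configs[agent])

    def get(self, agent: str) -> dict:
        return dict(self._configs[agent])


def disable_naive(store: FakeContinuousEvalStore, agent: str) -> dict:
    """Disable without resending evaluators; the upsert wipes them."""
    return store.create(agent, enabled=False)


def disable_preserving(store: FakeContinuousEvalStore, agent: str) -> dict:
    """Read the current config first and resend its evaluators when disabling."""
    current = store.get(agent)
    return store.create(agent, enabled=False, evaluators=current["evaluators"])
```

Whether the real tool replaces or merges on create is exactly the open question in the review; if it merges, the naive path would be safe.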


jongio

Three additional items beyond what's already flagged:

1. SKILL.md description dropped the "prompt optimizer workflows" keyword that was in the previous version. The new wording has "prompt optimizer" but not "prompt optimizer workflows", which could regress routing for that phrase. Either restore it or confirm the removal was intentional.
2. The scenario parameter in continuous-eval.md's optional parameters table lists standard and business values but doesn't explain what each mode does or when to choose one. An agent can't make a useful recommendation without this context.
3. continuous_eval_get returns a list (per the Response Format section), but the Disable and Delete workflows assume a single config. If multiple configs exist, there's no guidance on which to target. It's worth adding a note on expected cardinality or how to disambiguate.
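
For the cardinality point, one possible disambiguation rule is sketched below. It is not taken from continuous-eval.md: `select_config` is a hypothetical helper, and the `"id"` field name is an assumption about the response shape. The idea is to act automatically only when exactly one config matches, and otherwise surface the candidates so the user can choose.

```python
from typing import List, Optional


def select_config(configs: List[dict], config_id: Optional[str] = None) -> dict:
    """Pick the single config to disable or delete, or fail with the candidates listed."""
    if config_id is not None:
        # Narrow to the config the user named explicitly.
        configs = [c for c in configs if c.get("id") == config_id]
    if not configs:
        raise LookupError("no matching continuous eval config found")
    if len(configs) > 1:
        ids = ", ".join(str(c.get("id")) for c in configs)
        raise ValueError(f"multiple configs found ({ids}); ask the user which one to target")
    return configs[0]
```

Raising instead of guessing keeps a destructive Delete from landing on the wrong configuration.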

jugonzales force-pushed the jugonzales/continuous-evals-skill branch from aa1cdb5 to be05bf7 on April 7, 2026 19:41

jugonzales and others added 2 commits on April 9, 2026 14:17

Commit 474a51d: Restructure

jugonzales force-pushed the jugonzales/continuous-evals-skill branch from be05bf7 to 474a51d on April 9, 2026 21:37

jugonzales marked this pull request as ready for review April 9, 2026 21:42

jugonzales requested review from XOEEst, ankitbko and tendau as code owners April 9, 2026 21:42

Copilot AI review requested due to automatic review settings April 9, 2026 21:42

Copilot started reviewing on behalf of jugonzales April 9, 2026 21:43

Copilot AI reviewed Apr 9, 2026

Commit 4d3c63d: Address autogen comments

jongio requested changes Apr 9, 2026

Commit a62a62a: Address comments (repetitive content, missing tool name)

Copilot AI review requested due to automatic review settings April 9, 2026 22:23

Copilot started reviewing on behalf of jugonzales April 9, 2026 22:23

Copilot AI reviewed Apr 9, 2026

jugonzales changed the title ~~Add continuous-eval sub skill for Foundry agent Monitoring~~ Add continuous-eval observe ref for Foundry agent Monitoring Apr 9, 2026

jongio reviewed Apr 10, 2026

Commit 1d7c662: Address comments, clarify steps and tools

jongio reviewed Apr 11, 2026


3 participants
