Skip to content

feat: add runtime command boundary#412

Merged
thymikee merged 12 commits intomainfrom
feat/runtime-command-boundary
Apr 16, 2026
Merged

feat: add runtime command boundary#412
thymikee merged 12 commits intomainfrom
feat/runtime-command-boundary

Conversation

@thymikee
Copy link
Copy Markdown
Contributor

@thymikee thymikee commented Apr 15, 2026

Summary

Introduce agent-device as a reusable command runtime instead of a collection of helper exports. The CLI and daemon keep their existing behavior, while command semantics now have a public JS boundary that can run against local or hosted backend adapters.

This PR adds the runtime/backend/io split, migrates the first command families into it, and keeps compatibility subpaths available while the remaining command inventory is migrated.

Scope: 69 files touched across the runtime public boundary, migrated capture/selector/interaction slices, tests/docs, and one Android replay hardening change.

New JS API

Root runtime entrypoint:

import {
  createAgentDevice,
  createLocalArtifactAdapter,
  createMemorySessionStore,
  localCommandPolicy,
  ref,
  selector,
  type AgentDeviceBackend,
} from 'agent-device';

const backend: AgentDeviceBackend = /* local or hosted adapter */;

const device = createAgentDevice({
  backend,
  artifacts: createLocalArtifactAdapter(),
  sessions: createMemorySessionStore([{ name: 'default' }]),
  policy: localCommandPolicy(),
});

await device.capture.screenshot({
  session: 'default',
  out: { kind: 'path', path: './screen.png' },
});

await device.capture.diffScreenshot({
  session: 'default',
  baseline: { kind: 'path', path: './baseline.png' },
  out: { kind: 'path', path: './diff.png' },
  threshold: 0.1,
});

await device.selectors.getText(selector('label=Continue'), {
  session: 'default',
  depth: 2,
  raw: true,
});

await device.interactions.click(ref('@e1'), {
  session: 'default',
});

Router entrypoint for transport adapters:

import { createAgentDevice } from 'agent-device';
import { createCommandRouter } from 'agent-device/commands';

const router = createCommandRouter({
  createRuntime(request) {
    return createAgentDevice({
      backend: createBackendForRequest(request),
      artifacts: createArtifactAdapterForRequest(request),
      sessions: createSessionStoreForRequest(request),
      policy: createPolicyForRequest(request),
    });
  },
});

const result = await router.dispatch({
  command: 'capture.screenshot',
  options: {
    session: 'default',
    out: { kind: 'downloadableArtifact', fileName: 'screen.png' },
  },
});

New public subpaths:

  • agent-device/commands: implemented command APIs, selector helpers, command router, command catalog
  • agent-device/backend: backend primitive and capability interfaces
  • agent-device/io: file refs, artifact descriptors, output reservation, temp file contracts
  • agent-device/testing/conformance: backend conformance suites for migrated command families

createAgentDeviceClient() remains the daemon RPC client. createAgentDevice() is the in-process runtime.

Key Changes

  • Add createAgentDevice() with explicit backend, artifacts, sessions, policy, diagnostics, clock, and signal runtime dependencies.
  • Add first-class IO refs:
    • inputs: { kind: 'path' } and { kind: 'uploadedArtifact' }
    • outputs: { kind: 'path' } and { kind: 'downloadableArtifact' }
    • artifacts: discriminated localPath / artifact descriptors
  • Add localCommandPolicy() and restrictedCommandPolicy() so local filesystem behavior is explicit and service-safe defaults are available.
  • Add createMemorySessionStore() with defensive cloning for snapshots and metadata.
  • Add optional rootDir containment for createLocalArtifactAdapter() explicit local paths.
  • Document that shared CommandSessionStore implementations must serialize per-session updates when used concurrently.
  • Add named backend capability gates for escape hatches instead of a freeform backend bag.
  • Expose only implemented bound commands from createAgentDevice(); planned commands live in commandCatalog rather than autocomplete-only throwing methods.
  • Keep macOS interaction policy tied to trusted session state; caller metadata cannot override desktop/menubar guards.
  • Preserve selector snapshot options (depth, scope, raw) through the public runtime API and daemon bridge.
  • Restore rapid selector snapshot reuse for read-only find loops while preserving Android freshness bypasses.
  • Preserve recording metadata from runtime-routed touch interactions through daemon responses and recorded action results.
  • Ensure Android escape detection fails before action recording/finalization.
  • Use snapshot timestamps consistently for stale-drop warnings, including backend-supplied snapshot objects, injected runtime clocks, and clock-skew fallback.
  • Preserve screenshot surface options through live diff screenshot capture, CLI diff screenshot, and the CLI client bridge.
  • Harden Android e2e replay navigation by replacing a brittle Settings snapshot ref with a semantic label=Notifications selector.

Migrated Runtime Commands

Capture:

  • capture.screenshot
  • capture.diffScreenshot
  • capture.snapshot
  • capture.diffSnapshot

Selectors:

  • selectors.find
  • selectors.get
  • selectors.getText
  • selectors.getAttrs
  • selectors.is
  • selectors.isVisible
  • selectors.isHidden
  • selectors.wait
  • selectors.waitForText

Interactions:

  • interactions.click
  • interactions.press
  • interactions.fill
  • interactions.typeText

The daemon and CLI paths now route these slices through the runtime while preserving existing daemon-shaped responses, selector recording, snapshot flag behavior, macOS surface guards, stale-ref refresh, Android escape handling, recording metadata, and screenshot cleanup semantics.

Follow-ups

Remaining command families are tracked in COMMAND_OWNERSHIP.md. Compatibility helper subpaths stay available for now and are documented as migration-era APIs rather than the preferred public boundary.

Validation

  • pnpm format
  • pnpm check
  • pnpm test:integration
  • node --test test/integration/installed-package-metro.test.ts
  • pnpm test:unit src/__tests__/cli-client-commands.test.ts src/__tests__/runtime-snapshot.test.ts
  • pnpm test:unit src/__tests__/runtime-selector-read.test.ts src/__tests__/runtime-interactions.test.ts src/__tests__/runtime-diff-screenshot.test.ts src/__tests__/runtime-snapshot.test.ts src/__tests__/runtime-public.test.ts src/daemon/handlers/__tests__/interaction.test.ts
  • pnpm test:replay:ios (6 passed, 0 failed)
  • pnpm test:replay:android (6 passed, 0 failed after hardening the stale Android Settings replay ref)
  • pnpm check:quick
  • pnpm check:unit
  • pnpm check:tooling
  • pnpm test:unit src/daemon/handlers/__tests__/find.test.ts src/__tests__/runtime-snapshot.test.ts src/__tests__/runtime-public.test.ts
  • pnpm test:unit src/__tests__/cli-batch.test.ts src/__tests__/cli-diagnostics.test.ts
  • git diff --check

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 15, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-412/

Built to branch gh-pages at 2026-04-16 10:28 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80dd64f5c9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/daemon/selector-runtime.ts Outdated
Comment on lines +294 to +296
snapshotScope: options?.scope,
snapshotDepth: options?.depth,
snapshotRaw: options?.raw,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve selector snapshot flags in runtime capture

The selector runtime now overwrites snapshotScope, snapshotDepth, and snapshotRaw with values from options, but for get/is/wait (and read-only find in most paths) those fields are usually undefined. Because this assignment happens after spreading req.flags, user-provided CLI flags like --snapshot-depth and --snapshot-raw are silently dropped, changing selector resolution behavior versus the previous handler path. Only override these fields when an explicit runtime option is provided.

Useful? React with 👍 / 👎.

Comment thread src/runtime.ts Outdated
function cloneSessionRecord(
record: CommandSessionRecord | undefined,
): CommandSessionRecord | undefined {
return record ? { ...record } : undefined;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Deep-clone session records returned by memory store

The in-memory session store returns a shallow copy of a session record, so nested objects (notably snapshot) are still shared with internal state. If a caller mutates record.snapshot.nodes after get, it mutates the store’s baseline in place, which can corrupt later diff/get/is behavior and defeats the “no mutable references” contract implied by this helper. Clone nested session fields (at least snapshot/metadata) before returning.

Useful? React with 👍 / 👎.

@thymikee thymikee force-pushed the feat/runtime-command-boundary branch from 278e890 to 5c88f98 Compare April 16, 2026 10:28
@thymikee thymikee merged commit 8bcefb7 into main Apr 16, 2026
15 checks passed
@thymikee thymikee deleted the feat/runtime-command-boundary branch April 16, 2026 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant