feat(prover): add OpenShell Policy Prover (OPP) by zredlined · Pull Request #703 · NVIDIA/OpenShell

zredlined · 2026-03-31T16:02:55Z

Summary

Add the OpenShell Policy Prover (OPP) — formal verification for sandbox policies using Z3 SMT solving. OPP answers two yes/no security questions about any policy:

Can data leave this sandbox? — maps exfiltration paths from readable filesystem to writable egress channels (L4 bypasses, wire protocol tunnels, L7 write channels)
Can the agent write despite read-only intent? — detects write operations that bypass read-only policy through binary capabilities, credential scopes, and transport protocol mismatches

Any finding = FAIL. No findings = PASS. Teams can acknowledge known paths via accepted-risks YAML.

Related Issue

Closes #699

Changes

Python prover (python/openshell/prover/):

2 verification queries (exfiltration + write bypass) encoded as Z3 SMT constraints
Policy parser, binary capability registry, credential loader
Terminal and compact (CI) output modes
Accepted risks filtering
~1,770 lines total

Registry data (python/openshell/prover/registry/):

17 binary capability descriptors (git, curl, python, claude, etc.)
1 API capability descriptor (GitHub)
1 credential descriptor (GitHub PAT)

Rust CLI (crates/openshell-cli/src/main.rs):

openshell policy prove subcommand — delegates to Python prover via uv run

Agent skill (.agents/skills/harden-sandbox-policy/):

Skill definition for iterative policy hardening workflow
Complements generate-sandbox-policy in the policy iteration chain

Architecture doc (architecture/policy-prover.md):

Problem statement, approach, usage, and architecture overview

SDK fix (python/openshell/__init__.py):

Graceful handling of missing proto stubs so subpackages can import independently

Testing

uv run ruff check python/openshell/prover/ — lint clean
uv run pytest python/openshell/prover/prover_test.py -v — 7/7 tests pass
cargo build -p openshell-cli — compiles
openshell policy prove --compact — detects exfil + write bypass on test fixture
License headers — all files have SPDX headers

Checklist

Follows Conventional Commits
Commits are signed off (DCO)
Architecture docs updated

Integrate Z3-based formal verification for sandbox policies as openshell.prover. Detects data exfiltration paths, write bypass attacks, L7 enforcement gaps, overpermissive methods, and privilege escalation via binary capability chains. Closes #699 Signed-off-by: Alexander Watson <zredlined@gmail.com>

IntoDiagnostic trait is not imported in main.rs scope. Signed-off-by: Alexander Watson <zredlined@gmail.com>

Wrap the sandbox import in try/except so subpackages like openshell.prover can be imported without a full Rust build. Removes the conftest.py namespace hack that was working around this. Signed-off-by: Alexander Watson <zredlined@gmail.com>

Bare python3 doesn't have z3-solver/rich/pyyaml. Using uv run ensures the prover runs in the project venv with all dependencies. Signed-off-by: Alexander Watson <zredlined@gmail.com>

… files Signed-off-by: Alexander Watson <zredlined@gmail.com>

Signed-off-by: Alexander Watson <zredlined@gmail.com>

Narrow OPP to two yes/no security questions: 1. Can data leave this sandbox? (exfiltration) 2. Can the agent write despite read-only intent? (write bypass) Removes 4 advisory/lint queries (overpermissive methods, L4 gaps, binary inheritance, inference relay), 3 output modes (Mermaid, topology, HTML), and fixes 3 bugs (include_workdir not used in exfil analysis, __all__ broken when protos missing, CLI PYTHONPATH). ~40% line reduction (2900 -> 1769 lines). Signed-off-by: Alexander Watson <zredlined@gmail.com>

Only report exfiltration paths where enforcement diverges from intent: - L4-only endpoints (no HTTP inspection) - Wire protocol bypasses (L7 exists but binary circumvents it) L7-enforced endpoints where POST is explicitly allowed are the policy working as designed, not findings. Remove binary inheritance from the Z3 model — transitive spawn chain access produced noise (git inheriting access to api.telegram.org via node). Direct policy grants only. Inheritance can return later as an opt-in extended analysis mode. NemoClaw default policy: 38 findings -> 9 actionable gaps. Signed-off-by: Alexander Watson <zredlined@gmail.com>

L4-only endpoints are not implicitly read-only. A policy with access: full and no protocol is an explicit grant of full access, not a violated intent. Write bypass now only fires when the policy expresses read-only intent via access: read-only or GET-only L7 rules. Signed-off-by: Alexander Watson <zredlined@gmail.com>

Replace broad contextlib.suppress with explicit error inspection. Proto/gRPC import failures are silently logged. Other ImportErrors surface as warnings so real issues aren't swallowed. Signed-off-by: Alexander Watson <zredlined@gmail.com>

Add openshell-prover crate implementing formal policy verification using Z3 SMT solving. Answers two questions about any sandbox policy: "Can data leave?" and "Can the agent write despite read-only intent?" Native Rust — no Python subprocess, no PYTHONPATH, no uv dependency. Z3 bundled via z3-sys for self-contained builds. Replaces the Python prototype from #703. Closes #699 Signed-off-by: Alexander Watson <zredlined@gmail.com>

zredlined · 2026-04-02T20:31:32Z

Superseded by native Rust implementation — see new PR.

Add openshell-prover crate implementing formal policy verification using Z3 SMT solving. Answers two questions about any sandbox policy: "Can data leave?" and "Can the agent write despite read-only intent?" Native Rust — no Python subprocess, no PYTHONPATH, no uv dependency. Z3 bundled via z3-sys for self-contained builds. Replaces the Python prototype from #703. Closes #699 Signed-off-by: Alexander Watson <zredlined@gmail.com>

* feat(prover): add native Rust policy prover with Z3 solver Add openshell-prover crate implementing formal policy verification using Z3 SMT solving. Answers two questions about any sandbox policy: "Can data leave?" and "Can the agent write despite read-only intent?" Native Rust — no Python subprocess, no PYTHONPATH, no uv dependency. Z3 bundled via z3-sys for self-contained builds. Replaces the Python prototype from #703. Closes #699 Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(prover): skip L7-write in exfil, write bypass only on read-only intent Port two fixes from the Python branch: - Exfil query skips endpoints where L7 is enforced and working - Write bypass only fires on explicit read-only intent, not L4-only Signed-off-by: Alexander Watson <zredlined@gmail.com> * chore(prover): add missing SPDX license headers to registry and testdata YAML files * fix(prover): revert serde_yaml to serde_yml to match workspace dependency on main * fix(prover): apply cargo fmt formatting to prover and cli source files * ci: add clang and libclang-dev to CI image for z3-sys bindgen z3-sys requires libclang at build time for bindgen to generate FFI bindings. Without it, the Rust CI jobs fail on the prover crate. * fix(prover): use system libz3 instead of compiling from source Add libz3-dev to the CI image and drop the z3 `bundled` feature from the workspace dependency. This eliminates the ~30 min z3 C++ build that ran on every CI cache miss. sccache only wraps rustc — it does not intercept the cmake C++ build that z3-sys runs when `bundled` is enabled. The cargo target cache helped on warm runs but evicts on toolchain bumps and fork PRs. With the system library pre-installed in the CI image, z3 link time is always zero. A `bundled-z3` opt-in feature is added to the prover crate for local development without system z3: cargo build -p openshell-prover --features bundled-z3 Regular local dev: brew install z3 (macOS) or apt install libz3-dev (Linux), then cargo build just works. Signed-off-by: Alexander Watson <zredlined@gmail.com> Made-with: Cursor * fix(prover): use u16 for ports, align include_workdir default with runtime Port fields changed from u32 to u16 across all prover types (policy, model, finding, queries). Prevents the prover from silently accepting port values >65535 that the runtime rejects, which would produce misleading PASS results on invalid policies. Change include_workdir serde default from true to false to match the runtime (openshell-policy uses #[serde(default)] which gives false). The previous mismatch caused the prover to model a /sandbox path that does not exist at runtime, producing false analysis. Signed-off-by: Alexander Watson <zredlined@gmail.com> Made-with: Cursor * refactor(prover): embed registry at compile time, replace hand-rolled glob Embed binary and API capability registry YAML files into the binary at compile time using include_dir!. The previous approach used env!("CARGO_MANIFEST_DIR") which bakes in the build machine's source path — works in tests but breaks for installed binaries. The CWD fallback was equally fragile. The --registry CLI flag still works as a filesystem override for custom registries. Credentials remain filesystem-loaded (user-supplied data). Replace the hand-rolled glob matching (~60 lines) with the glob crate's Pattern::matches(), which is already a transitive dependency. Remove unused dependencies: openshell-policy, openshell-core, thiserror, tracing. The prover parses policy YAML directly into its own types and does not use the policy or core crate APIs. Signed-off-by: Alexander Watson <zredlined@gmail.com> Made-with: Cursor * docs: add Z3 system library to prerequisites The prover crate now links against system libz3 instead of compiling from source. Document the install steps for macOS, Ubuntu, and Fedora, and note the bundled-z3 feature flag as a fallback. Signed-off-by: Alexander Watson <zredlined@gmail.com> Made-with: Cursor * fix(prover): apply cargo fmt formatting Made-with: Cursor --------- Signed-off-by: Alexander Watson <zredlined@gmail.com>

zredlined added 4 commits March 31, 2026 08:31

fix(prover): use map_err instead of into_diagnostic in CLI handler

68715f9

IntoDiagnostic trait is not imported in main.rs scope. Signed-off-by: Alexander Watson <zredlined@gmail.com>

fix(prover): use uv run for CLI subprocess to pick up venv deps

4653da2

Bare python3 doesn't have z3-solver/rich/pyyaml. Using uv run ensures the prover runs in the project venv with all dependencies. Signed-off-by: Alexander Watson <zredlined@gmail.com>

zredlined self-assigned this Mar 31, 2026

zredlined changed the title ~~feat(prover): add OpenShell Policy Prover (OPP) to Python SDK~~ feat(prover): add OpenShell Policy Prover (OPP) Mar 31, 2026

zredlined added 4 commits March 31, 2026 09:08

chore(prover): add SPDX license headers to registry and testdata YAML…

05add24

… files Signed-off-by: Alexander Watson <zredlined@gmail.com>

docs(prover): add architecture doc for OpenShell Policy Prover

1656320

Signed-off-by: Alexander Watson <zredlined@gmail.com>

fix(sdk): use contextlib.suppress per SIM105 lint rule

0aaba52

Signed-off-by: Alexander Watson <zredlined@gmail.com>

zredlined marked this pull request as ready for review March 31, 2026 18:40

zredlined requested a review from a team as a code owner March 31, 2026 18:40

zredlined added 2 commits March 31, 2026 13:40

johntmyers reviewed Apr 2, 2026

View reviewed changes

Comment thread python/openshell/__init__.py Outdated

zredlined mentioned this pull request Apr 2, 2026

feat(providers): provider-attached endpoints with verified defaults #734

Closed

zredlined closed this Apr 2, 2026

zredlined mentioned this pull request Apr 2, 2026

feat(prover): add native Rust policy prover with Z3 solver #741

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(prover): add OpenShell Policy Prover (OPP)#703

feat(prover): add OpenShell Policy Prover (OPP)#703
zredlined wants to merge 11 commits intomainfrom
feat/699-policy-prover/zredlined

zredlined commented Mar 31, 2026 •

edited

Loading

Uh oh!

Uh oh!

zredlined commented Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zredlined commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Testing

Checklist

Uh oh!

Uh oh!

zredlined commented Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zredlined commented Mar 31, 2026 •

edited

Loading