feat(vm): add OCI container support to vm driver #889
Draft
Add host-side OCI pipeline to the VM compute driver so sandboxes can
boot from a user-specified `template.image` without shipping Docker
inside the guest. The driver pulls and flattens the image, injects
OpenShell compatibility files (sandbox user, /sandbox, /tmp, stub
/etc/{hosts,resolv.conf}), and builds a read-only squashfs cached per
`(manifest digest, platform)`. Sandbox-create attaches that RO base
plus a per-sandbox raw state disk; the guest init mounts both as an
overlay, bind-mounts the workspace over `/sandbox`, pivot_roots into
the merged view, then execs an unmodified `openshell-sandbox` with
the OCI argv/env/workdir.
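A minimal sketch of how a cache entry keyed by `(manifest digest, platform)` could map onto paths. The `fs/<hex>.<plat>.squashfs` and `meta/<hex>.<plat>.json` layout comes from the PR description; the `CacheKey` type, the function names, the `linux-amd64` platform tag format, and the "both files must exist" hit rule are assumptions made for illustration, not the driver's actual API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical cache key: manifest digest plus target platform.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CacheKey {
    /// Hex part of the manifest digest, without the `sha256:` prefix.
    pub digest_hex: String,
    /// Platform tag; the `linux-amd64` spelling is an assumption.
    pub platform: String,
}

impl CacheKey {
    /// Parse a `sha256:<hex>` digest and an `os/arch` platform string.
    pub fn new(digest: &str, platform: &str) -> Option<Self> {
        let hex = digest.strip_prefix("sha256:")?;
        if hex.len() != 64 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
            return None;
        }
        Some(Self {
            digest_hex: hex.to_ascii_lowercase(),
            platform: platform.replace('/', "-"),
        })
    }

    /// Path of the read-only squashfs image for this key.
    pub fn squashfs_path(&self, cache_root: &Path) -> PathBuf {
        cache_root
            .join("fs")
            .join(format!("{}.{}.squashfs", self.digest_hex, self.platform))
    }

    /// Path of the launch metadata stored next to the image.
    pub fn metadata_path(&self, cache_root: &Path) -> PathBuf {
        cache_root
            .join("meta")
            .join(format!("{}.{}.json", self.digest_hex, self.platform))
    }
}

/// Assumed hit rule: both the image and its metadata must already exist.
pub fn is_cached(cache_root: &Path, key: &CacheKey) -> bool {
    key.squashfs_path(cache_root).is_file() && key.metadata_path(cache_root).is_file()
}
```

Keying on the resolved manifest digest rather than the tag means two tags that point at the same manifest would share one squashfs, which fits the "short-circuit on cache hit after digest resolution" behavior the PR describes.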
Supervisor gains a container-mode clean-env baseline gated on
`OPENSHELL_CONTAINER_MODE=1`: the child process starts with an empty
environ, then receives only the documented allowlist (container env
from the OCI merge, provider env, proxy env, TLS env, minimal shell
defaults), so control-plane `OPENSHELL_*` vars never leak to workloads.
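The clean-env baseline can be pictured as building the child environment from nothing rather than filtering the parent's. The sketch below is illustrative only: it assumes each `OPENSHELL_CONTAINER_ENV_<i>` value carries one `KEY=VALUE` pair, uses placeholder values for the HOME/PATH/TERM defaults, and leaves out the provider/proxy/TLS layers. The function name `clean_env` is hypothetical.

```rust
use std::collections::BTreeMap;

/// Prefix of the per-entry container env variables named in this PR.
const CONTAINER_ENV_PREFIX: &str = "OPENSHELL_CONTAINER_ENV_";

/// Build the workload environment from scratch instead of inheriting it.
///
/// `parent` is the supervisor's own environment. The `KEY=VALUE` encoding of
/// each `OPENSHELL_CONTAINER_ENV_<i>` value is an assumption.
pub fn clean_env<I>(parent: I) -> BTreeMap<String, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut env = BTreeMap::new();

    // Minimal shell defaults (placeholder values, not taken from the PR).
    env.insert("HOME".to_string(), "/sandbox".to_string());
    env.insert(
        "PATH".to_string(),
        "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin".to_string(),
    );
    env.insert("TERM".to_string(), "xterm".to_string());

    // Pick out only the indexed container entries; everything else in the
    // parent environment (including other OPENSHELL_* vars) is dropped.
    let mut entries: Vec<(usize, String)> = parent
        .into_iter()
        .filter_map(|(k, v)| {
            let idx = k.strip_prefix(CONTAINER_ENV_PREFIX)?.parse::<usize>().ok()?;
            Some((idx, v))
        })
        .collect();
    entries.sort_by_key(|(idx, _)| *idx);

    for (_, pair) in entries {
        if let Some((k, v)) = pair.split_once('=') {
            if !k.is_empty() {
                env.insert(k.to_string(), v.to_string());
            }
        }
    }

    // Applied last so an image cannot override the marker.
    env.insert("OPENSHELL_SANDBOX".to_string(), "1".to_string());
    env
}
```

A caller would pair this with `std::process::Command::env_clear()` followed by `.envs(clean_env(std::env::vars()))`, so the allowlist is the only source of variables for the child.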
The gateway plumbs `--default-image` (from `sandbox_image`) and
`--mksquashfs-bin` into the VM-driver subprocess so
`GetCapabilities.default_image` stays in sync and OCI sandboxes work
without relying on env inheritance. Guest init resolves block devices
by libkrun-assigned serial under `/sys/block/vd*/serial` instead of
hardcoded `/dev/vda`/`/dev/vdb`, with the older behavior kept as a
fallback for guest kernels that don't expose serials.
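The serial-based lookup lives in the guest init shell script; the same logic is sketched here in Rust so it can be exercised against a fake sysfs tree. Treating the disk names `oci-base` and `sandbox-state` as the serial strings is an assumption, and `resolve_by_serial` / `resolve_disk` are hypothetical names.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Look for a virtio block device whose sysfs `serial` file matches `serial`.
/// `sys_block` is normally `/sys/block`; it is a parameter so the lookup can
/// be pointed at a test directory.
pub fn resolve_by_serial(sys_block: &Path, serial: &str) -> Option<PathBuf> {
    let mut names: Vec<String> = fs::read_dir(sys_block)
        .ok()?
        .filter_map(|entry| entry.ok())
        .filter_map(|entry| entry.file_name().into_string().ok())
        .filter(|name| name.starts_with("vd"))
        .collect();
    names.sort(); // deterministic scan order

    names.into_iter().find_map(|name| {
        let found = fs::read_to_string(sys_block.join(&name).join("serial")).ok()?;
        (found.trim() == serial).then(|| Path::new("/dev").join(&name))
    })
}

/// Prefer the serial lookup; fall back to a fixed device node for guest
/// kernels that do not expose serials.
pub fn resolve_disk(sys_block: &Path, serial: &str, fallback: &str) -> PathBuf {
    resolve_by_serial(sys_block, serial).unwrap_or_else(|| PathBuf::from(fallback))
}
```

Matching on serial means the result does not depend on the order in which the disks were attached, which is the failure mode that hardcoded `/dev/vda` and `/dev/vdb` are exposed to.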
Scope and limits (v1):
- Public OCI registries only, linux/amd64 or linux/arm64 matching the
host. The OCI `User` field is ignored; workloads always run as
`sandbox:sandbox`.
- The shared RO base cache is not GC'd automatically; operators manage
`<state-dir>/oci-cache/` themselves.
- The fixed guest VM rootfs stays as the control-plane image; we never
boot the user's OCI image as the guest OS.
Unit and integration tests cover: layer flattening with whiteouts,
compat injection idempotence, squashfs build + cache round-trip,
OCI-config precedence rules (Entrypoint+Cmd, workdir fallback, env
merge), driver argv wiring for `--default-image` and
`--mksquashfs-bin`, and `resolve_oci_launch` preflight error paths
(unsupported host, missing mksquashfs, no image requested).
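The OCI-config precedence rules covered by these tests (argv = Entrypoint + Cmd, workdir fallback to `/sandbox`, env merge order OCI < template < spec) can be sketched as follows. `ImageConfig`, `Launch`, and `build_launch` are hypothetical stand-ins rather than the real `LaunchMetadata::build` signature, and returning `None` for an empty argv is an assumption about error handling.

```rust
use std::collections::BTreeMap;

/// Subset of an OCI image config relevant to launching (hypothetical type).
#[derive(Debug, Default, Clone)]
pub struct ImageConfig {
    pub entrypoint: Vec<String>,
    pub cmd: Vec<String>,
    pub working_dir: Option<String>,
    /// `KEY=VALUE` entries, as stored in an OCI image config.
    pub env: Vec<String>,
}

/// Resolved launch parameters (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Launch {
    pub argv: Vec<String>,
    pub workdir: String,
    pub env: BTreeMap<String, String>,
}

pub fn build_launch(
    image: &ImageConfig,
    template_env: &[(&str, &str)],
    spec_env: &[(&str, &str)],
) -> Option<Launch> {
    // argv is Entrypoint followed by Cmd.
    let argv: Vec<String> = image
        .entrypoint
        .iter()
        .chain(image.cmd.iter())
        .cloned()
        .collect();
    if argv.is_empty() {
        return None; // nothing to exec; the real error path is not shown here
    }

    // A missing or empty WorkingDir falls back to /sandbox.
    let workdir = image
        .working_dir
        .clone()
        .filter(|dir| !dir.is_empty())
        .unwrap_or_else(|| "/sandbox".to_string());

    // Merge order OCI < template < spec: later inserts overwrite earlier ones.
    let mut env = BTreeMap::new();
    for entry in &image.env {
        if let Some((k, v)) = entry.split_once('=') {
            env.insert(k.to_string(), v.to_string());
        }
    }
    for (k, v) in template_env.iter().chain(spec_env.iter()) {
        env.insert(k.to_string(), v.to_string());
    }

    Some(Launch { argv, workdir, env })
}
```

Keeping argv as a vector end to end, instead of joining it into a single command string, is what preserves argument boundaries for entries that contain spaces.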
Replace the ASCII-art overview with a mermaid flowchart that renders in
GitHub's UI, and add two supporting diagrams:
- Host pipeline flow: cache hit vs miss (pull → flatten → compat →
squashfs → install → attach).
- Guest init decision tree: probe `OPENSHELL_OCI_ARGC`, resolve disks by
serial, build overlay, pivot_root, exec supervisor.
- Storage layering: shared RO base, per-sandbox ext4 upper/work, and
workspace bind-mount composing the sandbox runtime view.
The numbered `oci_launch_supervisor` step list is retained alongside the
flowchart because the precise ordering (e.g. bind-mount /proc /sys /dev
before pivot_root) matters for anyone editing the init script.
Summary
Add host-side OCI container execution to the VM compute driver so sandboxes can boot from a user-specified `template.image` without introducing a Docker runtime into the guest. The driver pulls and flattens the image into a cached read-only squashfs; the guest mounts that RO base plus a per-sandbox writable disk as an overlay, `pivot_root`s into the merged view, and execs an unmodified `openshell-sandbox` with the OCI argv/env/workdir.
Related Issue
N/A — tracked via the internal architecture plan for VM-driver OCI container execution.
Changes
Host-side OCI pipeline (`crates/openshell-driver-vm/src/oci/`)
- `client.rs`: anonymous `oci-client` pulls pinned to `linux/amd64` / `linux/arm64`; normalizes the image config.
- `flatten.rs`: applies OCI layer tars in order with whiteout (`.wh.*`, `.wh..wh..opq`) handling; rejects absolute and parent-traversal paths.
- `compat.rs`: injects `sandbox:10001` into `/etc/passwd` and `/etc/group`, ensures `/sandbox` and `/tmp`, stubs `/etc/hosts` and `/etc/resolv.conf` if missing. Idempotent.
- `fs_image.rs`: shells out to `mksquashfs` with an explicit binary path (no `$PATH` reliance), zstd by default.
- `cache.rs`: content-addressed layout `blobs/ + fs/<hex>.<plat>.squashfs + meta/<hex>.<plat>.json + tmp/` with atomic writes and idempotent install/lookup.
- `metadata.rs`: `LaunchMetadata::build` enforces OCI precedence (argv = `Entrypoint + Cmd`; workdir fallback `/sandbox`; env merge order `OCI < template < spec`). `to_guest_env_vars()` packs argv/env/workdir into `OPENSHELL_OCI_*` for delivery via libkrun `set_exec`.
- `pipeline.rs`: orchestrates pull → flatten → compat → squashfs → install; short-circuits on cache hit after digest resolution.

VM boot and guest init
- `runtime.rs` / `main.rs`: `VmLaunchConfig` now supports attaching two disks (`oci-base` RO + `sandbox-state` RW) via `krun_add_disk3`; optional import vsock is kept but unused by the overlay path.
- `state_disk.rs`: per-sandbox raw sparse state disk (16 GiB default), lifecycle-bound to the sandbox state dir.
- `scripts/openshell-vm-sandbox-init.sh`: adds a new `oci_launch_supervisor` path that resolves disks by libkrun-assigned serial via `/sys/block/vd*/serial`, mounts RO base + ext4 state, creates the overlay, bind-mounts the workspace over `/sandbox`, stages TLS CA and the supervisor binary into the upper layer, bind-mounts `/proc`, `/sys`, `/dev`, `pivot_root`s, translates OCI env → `OPENSHELL_CONTAINER_ENV_<i>`, sets `OPENSHELL_CONTAINER_MODE=1`, and execs `openshell-sandbox --workdir <wd> -- <argv>`.

Supervisor clean-env mode (`crates/openshell-sandbox/src/container_env.rs`)
- Gated on `OPENSHELL_CONTAINER_MODE=1`. When active, the child process starts from `env_clear()` and receives only a documented allowlist (HOME/PATH/TERM defaults, `OPENSHELL_CONTAINER_ENV_<i>`, and `OPENSHELL_SANDBOX=1` applied last so images cannot override the marker). Provider/proxy/TLS env continue to layer in via the existing spawn path.

Gateway wiring (`crates/openshell-server/src/compute/vm.rs`, `cli.rs`)
- `build_driver_argv` helper.
- Passes `--default-image <sandbox_image>` on every VM-driver spawn so `GetCapabilities.default_image` cannot silently diverge from gateway config.
- `VmComputeConfig::mksquashfs_bin` + `--vm-mksquashfs-bin` / `OPENSHELL_VM_MKSQUASHFS` flag plumbs the squashfs builder path to the driver.

Driver behavior
- `validate_vm_sandbox` rejects malformed `template.image` refs and unsupported template fields.
- `resolve_oci_launch` returns `FailedPrecondition` when the host arch isn't `linux/{amd64,arm64}` or `mksquashfs_bin` is unset.
- `build_guest_environment` skips the legacy `OPENSHELL_SANDBOX_COMMAND=tail -f /dev/null` fallback for OCI sandboxes so argv boundaries can't be corrupted by a fall-through code path.

Docs
- `architecture/vm-driver.md` covers the OCI execution model, module responsibilities, storage layout, driver configuration, and v1 scope.

Testing
- `cargo test -p openshell-driver-vm --lib`: 80/80 pass (flatten, compat, fs_image, cache, metadata, pipeline, state_disk, driver, including 3 new `resolve_oci_launch` tests and new OCI-mode guest env tests).
- `cargo test -p openshell-server --lib compute::vm`: 8/8 pass (6 new argv-wiring tests + 2 existing TLS tests).
- `cargo fmt --check` clean on touched crates.
- `cargo clippy` on touched crates: no errors (pre-existing warnings in unrelated `openshell-ocsf` crate are out of scope).
- `mise run license:check`: all 398 files have SPDX headers.
- `bash -n` on the updated guest init script.
- `oci_pipeline_integration::full_pipeline_without_network_produces_cached_image` (gated on `mksquashfs` being in `$PATH`, run with `--ignored`) verifies flatten → compat → squashfs → cache install → round-trip.

E2E against a live cluster with a public image (`alpine`, `busybox`) was not run as part of this PR; the plan's end-to-end acceptance is scheduled for a follow-up once gateway+driver integration lands on main.

Checklist
architecture/vm-driver.md)
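For reviewers unfamiliar with the layer-flattening rules listed for `flatten.rs` under Changes, here is a small sketch of the two checks involved: rejecting absolute or parent-traversal entry paths, and recognizing OCI whiteout markers. The function and type names are hypothetical and the real module may structure this differently; only the `.wh.` and `.wh..wh..opq` naming comes from the OCI image layer format.

```rust
use std::path::{Component, Path, PathBuf};

/// Normalize a tar entry path relative to the rootfs being assembled.
/// Returns `None` for absolute paths and for anything containing `..`.
pub fn sanitize_entry_path(raw: &str) -> Option<PathBuf> {
    let mut clean = PathBuf::new();
    for component in Path::new(raw).components() {
        match component {
            Component::Normal(part) => clean.push(part),
            Component::CurDir => {} // a leading "./" is common in layer tars
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(clean)
}

/// What a whiteout entry asks the flattener to do.
#[derive(Debug, PartialEq, Eq)]
pub enum Whiteout {
    /// `.wh..wh..opq`: hide everything lower layers put in this directory.
    Opaque(PathBuf),
    /// `.wh.<name>`: remove `<name>` contributed by lower layers.
    Remove(PathBuf),
}

/// Classify an already-sanitized entry path as a whiteout, if it is one.
pub fn classify_whiteout(path: &Path) -> Option<Whiteout> {
    let name = path.file_name()?.to_str()?;
    let parent = path.parent().unwrap_or_else(|| Path::new(""));
    if name == ".wh..wh..opq" {
        Some(Whiteout::Opaque(parent.to_path_buf()))
    } else {
        name.strip_prefix(".wh.")
            .filter(|target| !target.is_empty())
            .map(|target| Whiteout::Remove(parent.join(target)))
    }
}
```

Checking the opaque marker before the generic `.wh.` prefix matters, since `.wh..wh..opq` also starts with `.wh.` and would otherwise be read as a request to delete a file named `.wh..opq`.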