1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

21 changes: 19 additions & 2 deletions architecture/gateway.md
@@ -26,7 +26,7 @@ graph TD
SUP_REG["SupervisorSessionRegistry"]
STORE["Store<br/>(SQLite / Postgres)"]
COMPUTE["ComputeRuntime"]
-DRIVER["ComputeDriver<br/>(kubernetes / vm)"]
+DRIVER["ComputeDriver<br/>(kubernetes / docker / vm)"]
WATCH_BUS["SandboxWatchBus"]
LOG_BUS["TracingLogBus"]
PLAT_BUS["PlatformEventBus"]
@@ -75,6 +75,7 @@ graph TD
| TLS | `crates/openshell-server/src/tls.rs` | `TlsAcceptor` wrapping rustls with ALPN |
| Persistence | `crates/openshell-server/src/persistence/mod.rs` | `Store` enum (SQLite/Postgres), generic object CRUD, protobuf codec |
| Compute runtime | `crates/openshell-server/src/compute/mod.rs` | `ComputeRuntime`, gateway-owned sandbox lifecycle orchestration over a compute backend |
+| Compute driver: Docker | `crates/openshell-server/src/compute/docker.rs` | In-process Docker create/delete/watch, supervisor side-load, local daemon integration |
| Compute driver: Kubernetes | `crates/openshell-driver-kubernetes/src/driver.rs` | Kubernetes CRD create/delete/watch, pod template translation |
| Compute driver: VM | `crates/openshell-driver-vm/src/driver.rs` | Per-sandbox microVM create/delete/watch, supervisor-only guest boot |
| Sandbox index | `crates/openshell-server/src/sandbox_index.rs` | `SandboxIndex` -- in-memory name/pod-to-id correlation |
@@ -103,6 +104,7 @@ The gateway boots in `cli::run_cli` (`crates/openshell-server/src/cli.rs`) and p
1. Connect to the persistence store (`Store::connect`), which auto-detects SQLite vs Postgres from the URL prefix and runs migrations.
2. Create `ComputeRuntime` with a `ComputeDriver` implementation selected by `OPENSHELL_DRIVERS`:
- `kubernetes` wraps `KubernetesComputeDriver` in `ComputeDriverService`, so the gateway uses the `openshell.compute.v1.ComputeDriver` RPC surface even without transport.
+- `docker` constructs `DockerComputeDriver` in-process, talks directly to the local Docker daemon through Bollard, and keeps Docker-only configuration (supervisor/TLS bind mounts) local to `openshell-server`.
- `vm` spawns the sibling `openshell-driver-vm` binary as a local compute-driver process, connects to it over a Unix domain socket, and keeps the libkrun/rootfs runtime out of the gateway binary.
3. Build `ServerState` (shared via `Arc<ServerState>` across all handlers), including a fresh `SupervisorSessionRegistry`.
4. **Spawn background tasks**:
@@ -132,7 +134,11 @@ All configuration is via CLI flags with environment variable fallbacks. The `--d
| `--sandbox-namespace` | `OPENSHELL_SANDBOX_NAMESPACE` | `default` | Kubernetes namespace for sandbox CRDs |
| `--sandbox-image` | `OPENSHELL_SANDBOX_IMAGE` | None | Default container image for sandbox pods |
| `--grpc-endpoint` | `OPENSHELL_GRPC_ENDPOINT` | None | gRPC endpoint reachable from within the cluster (for supervisor callbacks) |
-| `--drivers` | `OPENSHELL_DRIVERS` | `kubernetes` | Compute backend to use. Current options are `kubernetes` and `vm`. |
+| `--drivers` | `OPENSHELL_DRIVERS` | `kubernetes` | Compute backend to use. Current options are `kubernetes`, `docker`, and `vm`. |
+| `--docker-supervisor-bin` | `OPENSHELL_DOCKER_SUPERVISOR_BIN` | Linux: sibling `openshell-sandbox`; macOS: auto-discovered local Linux build | Linux `openshell-sandbox` binary bind-mounted into Docker sandboxes as PID 1 |
+| `--docker-tls-ca` | `OPENSHELL_DOCKER_TLS_CA` | None | CA cert bind-mounted into Docker sandboxes at `/etc/openshell/tls/client/ca.crt` for gateway mTLS |
+| `--docker-tls-cert` | `OPENSHELL_DOCKER_TLS_CERT` | None | Client cert bind-mounted into Docker sandboxes at `/etc/openshell/tls/client/tls.crt` for gateway mTLS |
+| `--docker-tls-key` | `OPENSHELL_DOCKER_TLS_KEY` | None | Client private key bind-mounted into Docker sandboxes at `/etc/openshell/tls/client/tls.key` for gateway mTLS |
| `--vm-driver-state-dir` | `OPENSHELL_VM_DRIVER_STATE_DIR` | `target/openshell-vm-driver` | Host directory for VM sandbox rootfs, console logs, and runtime state |
| `--vm-compute-driver-bin` | `OPENSHELL_VM_COMPUTE_DRIVER_BIN` | sibling `openshell-driver-vm` binary | Local VM compute-driver process spawned by the gateway |
| `--vm-krun-log-level` | `OPENSHELL_VM_KRUN_LOG_LEVEL` | `1` | libkrun log level for VM helper processes |
@@ -599,6 +605,17 @@ The Helm chart template is at `deploy/helm/openshell/templates/statefulset.yaml`

The gateway reaches the sandbox exclusively through the supervisor-initiated `ConnectSupervisor` session, so the driver never returns sandbox network endpoints.

+### Docker Driver
+
+`DockerComputeDriver` (`crates/openshell-server/src/compute/docker.rs`) is built directly into the gateway. It connects to the local Docker daemon with Bollard and provisions one long-lived container per sandbox.
+
+- **Create**: Pulls the requested image according to `sandbox_image_pull_policy`, creates a labeled container, bind-mounts a Linux `openshell-sandbox` binary read-only at `/opt/openshell/bin/openshell-sandbox`, and starts that supervisor as PID 1. No sandbox ports are published.
+- **Persistence**: The Docker driver does not create a separate workspace volume. `/sandbox` lives on the container writable layer, so data persists across gateway restarts as long as the same container remains.
+- **Gateway callback**: When `OPENSHELL_GRPC_ENDPOINT` points at `localhost` or another loopback address, the driver rewrites it to `host.openshell.internal` inside the container and injects `host-gateway` aliases so the supervisor can still open its outbound `ConnectSupervisor` stream.
+- **TLS**: For `https://` gateway endpoints, the driver requires `--docker-tls-ca`, `--docker-tls-cert`, and `--docker-tls-key`. These files are bind-mounted read-only into `/etc/openshell/tls/client`, and the driver sets `OPENSHELL_TLS_CA`, `OPENSHELL_TLS_CERT`, and `OPENSHELL_TLS_KEY` to those paths.
+- **Limits**: V1 supports only `cpu_limit` and `memory_limit`, mapped to Docker `NanoCpus` and `Memory`. GPU requests, resource requests, `agent_socket_path`, and non-empty `platform_config` are rejected as failed preconditions.
+- **Watch stream**: The driver polls Docker for OpenShell-managed containers, emits snapshot diffs and deletions, and rebuilds its state from labels after gateway restart. Containers running under Docker restart policy `unless-stopped` come back after daemon restart without any inbound port setup.
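
The **Gateway callback** rule above can be illustrated with a small standard-library sketch. This is not the code in `docker.rs`, which is not shown in this diff; the helper names `is_loopback_host`, `rewrite_loopback_endpoint`, and `extra_host_entry` are hypothetical, and the sketch assumes the endpoint has the plain shape `scheme://host[:port][/path]` with no userinfo component.

```rust
use std::net::IpAddr;

/// Alias named in the section above; how the real driver stores it is an assumption.
const HOST_ALIAS: &str = "host.openshell.internal";

/// Returns true when `host` is `localhost` or a loopback IP literal.
fn is_loopback_host(host: &str) -> bool {
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    // IPv6 literals appear in URLs as `[::1]`; drop the brackets before parsing.
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    bare.parse::<IpAddr>().map(|ip| ip.is_loopback()).unwrap_or(false)
}

/// Hypothetical helper: swap a loopback host for the container-visible alias,
/// keeping scheme, port, and path. Non-loopback endpoints pass through unchanged.
fn rewrite_loopback_endpoint(endpoint: &str) -> String {
    let Some((scheme, rest)) = endpoint.split_once("://") else {
        return endpoint.to_string();
    };
    // Separate the authority (`host[:port]`) from any path.
    let (authority, path) = match rest.find('/') {
        Some(i) => rest.split_at(i),
        None => (rest, ""),
    };
    // Separate host from port; a bracketed IPv6 literal contains colons itself.
    let (host, port) = if authority.starts_with('[') {
        match authority.find(']') {
            Some(i) => authority.split_at(i + 1),
            None => (authority, ""),
        }
    } else {
        match authority.rfind(':') {
            Some(i) => authority.split_at(i),
            None => (authority, ""),
        }
    };
    if is_loopback_host(host) {
        format!("{scheme}://{HOST_ALIAS}{port}{path}")
    } else {
        endpoint.to_string()
    }
}

/// Hypothetical helper: the `name:host-gateway` extra-hosts entry that lets
/// Docker resolve the alias to the host from inside the container.
fn extra_host_entry() -> String {
    format!("{HOST_ALIAS}:host-gateway")
}
```

Under these assumptions, `http://localhost:8080` becomes `http://host.openshell.internal:8080`, while an endpoint that already names a routable host is left alone.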

### VM Driver

`VmDriver` (`crates/openshell-driver-vm/src/driver.rs`) is served by the standalone `openshell-driver-vm` process. The gateway spawns that binary on demand and talks to it over the internal `openshell.compute.v1.ComputeDriver` gRPC contract via a Unix domain socket.
13 changes: 10 additions & 3 deletions crates/openshell-core/src/config.rs
@@ -15,6 +15,7 @@ use std::str::FromStr;
pub enum ComputeDriverKind {
Kubernetes,
Vm,
+Docker,
Podman,
}

@@ -24,6 +25,7 @@ impl ComputeDriverKind {
match self {
Self::Kubernetes => "kubernetes",
Self::Vm => "vm",
+Self::Docker => "docker",
Self::Podman => "podman",
}
}
@@ -42,9 +44,10 @@ impl FromStr for ComputeDriverKind {
match value.trim().to_ascii_lowercase().as_str() {
"kubernetes" => Ok(Self::Kubernetes),
"vm" => Ok(Self::Vm),
+"docker" => Ok(Self::Docker),
"podman" => Ok(Self::Podman),
other => Err(format!(
-"unsupported compute driver '{other}'. expected one of: kubernetes, vm, podman"
+"unsupported compute driver '{other}'. expected one of: kubernetes, vm, docker, podman"
)),
}
}
@@ -370,12 +373,16 @@ mod tests {
"podman".parse::<ComputeDriverKind>().unwrap(),
ComputeDriverKind::Podman
);
+assert_eq!(
+"docker".parse::<ComputeDriverKind>().unwrap(),
+ComputeDriverKind::Docker
+);
}

#[test]
fn compute_driver_kind_rejects_unknown_values() {
-let err = "docker".parse::<ComputeDriverKind>().unwrap_err();
-assert!(err.contains("unsupported compute driver 'docker'"));
+let err = "firecracker".parse::<ComputeDriverKind>().unwrap_err();
+assert!(err.contains("unsupported compute driver 'firecracker'"));
}

#[test]
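
For readers who want to see the post-change behavior of `config.rs` in one place, the hunks above can be assembled into a standalone sketch. The variants, string values, and error message come from the diff. The derive list, the method name `as_str`, and the `String` error type are assumptions, because the hunks do not show them.

```rust
use std::str::FromStr;

// Derives are assumed; the tests in the diff only imply `Debug` and `PartialEq`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComputeDriverKind {
    Kubernetes,
    Vm,
    Docker,
    Podman,
}

impl ComputeDriverKind {
    /// Method name is an assumption; the hunk only shows the `match self` body.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Kubernetes => "kubernetes",
            Self::Vm => "vm",
            Self::Docker => "docker",
            Self::Podman => "podman",
        }
    }
}

impl FromStr for ComputeDriverKind {
    // `String` is inferred from `Err(format!(...))` in the diff.
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        // Trimming and lowercasing make parsing tolerant of ` Docker ` or `DOCKER`.
        match value.trim().to_ascii_lowercase().as_str() {
            "kubernetes" => Ok(Self::Kubernetes),
            "vm" => Ok(Self::Vm),
            "docker" => Ok(Self::Docker),
            "podman" => Ok(Self::Podman),
            other => Err(format!(
                "unsupported compute driver '{other}'. expected one of: kubernetes, vm, docker, podman"
            )),
        }
    }
}
```

Because `docker` is now a valid value, the rejection test needs a different unknown name, which is why the diff switches it to `firecracker`.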
33 changes: 20 additions & 13 deletions crates/openshell-sandbox/src/procfs.rs
@@ -530,9 +530,12 @@ mod tests {
}

/// An unlinked executable whose filename contains non-UTF-8 bytes must
-/// still strip exactly one kernel-added `" (deleted)"` suffix. We operate
-/// on raw bytes via `OsStrExt`, so invalid UTF-8 is not a reason to skip
-/// the strip and return a path that downstream `stat()` calls will reject.
+/// still resolve to its original path. Some kernels append a literal
+/// `" (deleted)"` suffix to `/proc/<pid>/exe` after unlink while others
+/// do not for this edge case, so the assertion has to tolerate both.
+///
+/// When the suffix is present, we still need to strip exactly one copy
+/// while operating on raw bytes via `OsStrExt`.
#[cfg(target_os = "linux")]
#[test]
fn binary_path_strips_suffix_for_non_utf8_filename() {
@@ -571,13 +574,10 @@ mod tests {
let pid: i32 = child.id().cast_signed();
std::fs::remove_file(&exe_path).unwrap();

-// Sanity: raw readlink ends with " (deleted)" and is not valid UTF-8.
+// Sanity: the raw readlink remains non-UTF-8 after unlink.
let raw = std::fs::read_link(format!("/proc/{pid}/exe")).unwrap();
let raw_bytes = raw.as_os_str().as_bytes();
-assert!(
-raw_bytes.ends_with(b" (deleted)"),
-"kernel should append ' (deleted)' to unlinked exe readlink"
-);
+let kernel_appended_deleted_suffix = raw_bytes.ends_with(b" (deleted)");
assert!(
std::str::from_utf8(raw_bytes).is_err(),
"test precondition: raw readlink must contain non-UTF-8 bytes"
@@ -587,12 +587,19 @@ mod tests {
binary_path(pid).expect("binary_path should succeed for non-UTF-8 unlinked path");
assert_eq!(
resolved, exe_path,
-"binary_path must strip exactly one ' (deleted)' suffix for non-UTF-8 paths"
-);
-assert!(
-!resolved.as_os_str().as_bytes().ends_with(b" (deleted)"),
-"stripped path must not end with ' (deleted)'"
+"binary_path must resolve non-UTF-8 unlinked paths back to the original filename"
);
+if kernel_appended_deleted_suffix {
+assert!(
+!resolved.as_os_str().as_bytes().ends_with(b" (deleted)"),
+"stripped path must not end with ' (deleted)'"
+);
+} else {
+assert_eq!(
+raw, exe_path,
+"kernels that omit the deleted suffix should report the original unlinked path"
+);
+}

let _ = child.kill();
let _ = child.wait();
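
The doc comment in this test describes stripping exactly one `" (deleted)"` suffix while working on raw bytes. The body of `binary_path` is not part of this diff, so the following is only a minimal sketch of that technique on Unix targets; `strip_deleted_suffix` and `DELETED_SUFFIX` are hypothetical names, not identifiers from `procfs.rs`.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};

/// Literal suffix that some kernels append to `/proc/<pid>/exe` after unlink.
const DELETED_SUFFIX: &[u8] = b" (deleted)";

/// Hypothetical helper: remove at most one trailing `" (deleted)"` from a
/// readlink result. It works on bytes, so non-UTF-8 filenames are handled
/// the same way as valid UTF-8 ones.
fn strip_deleted_suffix(raw: &Path) -> PathBuf {
    let bytes = raw.as_os_str().as_bytes();
    match bytes.strip_suffix(DELETED_SUFFIX) {
        // Suffix present: keep everything before it, byte for byte.
        Some(stripped) => PathBuf::from(OsStr::from_bytes(stripped)),
        // Suffix absent: the kernel reported the original path unchanged.
        None => raw.to_path_buf(),
    }
}
```

Stripping only one copy matters for a file whose real name already ends in `" (deleted)"`: after unlink the link target would carry the suffix twice, and removing a single copy recovers the on-disk name. The `None` arm corresponds to the new `else` branch in the test, where the raw link target is expected to equal the original path.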
1 change: 1 addition & 0 deletions crates/openshell-server/Cargo.toml
@@ -66,6 +66,7 @@ sqlx = { workspace = true }
reqwest = { workspace = true }
uuid = { workspace = true }
url = { workspace = true }
+bollard = { version = "0.20" }
hmac = "0.12"
sha2 = "0.10"
hex = "0.4"
27 changes: 25 additions & 2 deletions crates/openshell-server/src/cli.rs
@@ -11,7 +11,7 @@ use std::path::PathBuf;
use tracing::info;
use tracing_subscriber::EnvFilter;

-use crate::compute::VmComputeConfig;
+use crate::compute::{DockerComputeConfig, VmComputeConfig};
use crate::{run_server, tracing_bus::TracingLogBus};

/// `OpenShell` gateway process - gRPC and HTTP server with protocol multiplexing.
@@ -157,6 +157,22 @@ struct Args {
#[arg(long, env = "OPENSHELL_VM_TLS_KEY")]
vm_tls_key: Option<PathBuf>,

+/// Linux `openshell-sandbox` binary bind-mounted into Docker sandboxes.
+#[arg(long, env = "OPENSHELL_DOCKER_SUPERVISOR_BIN")]
+docker_supervisor_bin: Option<PathBuf>,
+
+/// CA certificate bind-mounted into Docker sandboxes for gateway mTLS.
+#[arg(long, env = "OPENSHELL_DOCKER_TLS_CA")]
+docker_tls_ca: Option<PathBuf>,
+
+/// Client certificate bind-mounted into Docker sandboxes for gateway mTLS.
+#[arg(long, env = "OPENSHELL_DOCKER_TLS_CERT")]
+docker_tls_cert: Option<PathBuf>,
+
+/// Client private key bind-mounted into Docker sandboxes for gateway mTLS.
+#[arg(long, env = "OPENSHELL_DOCKER_TLS_KEY")]
+docker_tls_key: Option<PathBuf>,
+
/// Disable TLS entirely — listen on plaintext HTTP.
/// Use this when the gateway sits behind a reverse proxy or tunnel
/// (e.g. Cloudflare Tunnel) that terminates TLS at the edge.
@@ -266,6 +282,13 @@ async fn run_from_args(args: Args) -> Result<()> {
guest_tls_key: args.vm_tls_key,
};

+let docker_config = DockerComputeConfig {
+supervisor_bin: args.docker_supervisor_bin,
+guest_tls_ca: args.docker_tls_ca,
+guest_tls_cert: args.docker_tls_cert,
+guest_tls_key: args.docker_tls_key,
+};
+
if args.disable_tls {
info!("TLS disabled — listening on plaintext HTTP");
} else if args.disable_gateway_auth {
@@ -274,7 +297,7 @@ async fn run_from_args(args: Args) -> Result<()> {

info!(bind = %config.bind_address, "Starting OpenShell server");

-run_server(config, vm_config, tracing_log_bus)
+run_server(config, vm_config, docker_config, tracing_log_bus)
.await
.into_diagnostic()
}