feat: FIPS 140-3 compliance path for TLS and SSH crypto stack #900

@rdwj

Problem Statement

OpenShell's TLS and SSH subsystems use non-FIPS-validated cryptographic libraries and default to non-FIPS-approved algorithms. On FIPS-enabled clusters (common in government, defense, and regulated-industry Kubernetes deployments), this makes OpenShell non-compliant from an audit perspective -- even though the kernel does not block the operations at runtime.

Specifically:

  • TLS (all connections): Uses ring 0.17 via rustls. ring has no FIPS validation path. The defaults include ChaCha20-Poly1305 cipher suites and X25519 key exchange, neither of which is FIPS-approved.
  • SSH (sandbox transport): Uses russh 0.57 with a mix of aws-lc-rs, ed25519-dalek, and curve25519-dalek. The sandbox SSH server hardcodes Ed25519 host keys (ssh.rs:56). Default negotiation prefers ChaCha20-Poly1305 and Curve25519 key exchange.
  • PKI (certificate generation): Uses rcgen 0.13 backed by ring. The default algorithm (ECDSA P-256) is FIPS-approved, but the implementation module is not validated.

FIPS-enabled RHEL 9 / OpenShift 4.x clusters enforce FIPS 140-3 via system-wide crypto policies. Processes using non-validated crypto modules fail compliance audits regardless of the algorithms selected. There are no existing FIPS-related issues in the tracker.

This is complementary to #899 (Platform mode / restricted SCC support) -- FIPS clusters are a subset of the managed Kubernetes deployments that issue addresses.

Proposed Design

Add a workspace-level fips Cargo feature flag that switches the crypto backend from ring to aws-lc-rs in FIPS mode (CMVP certificate #4631), restricts algorithm negotiation to FIPS-approved algorithms only, and documents the SSH layer's validation gap.

Phase 1: Feature-flagged FIPS for TLS + PKI

Crypto provider switch -- The three binary entry points that install the rustls CryptoProvider would switch based on the feature flag:

// Current (all three binaries):
rustls::crypto::ring::default_provider().install_default()

// With --features fips:
rustls::crypto::aws_lc_rs::default_provider().install_default()
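One way to express the switch is a small helper gated on `#[cfg(feature = "fips")]` that each binary calls at startup. The sketch below models only the selection logic with the standard library. `CryptoBackend`, `selected_backend`, and `backend_label` are hypothetical names, and a real helper would presumably call the matching `default_provider().install_default()` rather than return an enum.

```rust
/// Hypothetical marker for the rustls backend a binary would install.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CryptoBackend {
    /// Default build: rustls::crypto::ring
    Ring,
    /// `--features fips` build: rustls::crypto::aws_lc_rs in FIPS mode
    AwsLcRsFips,
}

/// Compiled only when the `fips` Cargo feature is enabled.
#[cfg(feature = "fips")]
pub fn selected_backend() -> CryptoBackend {
    CryptoBackend::AwsLcRsFips
}

/// Compiled in every other build, so current behavior is preserved.
#[cfg(not(feature = "fips"))]
pub fn selected_backend() -> CryptoBackend {
    CryptoBackend::Ring
}

/// Human-readable label, for example for a startup log line.
pub fn backend_label(backend: CryptoBackend) -> &'static str {
    match backend {
        CryptoBackend::Ring => "ring",
        CryptoBackend::AwsLcRsFips => "aws-lc-rs (fips)",
    }
}
```

Keeping the choice in one helper would let the three entry points stay identical, and logging the label at startup would give auditors a simple way to confirm which backend an image was built with.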

Workspace dependency changes:

# Current:
rustls = { version = "0.23", default-features = false, features = ["std", "logging", "tls12", "ring"] }
tokio-rustls = { version = "0.26", default-features = false, features = ["logging", "tls12", "ring"] }
rcgen = { version = "0.13", features = ["crypto"] }

# With fips feature:
rustls = { version = "0.23", default-features = false, features = ["std", "logging", "tls12", "aws_lc_rs", "fips"] }
tokio-rustls = { version = "0.26", default-features = false, features = ["logging", "tls12", "aws-lc-rs"] }
rcgen = { version = "0.13", default-features = false, features = ["aws_lc_rs", "pem"] }

TLS cipher suite restriction -- In FIPS mode, configure the provider to exclude ChaCha20-Poly1305 cipher suites and X25519 key exchange, allowing only:

  • TLS 1.3: TLS13_AES_256_GCM_SHA384, TLS13_AES_128_GCM_SHA256
  • TLS 1.2: TLS_ECDHE_ECDSA_WITH_AES_*_GCM_SHA*, TLS_ECDHE_RSA_WITH_AES_*_GCM_SHA*
  • Key exchange: ECDH-P256, ECDH-P384 (no X25519)
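The restriction amounts to filtering the provider's default lists against an allowlist while keeping the provider's preference order. The sketch below shows that filter over plain names using only the standard library. The constant and function names are hypothetical, the suite names are the usual expansions of the wildcards above, and the group names follow rustls-style naming as an assumption. In rustls itself, the filtered lists would presumably be assigned to the `cipher_suites` and `kx_groups` fields of the `CryptoProvider` before it is installed.

```rust
/// Cipher suites the proposal allows in FIPS mode (AES-GCM only).
pub const FIPS_CIPHER_SUITES: &[&str] = &[
    "TLS13_AES_256_GCM_SHA384",
    "TLS13_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
];

/// Key exchange groups the proposal allows (NIST curves only, no X25519).
pub const FIPS_KX_GROUPS: &[&str] = &["SECP256R1", "SECP384R1"];

/// Keeps only the entries of `offered` that appear in `allowed`,
/// preserving the order of `offered`.
pub fn restrict<'a>(offered: &[&'a str], allowed: &[&str]) -> Vec<&'a str> {
    offered
        .iter()
        .copied()
        .filter(|name| allowed.iter().any(|ok| ok == name))
        .collect()
}
```

An allowlist is used rather than a denylist of ChaCha20 and X25519 so that any algorithm a future rustls release adds to its defaults stays excluded until someone reviews it.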

SSH algorithm restriction -- Change the sandbox SSH server's host key from Ed25519 to ECDSA-P256 and configure russh::server::Config::preferred / russh::client::Config::preferred to exclude non-FIPS algorithms:

  • Host keys: ecdsa-sha2-nistp256, ecdsa-sha2-nistp384, rsa-sha2-256, rsa-sha2-512 (no ssh-ed25519)
  • Key exchange: ecdh-sha2-nistp256, ecdh-sha2-nistp384, diffie-hellman-group14-sha256, diffie-hellman-group16-sha512 (no Curve25519, no post-quantum mlkem)
  • Ciphers: aes256-gcm@openssh.com, aes128-gcm@openssh.com, aes256-ctr, aes128-ctr (no ChaCha20-Poly1305)
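To see what these lists do during a handshake, it helps to model SSH negotiation directly. For ciphers, RFC 4253 (section 7.1) selects the first algorithm on the client's list that the server also supports. The sketch below applies that rule to plain name lists with the standard library only. `FIPS_KEX`, `FIPS_CIPHERS`, and `negotiate` are hypothetical names, and applying the same first-match rule to key exchange is a simplification of the RFC's KEX rules.

```rust
/// FIPS-mode key exchange preferences from the proposal.
pub const FIPS_KEX: &[&str] = &[
    "ecdh-sha2-nistp256",
    "ecdh-sha2-nistp384",
    "diffie-hellman-group14-sha256",
    "diffie-hellman-group16-sha512",
];

/// FIPS-mode cipher preferences from the proposal.
pub const FIPS_CIPHERS: &[&str] = &[
    "aes256-gcm@openssh.com",
    "aes128-gcm@openssh.com",
    "aes256-ctr",
    "aes128-ctr",
];

/// Returns the first algorithm on the client's list that the server also
/// supports, or `None` when the two lists do not overlap.
pub fn negotiate<'a>(client: &[&'a str], server: &[&str]) -> Option<&'a str> {
    client
        .iter()
        .copied()
        .find(|candidate| server.iter().any(|s| s == candidate))
}
```

A peer that offers only ChaCha20-Poly1305 or Curve25519 gets no match and the handshake fails, which is the intended outcome. It also means the gateway-side russh client needs the same restricted lists, or at least an overlapping set.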

HMAC switch -- The NSSH1 handshake (ssh.rs:322, ssh_tunnel.rs:320) uses RustCrypto hmac + sha2. In FIPS mode, replace with aws-lc-rs HMAC-SHA256.

Transitive dependency updates -- reqwest, sqlx, tokio-tungstenite, and hyper-rustls all pull in rustls and/or ring. Each needs feature flags updated for the FIPS build:

  • reqwest: switch from rustls-tls to using the globally-installed CryptoProvider
  • sqlx runtime-tokio-rustls: should respect the global provider
  • hyper-rustls: switch from ring to aws-lc-rs feature
  • tokio-tungstenite: switch from rustls-tls-native-roots to aws-lc-rs backend

Phase 2 (deferred): SSH transport FIPS validation

Phase 1 restricts SSH to FIPS-approved algorithms, but the underlying implementations (ed25519-dalek, p256, aes from RustCrypto) remain non-validated modules. This is a known gap. The SSH transport operates only within the cluster's mTLS boundary (gateway-to-sandbox), so it provides defense-in-depth rather than serving as the primary trust boundary.

If strict auditors require validated modules for the SSH layer, Phase 2 options include:

  • Upstream russh support for aws-lc-rs as its crypto backend
  • Replacing the embedded russh server with an OpenSSH subprocess (significant architecture change given the deep integration at ssh.rs -- 1700+ lines of process spawning, PTY management, channel handling, SFTP subsystem)

Scope boundaries:

  • The fips feature is off by default -- current behavior is preserved
  • Phase 1 achieves FIPS-validated crypto for all TLS operations (the external-facing attack surface) and FIPS-approved algorithms for SSH
  • Phase 1 explicitly documents the SSH validation gap
  • Phase 2 is deferred to actual audit requirements

Alternatives Considered

  1. System OpenSSL for everything -- Replace rustls with the openssl crate and russh with libssh2 or OpenSSH subprocess. True FIPS validation for all operations via RHEL 9's OpenSSL 3.x (CMVP #4282). Rejected for Phase 1: massive rewrite, loses rustls memory safety guarantees, adds system library dependency, and significantly complicates cross-platform builds.

  2. Partial compliance with documented exceptions -- FIPS for TLS only, document SSH as internal-only transport. This is essentially what Phase 1 achieves, but framed as the complete solution rather than a stepping stone. May not satisfy strict auditors.

  3. No FIPS support -- Require FIPS-mode clusters to use custom crypto policy exceptions for OpenShell pods. Not viable for enterprise adoption in regulated environments.

  4. gVisor RuntimeClass -- gVisor provides its own syscall interception and could theoretically handle crypto at the runtime level. Not applicable -- gVisor intercepts syscalls, not userspace crypto library calls.

Agent Investigation

Investigation performed with a coding agent pointed at the repo. Skills loaded: create-spike, generate-sandbox-policy. The agent traced every crypto dependency, configuration point, and algorithm choice across the 15-crate workspace.

Crypto dependency map

The TLS and SSH subsystems use different crypto backends -- a critical finding for the migration path:

TLS path (all connections):
  rustls 0.23.37 -> ring 0.17.14
  rcgen 0.13.2 -> ring 0.17.14
  rustls-webpki 0.103.10 -> ring 0.17.14
  quinn-proto 0.11.14 -> ring 0.17.14

SSH path (sandbox transport):
  russh 0.57.1 -> aws-lc-rs 1.16.2
  russh 0.57.1 -> ed25519-dalek 2.2.0 -> curve25519-dalek 4.1.3
  russh 0.57.1 -> aes 0.8.4, cbc 0.1.2, ctr 0.9.2 (RustCrypto symmetric)
  russh 0.57.1 -> p256 0.13.2, p384 0.13.1, p521 0.13.3
  russh 0.57.1 -> libcrux-ml-kem 0.0.4 (post-quantum)
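A map in this shape can be checked mechanically, which may help when re-verifying the FIPS build. The sketch below is a hypothetical helper, not part of the repo. It parses lines of the form `parent version -> child version` (including chained hops and comma-separated children) and reports every crate that reaches a given target, using only the standard library.

```rust
use std::collections::BTreeSet;

/// Parses `parent 1.2.3 -> child 4.5.6` lines into (parent, child) name pairs.
/// Versions and trailing notes are ignored; lines without `->` yield no edges.
pub fn parse_edges(map: &str) -> Vec<(String, String)> {
    let mut edges = Vec::new();
    for line in map.lines() {
        // Each hop is a comma-separated list of `name version` entries.
        let hops: Vec<Vec<&str>> = line
            .split("->")
            .map(|hop| {
                hop.split(',')
                    .filter_map(|entry| entry.split_whitespace().next())
                    .collect()
            })
            .collect();
        // Connect every crate in one hop to every crate in the next hop.
        for pair in hops.windows(2) {
            for parent in &pair[0] {
                for child in &pair[1] {
                    edges.push(((*parent).to_string(), (*child).to_string()));
                }
            }
        }
    }
    edges
}

/// Returns every crate that reaches `target` through one or more edges.
pub fn dependents_of(edges: &[(String, String)], target: &str) -> BTreeSet<String> {
    let mut found = BTreeSet::new();
    let mut frontier = vec![target.to_string()];
    while let Some(current) = frontier.pop() {
        for (parent, child) in edges {
            // `insert` returns false for crates already seen, which ends cycles.
            if *child == current && found.insert(parent.clone()) {
                frontier.push(parent.clone());
            }
        }
    }
    found
}
```

Run against the map above, `dependents_of(&edges, "ring")` would list the TLS-path crates that still need the backend switch. An empty result for a FIPS build would suggest ring is no longer reachable from the listed crates.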

Code references

| Location | Description |
|---|---|
| Cargo.toml:36-37 | Workspace rustls/tokio-rustls pinned to ring feature |
| Cargo.toml:39 | rcgen 0.13 with crypto feature (ring backend) |
| Cargo.toml:70 | reqwest with rustls-tls feature |
| Cargo.toml:73 | tokio-tungstenite with rustls-tls-native-roots |
| Cargo.toml:93 | sqlx with runtime-tokio-rustls |
| crates/openshell-server/src/cli.rs:184 | ring::default_provider().install_default() |
| crates/openshell-cli/src/main.rs:1664 | ring::default_provider().install_default() |
| crates/openshell-sandbox/src/main.rs:122 | ring::default_provider().install_default() |
| crates/openshell-server/src/tls.rs:63 | Gateway mTLS ServerConfig (no cipher suite customization) |
| crates/openshell-cli/src/tls.rs:209 | CLI mTLS ClientConfig (no cipher suite customization) |
| crates/openshell-sandbox/src/l7/tls.rs:156 | MITM proxy ServerConfig |
| crates/openshell-sandbox/src/l7/tls.rs:222 | MITM proxy upstream ClientConfig |
| crates/openshell-sandbox/src/l7/tls.rs:44 | MITM ephemeral CA generation (rcgen/ring) |
| crates/openshell-sandbox/src/l7/tls.rs:116 | MITM leaf cert generation (rcgen/ring) |
| crates/openshell-sandbox/src/ssh.rs:56 | PrivateKey::random(&mut rng, Algorithm::Ed25519) -- hardcoded Ed25519 host key |
| crates/openshell-sandbox/src/ssh.rs:58 | russh::server::Config::default() -- no algorithm restrictions |
| crates/openshell-sandbox/src/ssh.rs:322 | NSSH1 HMAC-SHA256 via RustCrypto hmac + sha2 |
| crates/openshell-server/src/ssh_tunnel.rs:320 | Server-side NSSH1 HMAC-SHA256 |
| crates/openshell-server/src/grpc/sandbox.rs:828 | russh::client::Config::default() -- no algorithm restrictions |
| crates/openshell-bootstrap/src/pki.rs:40,60,78 | PKI key generation via rcgen::KeyPair::generate() (ECDSA P-256 default via ring) |
| deploy/docker/Dockerfile.images | Base image: nvcr.io/nvidia/base/ubuntu:noble-20251013 (not UBI, no FIPS OpenSSL) |

Non-FIPS algorithm inventory

| Operation | Current Algorithm | FIPS? | FIPS Alternative | Controlling Code |
|---|---|---|---|---|
| TLS 1.3 cipher | ChaCha20-Poly1305 (in default list) | No | AES-256-GCM, AES-128-GCM | CryptoProvider cipher suite list |
| TLS key exchange | X25519 (in default list) | No | ECDH-P256, ECDH-P384 | CryptoProvider kx_group list |
| TLS crypto module | ring 0.17 | No | aws-lc-rs (CMVP #4631) | Cargo.toml feature flags |
| SSH host key | Ed25519 (hardcoded) | No | ECDSA-P256, ECDSA-P384 | ssh.rs:56 |
| SSH key exchange | curve25519-sha256 (default preferred) | No | ecdh-sha2-nistp256/384 | russh::Config::preferred |
| SSH cipher | chacha20-poly1305 (default preferred) | No | aes256-gcm, aes128-gcm | russh::Config::preferred |
| SSH KEX (PQ) | mlkem768x25519 | No | Remove from preference list | russh::Config::preferred |
| SSH crypto module | ed25519-dalek, RustCrypto AES, p256 | No | aws-lc-fips-sys (requires upstream russh changes) | russh internals |
| PKI key generation | ECDSA P-256 via ring | Algorithm OK, module not validated | ECDSA P-256 via aws-lc-rs | rcgen backend feature |
| NSSH1 HMAC | HMAC-SHA256 via RustCrypto hmac+sha2 | Algorithm OK, module not validated | HMAC-SHA256 via aws-lc-rs | ssh.rs:322, ssh_tunnel.rs:320 |
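The "FIPS?" column combines two independent checks: whether the algorithm is approved, and whether the module implementing it is validated. A minimal sketch of that classification follows. The type and function names are hypothetical and used only for illustration.

```rust
/// Outcome of checking one crypto operation against both FIPS criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FipsStatus {
    /// Approved algorithm implemented by a validated module.
    Compliant,
    /// Approved algorithm, but the implementing module is not validated.
    ModuleNotValidated,
    /// The algorithm itself is not approved, whatever module implements it.
    AlgorithmNotApproved,
}

/// One row of the inventory, reduced to the two facts that decide its status.
pub struct CryptoUse {
    pub operation: &'static str,
    pub algorithm_approved: bool,
    pub module_validated: bool,
}

/// An unapproved algorithm fails outright; an approved one still needs a
/// validated module to count as compliant.
pub fn classify(usage: &CryptoUse) -> FipsStatus {
    match (usage.algorithm_approved, usage.module_validated) {
        (false, _) => FipsStatus::AlgorithmNotApproved,
        (true, false) => FipsStatus::ModuleNotValidated,
        (true, true) => FipsStatus::Compliant,
    }
}
```

Under this model, PKI key generation (ECDSA P-256 via ring) classifies as `ModuleNotValidated`, while the Ed25519 host key classifies as `AlgorithmNotApproved`. This is why changing algorithm lists alone does not close the gap for the rows that depend on a crypto module.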

Existing FIPS awareness: Zero. The only mention of "FIPS" in the codebase is in an OCSF schema JSON referencing NIST FIPS 199 (information classification standard, unrelated to crypto).

Feature flag patterns: The codebase already uses workspace-level feature propagation (bundled-z3, dev-settings) and platform-conditional compilation via #[cfg(target_os = "linux")]. A #[cfg(feature = "fips")] pattern would be consistent.

Risks & open questions:

  1. aws-lc-rs FIPS build requires CMake + Go, adding build toolchain complexity
  2. russh's internal crypto (ed25519-dalek, p256, RustCrypto AES) is not FIPS-validated regardless of algorithm selection -- Phase 1 documents this gap
  3. Does russh have upstream plans for an aws-lc-rs or FIPS backend?
  4. Cross-compilation from macOS to linux/amd64 for FIPS container builds may require remote builds
  5. SSH host key change from Ed25519 to ECDSA-P256 changes fingerprint -- sandboxes are ephemeral, so should not cause persistent trust issues
  6. Verify aws-lc-rs 1.16.2 references a validated AWS-LC build matching CMVP #4631
  7. Transitive deps (reqwest, sqlx, tokio-tungstenite, hyper-rustls) each need verification with aws-lc-rs provider
  8. Single fips feature flag vs separate fips-tls/fips-ssh for phased rollout?

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
