L7 proxy placeholder-token rewriting doesn't cover WSS payloads — Discord gateway fails with 4004 #913

### Agent Diagnostic

[agent-diagnostic-output.txt](https://github.com/user-attachments/files/26948494/agent-diagnostic-output.txt)

### Description

### Environment
- OpenShell: 0.0.26
- NemoClaw: 2026.4.2 (stack consumer, using blueprint `policies/presets/discord.yaml`)
- Host: DGX Spark (ARM64), Kubernetes-deployed OpenShell gateway

### Summary
OpenShell's L7 proxy rewrites placeholder tokens (`openshell:resolve:env:*`) at egress, but only for traffic it TLS-terminates (REST). For `gateway.discord.gg` the NemoClaw blueprint policy sets `tls: skip` (per #544, pass-through is required to keep long-lived WSS sessions working), so the proxy never sees the plaintext. Result: the placeholder flows unchanged inside the WSS IDENTIFY payload, Discord closes the connection with close code 4004 (authentication failed), and the bot never connects.
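
To make the failure concrete, here is a minimal sketch of what ends up on the wire. The IDENTIFY shape (`op: 2`, `d.token`, `d.intents`, `d.properties`) follows Discord's gateway documentation; `build_identify` and `carries_placeholder` are hypothetical helper names, not OpenClaw or OpenShell code, and the `properties` values are made up for illustration.

```python
import json

# Placeholder prefix as quoted in this issue.
PLACEHOLDER_PREFIX = "openshell:resolve:env:"


def build_identify(token: str, intents: int = 0) -> str:
    """Serialize a Discord gateway IDENTIFY (op 2) text frame."""
    payload = {
        "op": 2,
        "d": {
            "token": token,
            "intents": intents,
            # Illustrative values only.
            "properties": {"os": "linux", "browser": "sketch", "device": "sketch"},
        },
    }
    return json.dumps(payload)


def carries_placeholder(frame: str) -> bool:
    """Return True if the frame is an IDENTIFY whose d.token is still a placeholder."""
    msg = json.loads(frame)
    data = msg.get("d") if isinstance(msg, dict) else None
    if not isinstance(data, dict) or msg.get("op") != 2:
        return False
    token = data.get("token")
    return isinstance(token, str) and token.startswith(PLACEHOLDER_PREFIX)


# With tls: skip nothing rewrites the frame, so Discord receives this as-is.
frame = build_identify("openshell:resolve:env:DISCORD_BOT_TOKEN")
print(carries_placeholder(frame))  # True
```

Because the frame travels inside an end-to-end TLS session, only the client or a TLS-terminating hop could replace `d.token` before Discord validates it.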

### Reproduce
1. `nemoclaw onboard --non-interactive` with a valid `DISCORD_BOT_TOKEN`
2. Provider `<sandbox>-discord-bridge` is created and attached to the sandbox; sandbox env has `DISCORD_BOT_TOKEN=openshell:resolve:env:DISCORD_BOT_TOKEN`
3. OpenClaw attempts to connect to `wss://gateway.discord.gg`
4. The Discord gateway closes the connection immediately with close code 4004 (see attached `gateway.log`, search for `4004`)

### Confirmation it's a payload-rewrite gap, not a policy/network problem
Writing the real Discord bot token directly into `/sandbox/.openclaw/openclaw.json` (field `channels.discord.accounts.default.token`), bypassing the placeholder system for this field, produces a successful IDENTIFY and the bot connects. No policy changes required; the only variable is whether the literal placeholder string or the real token arrives in the WSS IDENTIFY payload.

### Proposed directions
1. Add WSS MITM + JSON-payload-aware rewriting for known channel protocols (Discord IDENTIFY op 2, `d.token` field), so `tls: skip` can be removed for `gateway.discord.gg`.
2. OR: expose an in-sandbox secret-resolution gRPC endpoint (e.g. reachable via `OPENSHELL_ENDPOINT`) that clients can call to resolve `openshell:resolve:env:*` explicitly. OpenClaw (and other consumers) could then resolve at config-read time instead of relying on egress rewriting.
3. OR: when a provider is attached to a sandbox and the target channel is known to use WSS, let OpenShell inject the real credential value into the child env var directly at sandbox start (documenting the security trade-off that the credential is then at-rest in the sandbox env rather than only in the provider store).
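
As a rough illustration of direction 1, the payload-aware step could look like the sketch below. It assumes the proxy has already terminated TLS and decoded a complete, unmasked, uncompressed WebSocket text frame; `rewrite_identify` and the `resolve` callback are hypothetical names, not existing OpenShell APIs.

```python
import json
from typing import Callable, Optional

# Placeholder prefix as quoted in this issue.
PLACEHOLDER_PREFIX = "openshell:resolve:env:"


def rewrite_identify(frame: str, resolve: Callable[[str], Optional[str]]) -> str:
    """Swap a placeholder in an IDENTIFY (op 2) d.token for the resolved secret.

    Anything that is not a JSON IDENTIFY frame carrying a placeholder is
    returned unchanged, so heartbeats and other traffic pass through as-is.
    """
    try:
        msg = json.loads(frame)
    except ValueError:
        return frame
    if not isinstance(msg, dict) or msg.get("op") != 2:
        return frame
    data = msg.get("d")
    if not isinstance(data, dict):
        return frame
    token = data.get("token")
    if not isinstance(token, str) or not token.startswith(PLACEHOLDER_PREFIX):
        return frame
    # The name after the prefix, e.g. DISCORD_BOT_TOKEN.
    secret = resolve(token[len(PLACEHOLDER_PREFIX):])
    if secret is None:
        return frame  # unknown name: leave untouched rather than guess
    data["token"] = secret
    return json.dumps(msg)


# Stand-in for the provider store; a real resolver would live in OpenShell.
secrets = {"DISCORD_BOT_TOKEN": "real-token-value"}
identify = json.dumps({"op": 2, "d": {"token": "openshell:resolve:env:DISCORD_BOT_TOKEN", "intents": 0}})
print(json.loads(rewrite_identify(identify, secrets.get))["d"]["token"])  # real-token-value
```

A real implementation would also have to handle WebSocket framing (client-to-server frames are masked per RFC 6455), fragmented messages, and any transport compression the client negotiates, none of which is shown here.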

### Related
- #544 — merged PR that introduced the "auto-terminate TLS unconditionally" behavior; `tls: skip` is the escape hatch used by the NemoClaw Discord preset
- #894 — adjacent placeholder-model limitation (SDKs that validate token format pre-network)

### Attachments
- `gateway.log` — 344-line log from inside the sandbox (micky pod, `/tmp/gateway.log`). Shows the 4004 pattern before the manual workaround and the quiet "awaiting gateway readiness" (implicit READY) after.
- `openshell-status.txt` — `openshell status`
- `openshell-doctor-check.txt` — `openshell doctor check`
- `openshell-doctor-logs.txt` — `openshell doctor logs --lines 200`

[openshell-issue-bundle.zip](https://github.com/user-attachments/files/26948365/openshell-issue-bundle.zip)

### Reproduction Steps

1. Deploy a NemoClaw stack (v2026.4.2) with OpenShell 0.0.26 on ARM64 (DGX Spark, k3s).
   Blueprint uses policies/presets/discord.yaml which pins gateway.discord.gg to `tls: skip`
   (per NVIDIA/OpenShell#544 — required to keep long-lived WSS sessions alive).

2. Create a Discord bot credential:
     nemoclaw credentials set DISCORD_BOT_TOKEN <real-bot-token>

3. Onboard the sandbox (this creates provider `<sandbox>-discord-bridge` and attaches it):
     nemoclaw onboard --non-interactive

4. Confirm the sandbox env contains the placeholder, not the real token:
     kubectl exec -n nemoclaw deploy/micky -- printenv DISCORD_BOT_TOKEN
     # => openshell:resolve:env:DISCORD_BOT_TOKEN

5. Start OpenClaw inside the sandbox so it connects to Discord:
     kubectl exec -n nemoclaw deploy/micky -- openclaw start

6. Observe gateway.log: OpenClaw opens wss://gateway.discord.gg and sends IDENTIFY (op 2)
   with `d.token` set to the literal string "openshell:resolve:env:DISCORD_BOT_TOKEN".
   Discord then closes the socket with close code 4004 (Authentication Failed).

Expected: IDENTIFY carries the resolved bot token; gateway sends READY; bot comes online.
Actual:   IDENTIFY carries the literal placeholder string; gateway closes with 4004.

Workaround (confirms payload-rewrite gap, not a policy/network problem):
  Edit /sandbox/.openclaw/openclaw.json inside the pod and set
  channels.discord.accounts.default.token to the real token value from
  ~/.nemoclaw/credentials.json. OpenClaw hot-reloads, IDENTIFY then carries the real
  token, Discord sends READY, and the bot connects. No policy changes required.
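
To check whether a config still holds unresolved placeholders before or after applying the workaround, a small scan like the one below may help. It is only a sketch: it assumes `openclaw.json` is plain JSON, and `find_placeholders` and `check_config_file` are hypothetical helpers, not part of OpenClaw or OpenShell.

```python
import json
from pathlib import Path

# Placeholder prefix as quoted in this issue.
PLACEHOLDER_PREFIX = "openshell:resolve:env:"


def find_placeholders(node, path=""):
    """Return dotted paths of all string values that still start with the prefix."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            hits.extend(find_placeholders(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_placeholders(value, f"{path}[{index}]"))
    elif isinstance(node, str) and node.startswith(PLACEHOLDER_PREFIX):
        hits.append(path)
    return hits


def check_config_file(config_path):
    """Load a JSON config from disk and list its unresolved placeholder paths."""
    return find_placeholders(json.loads(Path(config_path).read_text()))


# Inline sample mirroring the field named in the workaround.
sample = {
    "channels": {
        "discord": {
            "accounts": {
                "default": {"token": "openshell:resolve:env:DISCORD_BOT_TOKEN"}
            }
        }
    }
}
print(find_placeholders(sample))  # ['channels.discord.accounts.default.token']
```

Inside the pod, `check_config_file("/sandbox/.openclaw/openclaw.json")` would return an empty list once the real token has been written.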

### Environment

- OpenShell: 0.0.26
- NemoClaw: 2026.4.2 (blueprint: `policies/presets/discord.yaml`)
- OpenClaw: 2026.4.9
- Host: NVIDIA DGX Spark (GB10 Grace Blackwell, ARM64 / aarch64)
- OS: Ubuntu 24.04 LTS (kernel 6.11, CUDA 13.0)
- Runtime: k3s (single-node), containerd
- Sandbox pod: micky (namespace: nemoclaw)
- Storage: local-path PVC `workspace-micky` (2Gi) mounted at /sandbox
- Client: Discord gateway via `ws` npm library (raw tls.connect, ignores HTTPS_PROXY)
- Policy: gateway.discord.gg → `tls: skip` (L4 CONNECT pass-through)
- Network: OpenShell L7 proxy at 10.200.0.1:3128 (CONNECT + TLS-MITM for REST egress)

### Logs

```shell

```

### Agent-First Checklist

- [x] I pointed my agent at the repo and had it investigate this issue
- [x] I loaded relevant skills (e.g., `debug-openshell-cluster`, `debug-inference`, `openshell-cli`)
- [x] My agent could not resolve this — the diagnostic above explains why
