Skip to content

feat: add request-level HTTP/gRPC tracing to openshell-server #892

@sjenning

Description

@sjenning

Problem Statement

The openshell-server has no systematic request-level logging. Individual handlers have scattered info!()/warn!() calls, but there is no unified layer that logs every request with method, path, status, and latency. This makes it difficult to diagnose issues, monitor performance, and audit API usage. The tower-http crate with the trace feature is already a dependency but unused.

Proposed Design

Apply tower_http::trace::TraceLayer to the gRPC and HTTP inner services in multiplex.rs before they are passed to MultiplexedService. This is a single-file change — no handler modifications needed.

Why wrap the inner services, not MultiplexedService? MultiplexedService implements hyper::service::Service (not tower::Service), so Tower layers can't wrap it without an adapter. The inner GrpcRouter and Axum Router are both tower::Service, making them natural wrapping targets.

Implementation sketch

let trace_layer = TraceLayer::new_for_http()
    .make_span_with(|req: &Request<_>| {
        tracing::info_span!("request", method = %req.method(), path = %req.uri().path())
    })
    .on_response(|res: &Response<_>, latency: Duration, _span: &Span| {
        tracing::info!(status = res.status().as_u16(), latency_ms = latency.as_millis(), "response");
    });

let grpc_service = ServiceBuilder::new().layer(trace_layer.clone()).service(grpc_service);
let http_service = ServiceBuilder::new().layer(trace_layer).service(http_service);

Expected log output

INFO request{method=GET path=/readyz}: response status=200 latency_ms=1
INFO request{method=POST path=/openshell.v1.OpenShell/CreateSandbox}: response status=200 latency_ms=45

Scope boundaries

  • gRPC status codes: gRPC errors return HTTP 200 with status in trailers; on_response logs the HTTP status only. Trailer inspection would require a custom layer — out of scope.
  • Streaming RPCs: Latency reflects time-to-first-response-byte, not stream duration.
  • Health checks: /health, /healthz, /readyz will be logged at INFO. Can be downgraded to DEBUG via path filtering in a follow-up.
  • WebSocket/SSH upgrades: Initial HTTP handshake (101) is logged; subsequent frames are not.

Alternatives Considered

  1. Wrap MultiplexedService directly — would require a hyper::Servicetower::Service adapter, adding complexity for no benefit since the inner services are already Tower-compatible.
  2. Per-handler logging — scattered info!() calls in each handler. Inconsistent, easy to miss new endpoints, and duplicates effort.
  3. Middleware on the Axum router only — would miss gRPC traffic entirely.

Agent Investigation

Explored crates/openshell-server/src/multiplex.rs and confirmed:

  • tower-http with trace feature is an existing dependency (unused).
  • MultiplexedService implements hyper::service::Service, not tower::Service.
  • Inner GrpcRouter and Axum Router are tower::Service and accept layering.
  • TraceLayer wraps responses in ResponseBody<B> which preserves Body<Data = Bytes>, so generic bounds on MultiplexedService remain satisfied.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions