
Protobuf schema for NormalizedEvent on NATS/ClickHouse pipeline #12

@haasonsaas


Summary

Add a protobuf schema for the NormalizedEvent and TapEventData types that flow through the NATS JetStream -> ClickHouse pipeline. This is ensemble-tap's highest-throughput data path, and it currently carries untyped map[string]any JSON.

Motivation

ensemble-tap ingests webhooks from 20+ SaaS providers (Stripe, HubSpot, Slack, etc.), normalizes them into a NormalizedEvent struct, and publishes them to NATS JetStream. A ClickHouse consumer writes batches of 500 events every 2 seconds.

Current problems:

  • The changes and snapshot fields are serialized as JSON strings into ClickHouse String columns, with no schema and no type safety
  • TapEventData uses map[string]any for the event payload -- consumers must guess the shape
  • The CloudEvent data field is a json.RawMessage with no contract
  • Adding a new provider field means hoping all consumers handle it correctly
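To make the map[string]any problem concrete, here is a minimal Go sketch of what a consumer sees after decoding a provider payload with encoding/json. The payload, its field names, and decodePayload are hypothetical. Every JSON number becomes a float64, so integer fields have to be guessed and converted, and integers above 2^53 silently lose precision.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodePayload mirrors the untyped pattern: a provider payload decoded into map[string]any.
func decodePayload(raw []byte) (map[string]any, error) {
	var data map[string]any
	if err := json.Unmarshal(raw, &data); err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	// Hypothetical Stripe-style payload; field names are illustrative only.
	raw := []byte(`{"id": 9007199254740993, "amount": 1999, "currency": "usd"}`)
	data, err := decodePayload(raw)
	if err != nil {
		panic(err)
	}

	// The consumer has to guess the dynamic type of every field.
	_, isInt := data["amount"].(int64)
	fmt.Println(isInt)                         // false
	fmt.Printf("%T\n", data["amount"])         // float64
	fmt.Printf("%.0f\n", data["id"].(float64)) // 9007199254740992 (last digit lost)
}
```

json.Decoder's UseNumber option keeps numbers as json.Number strings and avoids the precision loss, but consumers still have to guess each field's type. The same limit applies to the interim google.protobuf.Struct approach in the scope below, since Struct stores every number as a double; per-provider typed messages remove it.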

Scope

  1. Define proto/tap/v1/event.proto with NormalizedEvent, TapEventData, and provider-specific payload types
  2. Publish to NATS as serialized protobuf instead of JSON
  3. Update the ClickHouse consumer to use ClickHouse's Protobuf input format for batch inserts
  4. Keep the CloudEvents envelope as JSON (it is the transport), but use protobuf for the data payload
  5. Provider-specific payload types can use google.protobuf.Struct initially, migrating to typed messages per provider over time

Expected Benefits

  • Schema enforcement at the NATS publish boundary
  • Smaller wire format for high-throughput batch writes
  • ClickHouse can ingest via its Protobuf input format, eliminating JSON parsing on insert
  • Type-safe deserialization for any downstream consumer of the tap event stream
  • Foundation for adding new providers with schema validation

Metadata


Labels: enhancement (New feature or request)
