Summary
Add a protobuf schema for the NormalizedEvent / TapEventData types that flow through the NATS JetStream -> ClickHouse pipeline. This is ensemble-tap's highest-throughput data path, and it currently uses untyped map[string]any JSON.
Motivation
ensemble-tap ingests webhooks from 20+ SaaS providers (Stripe, HubSpot, Slack, etc.), normalizes them into a NormalizedEvent struct, and publishes to NATS JetStream. A ClickHouse consumer writes batches of 500 events every 2 seconds.
Current problems:
- The changes and snapshot fields are serialized as JSON strings into ClickHouse String columns -- no schema, no type safety
- TapEventData uses map[string]any for the event payload -- consumers must guess the shape
- The CloudEvent data field is json.RawMessage with no contract
- Adding a new provider field means hoping all consumers handle it correctly
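To make the "consumers must guess the shape" problem concrete, here is a minimal sketch of what reading one value out of an untyped payload looks like today. The field names (provider, payload, amount) and the amountCents helper are hypothetical and chosen only for illustration; only the map[string]any payload type comes from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TapEventData mirrors the untyped shape described above. The field names
// are hypothetical; only the map[string]any payload is taken from the issue.
type TapEventData struct {
	Provider string         `json:"provider"`
	Payload  map[string]any `json:"payload"`
}

// amountCents extracts an integer amount from the payload. Every consumer has
// to repeat this kind of defensive check because nothing declares the type.
func amountCents(raw []byte) (int64, error) {
	var ev TapEventData
	if err := json.Unmarshal(raw, &ev); err != nil {
		return 0, err
	}
	v, ok := ev.Payload["amount"]
	if !ok {
		return 0, fmt.Errorf("payload has no amount field")
	}
	// encoding/json decodes JSON numbers into float64 when the target is any.
	f, ok := v.(float64)
	if !ok {
		return 0, fmt.Errorf("amount has unexpected type %T", v)
	}
	return int64(f), nil
}

func main() {
	// Two payloads that both pass JSON parsing but disagree on the type of amount.
	good := []byte(`{"provider":"stripe","payload":{"amount":1999}}`)
	bad := []byte(`{"provider":"stripe","payload":{"amount":"19.99"}}`)

	fmt.Println(amountCents(good))
	fmt.Println(amountCents(bad))
}
```

Both inputs are valid JSON, so the mismatch only surfaces at read time inside each consumer. A typed protobuf message would surface it at the publish boundary instead.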
Scope
- Define proto/tap/v1/event.proto with NormalizedEvent, TapEventData, and provider-specific payload types
- Publish to NATS as serialized protobuf instead of JSON
- Update ClickHouse consumer to use protobuf input format for batch inserts
- Keep the CloudEvents envelope as JSON (it's the transport), but use protobuf for the data payload
- Provider-specific payload types can use google.protobuf.Struct initially, migrating to typed messages per provider over time
Expected Benefits
- Schema enforcement at the NATS publish boundary
- Smaller wire format for high-throughput batch writes
- ClickHouse can use Protobuf input format, eliminating JSON parse on ingest
- Type-safe deserialization for any downstream consumer of the tap event stream
- Foundation for adding new providers with schema validation