Is there an existing issue for the same feature request?
Is your feature request related to a problem?
Currently data branch diff has limitations that make it less flexible than Git's diff:
- Only works with branched tables: Requires LCA (Lowest Common Ancestor) relationship, cannot compare arbitrary tables
- Limited output modes: Only supports full output and
output count, lacks incremental change tracking
- No detailed statistics:
output count only returns total rows, not breakdown of changes
Describe the feature you'd like
1. Fallback to EXCEPT for non-branched tables
Current behavior:
-- Only works if t2 was created from t1
data branch create table t2 from t1{snapshot="sp1"};
data branch diff t2 against t1; -- ✅ Works (has LCA)
-- Fails if no branch relationship
data branch diff t3 against t4; -- ❌ Error: no LCA
Proposed behavior:
-- Automatically fallback to EXCEPT-like behavior
data branch diff t3 against t4; -- ✅ Works (uses set difference)
2. Delta output mode
Proposed syntax:
data branch diff t1 against t2 output delta;
Expected result:
operation | a | b
----------|---|---
+ | 3 | 3 -- Added in t1
- | 4 | 4 -- Deleted from t1 (exists in t2)
Similar to Git's +/- notation for added/removed lines.
3. Summary statistics output
Proposed syntax:
data branch diff t1 against t2 output summary;
Expected result:
metric | count
----------|------
added | 10
deleted | 5
modified | 3
unchanged | 100
Provides quick overview without scanning all rows.
Describe implementation you've considered
Fallback mechanism
- Check if tables have LCA relationship
- If no LCA: use EXCEPT-based comparison
- Maintain same output format for consistency
Delta output
- Add
operation column: + (added), - (deleted), ~ (modified)
- Use primary key to match rows for modification detection
- If no PK: treat as add/delete only
Summary statistics
- Leverage existing diff computation
- Aggregate results into statistics
- Minimal performance overhead (single pass)
Documentation, Adoption, Use Case, Migration Strategy
Use Cases
- Data validation: Compare staging vs production tables without branch relationship
- Change tracking: Quickly see what changed between snapshots
- Monitoring: Track data drift with summary statistics
- Debugging: Identify specific rows added/deleted with delta output
Adoption
- Backward compatible: existing queries work unchanged
- New output modes are opt-in via
output clause
- Fallback behavior is transparent to users
Is there an existing issue for the same feature request?
Is your feature request related to a problem?
Currently
data branch diffhas limitations that make it less flexible than Git's diff:output count, lacks incremental change trackingoutput countonly returns total rows, not breakdown of changesDescribe the feature you'd like
1. Fallback to EXCEPT for non-branched tables
Current behavior:
Proposed behavior:
2. Delta output mode
Proposed syntax:
Expected result:
Similar to Git's
+/-notation for added/removed lines.3. Summary statistics output
Proposed syntax:
Expected result:
Provides quick overview without scanning all rows.
Describe implementation you've considered
Fallback mechanism
Delta output
operationcolumn:+(added),-(deleted),~(modified)Summary statistics
Documentation, Adoption, Use Case, Migration Strategy
Use Cases
Adoption
outputclause