Add commit retries to insert table writes#250

Merged
bradhe merged 2 commits into develop from features/add-commit-retries-to-insert-table-writes
Apr 16, 2026

@bradhe bradhe commented Apr 16, 2026

  • Adds CommitFailedException retry logic with metadata refresh to insert() and delete() in Table, matching the existing pattern in upsert(). This resolves failures when parallel apps write to the same Iceberg table concurrently.
  • Adds max_retries and retry_delay_seconds parameters to both methods (defaulting to 5 retries / 0.5s delay, same as upsert()).
  • Adds concurrent write tests for both insert() and delete() to verify retry behavior under contention.

Context

A user reported CommitFailedException: branch main has changed errors when multiple parallel apps insert into the same Iceberg table. The retry-with-refresh pattern already existed on upsert() but was missing from insert() and delete().
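
The retry-with-refresh pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/tower/_tables.py: `commit_with_retry`, `FakeTable`, and `flaky_append` are hypothetical names, and the local `CommitFailedException` class stands in for the PyIceberg exception so the snippet runs without PyIceberg installed.

```python
import time


class CommitFailedException(Exception):
    """Stand-in for pyiceberg.exceptions.CommitFailedException (assumption)."""


def commit_with_retry(table, commit, max_retries=5, retry_delay_seconds=0.5):
    """Call commit(table); on a commit conflict, refresh metadata and retry.

    `table` only needs a refresh() method and `commit` is any callable that
    performs the write. Both names are hypothetical.
    """
    if max_retries < 0 or retry_delay_seconds < 0:
        raise ValueError("max_retries and retry_delay_seconds must be >= 0")

    last_exception = None
    for attempt in range(max_retries + 1):
        try:
            return commit(table)
        except CommitFailedException as exc:
            last_exception = exc
            if attempt < max_retries:
                table.refresh()  # reload metadata written by the other writer
                time.sleep(retry_delay_seconds)
    raise last_exception


class FakeTable:
    """Toy table whose first `failures_left` commits hit a conflict."""

    def __init__(self, failures_left):
        self.failures_left = failures_left
        self.refreshes = 0

    def refresh(self):
        self.refreshes += 1


def flaky_append(table):
    # Simulate another writer winning the race a fixed number of times.
    if table.failures_left > 0:
        table.failures_left -= 1
        raise CommitFailedException("branch main has changed")
    return "committed"
```

With `FakeTable(failures_left=2)`, `commit_with_retry(table, flaky_append, retry_delay_seconds=0)` succeeds on the third attempt after two refreshes. If the conflicts outlast `max_retries`, the last `CommitFailedException` is re-raised to the caller.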

Summary by CodeRabbit

Release Notes

  • New Features

    • Insert and delete operations now accept optional max_retries and retry_delay_seconds parameters for improved resilience during concurrent writes.
  • Chores

    • Updated Iceberg optional dependencies (Polars, PyArrow, PyIceberg) to use flexible version constraints instead of pinned versions.
    • Enhanced URL scheme normalization for local and remote host configurations.

Parallel writes to the same Iceberg table can fail with
CommitFailedException when the branch changes during active writes.
This adds the same retry-with-refresh logic that upsert already has
to both insert() and delete(), resolving concurrent write conflicts
automatically.

coderabbitai Bot commented Apr 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 55427e39-2abf-4dba-aebb-15b4e9fbe779

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The PR introduces retry mechanisms with configurable parameters to table insert and delete operations, normalizes session URLs based on host locality for scheme switching, updates Python dependency constraints for Iceberg-related packages, and adds concurrent write tests to verify retry behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| URL Normalization: crates/tower-cmd/src/session.rs | Modified finalize_session to normalize session.tower_url by detecting local hosts (localhost, 127.0.0.1, ::1) and conditionally upgrading HTTP schemes to HTTPS for non-local URLs. |
| Dependency Version Constraints: pyproject.toml | Updated optional iceberg dependencies from pinned versions to minimum-version constraints: polars (1.27.1 → ≥1.39.3), pyarrow (19.0.1 → ≥23.0.1), pyiceberg (0.9.1 → ≥0.11.1); also updated dev dependency pyiceberg[sql-sqlite] (0.9.1 → ≥0.11.1). |
| Table Operations Retry Logic: src/tower/_tables.py | Added max_retries and retry_delay_seconds parameters to insert() and delete() methods with automatic metadata refresh and retry behavior on CommitFailedException; enforced reader_override="pyiceberg" in Polars Iceberg scanning for both eager and lazy operations. |
| Concurrent Write Tests: tests/tower/test_tables.py | Added test_insert_concurrent_writes_with_retry and test_delete_concurrent_writes_with_retry to validate retry mechanics under concurrent conditions; updated test_nested_structs schema field type and test_map_type_simple null-handling assertion. |

Poem

🐰 Hops of joy for retries new,
Schemes that flip for hosts untrue,
Polars bouncing, versions loose,
Concurrent writes—no more caboose!
Tower stands so tall and bright,
Ready for the retry fight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 68.42% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add commit retries to insert table writes' directly corresponds to the main PR objective and the primary change in src/tower/_tables.py: adding retry logic with max_retries and retry_delay_seconds parameters to the insert() method. |


@bradhe bradhe changed the base branch from main to develop April 16, 2026 11:07
@tower tower deleted a comment from github-actions Bot Apr 16, 2026
@tower tower deleted a comment from github-actions Bot Apr 16, 2026
Copilot AI left a comment

Pull request overview

Adds commit-conflict retry behavior to Iceberg table writes to reduce CommitFailedException failures under concurrent writers, aligning insert() and delete() with the existing upsert() approach.

Changes:

  • Add CommitFailedException retry loops with metadata refresh to Table.insert() and Table.delete().
  • Expose max_retries / retry_delay_seconds parameters on both methods (defaults: 5 / 0.5s).
  • Add concurrent-write tests for insert() and delete() retry behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/tower/_tables.py | Implements refresh+retry loops for insert() and delete() on commit conflicts. |
| tests/tower/test_tables.py | Adds concurrent insert/delete tests intended to validate the new retry behavior. |

Comment thread tests/tower/test_tables.py
Comment thread tests/tower/test_tables.py Outdated
Comment thread src/tower/_tables.py
Comment thread src/tower/_tables.py
Comment thread src/tower/_tables.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/tower/_tables.py (1)

156-213: ⚠️ Potential issue | 🟠 Major

Validate retry arguments before entering these loops.

With max_retries < 0, the loop at Line 199 / Line 353 never executes its body, and Line 213 / Line 374 ends up doing raise None, which surfaces as a TypeError instead of a clear API error. A negative retry_delay_seconds also leaks a ValueError from sleep(), but only after the first conflict. Reject both values up front; ideally do it once in a shared retry helper so insert(), delete(), and upsert() stay consistent.

Proposed fix
+    @staticmethod
+    def _validate_retry_args(max_retries: int, retry_delay_seconds: float) -> None:
+        if max_retries < 0:
+            raise ValueError("max_retries must be >= 0")
+        if retry_delay_seconds < 0:
+            raise ValueError("retry_delay_seconds must be >= 0")
+
     def insert(
         self,
         data: pa.Table,
         max_retries: int = 5,
         retry_delay_seconds: float = 0.5,
     ) -> TTable:
+        self._validate_retry_args(max_retries, retry_delay_seconds)
         last_exception = None
         ...
 
     def delete(
         self,
         filters: Union[str, List[pc.Expression]],
         max_retries: int = 5,
         retry_delay_seconds: float = 0.5,
     ) -> TTable:
+        self._validate_retry_args(max_retries, retry_delay_seconds)
         ...

Also applies to: 297-374
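
The two failure modes called out above can be reproduced in isolation. The sketch below assumes a loop of the form `for attempt in range(max_retries + 1)` followed by `raise last_exception`, which is an assumption about the shape of the reviewed code, not a copy of it. `retry_loop_without_validation` and `failure_type` are hypothetical helpers, and a `RuntimeError` stands in for the commit conflict.

```python
import time


def retry_loop_without_validation(max_retries, retry_delay_seconds):
    """Hypothetical retry loop with no argument validation."""
    last_exception = None
    for attempt in range(max_retries + 1):
        try:
            # Every attempt fails, as if the commit always conflicted.
            raise RuntimeError("simulated commit conflict")
        except RuntimeError as exc:
            last_exception = exc
            time.sleep(retry_delay_seconds)
    # With max_retries < 0 the loop body never ran, so this is `raise None`.
    raise last_exception


def failure_type(max_retries, retry_delay_seconds):
    """Return the name of the exception the loop surfaces to the caller."""
    try:
        retry_loop_without_validation(max_retries, retry_delay_seconds)
    except Exception as exc:
        return type(exc).__name__
```

Under these assumptions, `failure_type(-1, 0.0)` reports a `TypeError` (from `raise None`), `failure_type(0, -1.0)` reports a `ValueError` (from `time.sleep`), and only valid arguments such as `failure_type(0, 0.0)` surface the original conflict error. Validating up front replaces the first two with a single, descriptive `ValueError`.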

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tower/_tables.py` around lines 156 - 213, Validate retry arguments at the
start of the retrying methods (insert, delete, upsert) before entering their
retry loops: check that max_retries is >= 0 and retry_delay_seconds is >= 0 and
raise a clear ValueError (or a custom API error) if not. Update the insert
method (and the corresponding delete/upsert implementations) to perform this
validation at the top (before the for attempt in range(...) loop) so you never
end up raising None or letting time.sleep receive a negative delay; optionally
factor this validation into a shared retry helper used by insert/delete/upsert
to keep behavior consistent.
🧹 Nitpick comments (1)
crates/tower-cmd/src/session.rs (1)

135-140: Centralize tower URL normalization to keep session persistence consistent.

This logic currently lives only in finalize_session, while crates/config/src/session.rs (snippet Lines 303-318) sets session.tower_url directly. Consider extracting a shared normalizer and reusing it in both paths so persisted session.tower_url behavior doesn’t diverge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tower-cmd/src/session.rs` around lines 135 - 140, The URL
normalization logic that adjusts http->https for non-local hosts should be
centralized into a reusable helper (e.g., a function named normalize_tower_url
or normalize_and_set_tower_url that accepts/returns a url or &mut Url); move the
current code from finalize_session into that helper and replace the inline block
in finalize_session (which mutates session.tower_url) and the direct assignment
in crates/config/src/session.rs (where session.tower_url is currently set) to
call this helper so both code paths use the same normalization: detect local
hosts ("localhost", "127.0.0.1", "::1") and only change scheme from "http" to
"https" when not local.
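
The normalization rule this comment describes is small enough to sketch. The actual code lives in Rust (crates/tower-cmd/src/session.rs); the Python version below only illustrates the described logic, reusing the suggested name `normalize_tower_url`, and the example hosts in the usage note are made up.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts treated as local, per the review comment.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def normalize_tower_url(url: str) -> str:
    """Upgrade http to https unless the URL points at a local host."""
    parts = urlsplit(url)
    # .hostname is lowercased and has IPv6 brackets stripped ("[::1]" -> "::1").
    is_local = parts.hostname in LOCAL_HOSTS
    if parts.scheme == "http" and not is_local:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

For example, `normalize_tower_url("http://api.example.com/v1")` yields an `https` URL, while `http://localhost:8080` and `http://[::1]:8080/` are returned unchanged. Calling one such helper from every place that sets `session.tower_url` is what keeps the persisted value consistent.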
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/tower/test_tables.py`:
- Around line 324-381: Make the retry-path deterministic by forcing the first
commit to fail and verifying refresh was invoked before retry: in
test_insert_concurrent_writes_with_retry wrap or monkeypatch the underlying
commit/append method used by tower.tables(...)._table (e.g., the internal
commit/append method your implementation calls) so it raises
CommitFailedException on the first call and succeeds thereafter, incrementing
retry_count inside your tracked_refresh; ensure insert_ticker installs that
failing stub before calling t.insert and then restores the real method after the
simulated failure so the retry can succeed, and assert that t._table.refresh
(the tracked_refresh) was called before the retry completes; apply the same
pattern to the other test referenced (lines 384-441) to eliminate flakiness.
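
One way to read this suggestion is sketched below using `unittest.mock`. It is not the test that was merged: `FakeIcebergTable` and `insert_with_retry` are hypothetical stand-ins for the real table and `Table.insert()`, and the local `CommitFailedException` replaces the PyIceberg exception so the example is self-contained.

```python
from unittest import mock


class CommitFailedException(Exception):
    """Stand-in for pyiceberg.exceptions.CommitFailedException (assumption)."""


class FakeIcebergTable:
    """Hypothetical stand-in for the underlying Iceberg table."""

    def __init__(self):
        self.rows = []

    def refresh(self):
        return self

    def append(self, data):
        self.rows.extend(data)


def insert_with_retry(table, data, max_retries=5):
    """Hypothetical insert: append, refreshing and retrying on conflict."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            table.append(data)
            return
        except CommitFailedException:
            if attempt == max_retries:
                raise
            table.refresh()


def test_insert_retries_after_forced_conflict():
    table = FakeIcebergTable()
    real_append = table.append
    calls = []  # ordered log of "append" / "refresh" events

    def append_failing_once(data):
        calls.append("append")
        if calls.count("append") == 1:
            # Force a conflict on the first attempt only.
            raise CommitFailedException("branch main has changed")
        return real_append(data)

    def tracked_refresh():
        calls.append("refresh")

    with mock.patch.object(table, "append", side_effect=append_failing_once), \
         mock.patch.object(table, "refresh", side_effect=tracked_refresh):
        insert_with_retry(table, [{"ticker": "AAPL"}])

    # The refresh must happen between the failed and the successful append.
    assert calls == ["append", "refresh", "append"]
    assert table.rows == [{"ticker": "AAPL"}]
```

Because the conflict is injected rather than provoked by real threads, the test exercises the retry path on every run and can assert the exact ordering of refresh and retry.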

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b8095b3b-c087-4f5e-a86e-4dc9b8b082bb

📥 Commits

Reviewing files that changed from the base of the PR and between 824944b and 2c267a8.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • crates/tower-cmd/src/session.rs
  • pyproject.toml
  • src/tower/_tables.py
  • tests/tower/test_tables.py

Comment thread tests/tower/test_tables.py

@konstantinoscs konstantinoscs left a comment

LGTM

@socksy socksy left a comment

Please make the test deterministic before merging though

…istic

Add input validation for max_retries and retry_delay_seconds across
insert(), upsert(), and delete() to prevent confusing errors from
negative values. Make concurrent insert/delete tests deterministic by
forcing CommitFailedException on the first attempt rather than relying
on real concurrency conflicts.
@bradhe bradhe merged commit f078f3a into develop Apr 16, 2026
30 checks passed
@bradhe bradhe deleted the features/add-commit-retries-to-insert-table-writes branch April 16, 2026 14:20
@coderabbitai coderabbitai Bot mentioned this pull request Apr 16, 2026
5 participants