[python] Fix read_entries_parallel to use order-dependent merge via FileEntry.merge_entries #7596

Open
plusplusjiajia wants to merge 1 commit into apache:master from plusplusjiajia:fix-manifest

@plusplusjiajia
Member

Purpose

Fix incorrect manifest entry merge logic in read_entries_parallel() that has existed since #6451.

The old implementation collected all DELETE identifiers into a global set, then filtered out all matching ADDs. This caused two bugs:

  1. Compaction file loss: When a file goes through ADD → DELETE → re-ADD (same identifier), the global DELETE set filters out both ADDs, causing the re-added file to disappear from the result.
  2. Unmatched DELETE silently dropped: DELETE entries without a matching ADD were never included in the output, losing deletion information for downstream consumers (e.g., manifest compaction rewrite, changelog reads).
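
To illustrate the two bugs, here is a minimal sketch that contrasts the global-DELETE-set approach with an order-dependent merge. It is not the actual pypaimon code: `Entry`, `merge_global_delete_set`, and `merge_in_order` are hypothetical names, and the identifier is reduced to a plain string for brevity. The real `FileEntry.merge_entries` may perform additional checks.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

ADD, DELETE = "ADD", "DELETE"


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a manifest entry."""
    kind: str
    identifier: str


def merge_global_delete_set(entries: Iterable[Entry]) -> List[Entry]:
    """Old behaviour (sketch): any DELETE hides every ADD with that identifier."""
    entries = list(entries)
    deleted = {e.identifier for e in entries if e.kind == DELETE}
    return [e for e in entries if e.kind == ADD and e.identifier not in deleted]


def merge_in_order(entries: Iterable[Entry]) -> List[Entry]:
    """Order-dependent merge (sketch): apply entries one by one.

    A DELETE cancels an entry already recorded for the same identifier;
    a DELETE with nothing to cancel is kept in the result.
    """
    merged: Dict[str, Entry] = {}  # dicts preserve insertion order
    for e in entries:
        if e.kind == ADD:
            merged[e.identifier] = e
        elif e.identifier in merged:
            del merged[e.identifier]
        else:
            merged[e.identifier] = e
    return list(merged.values())


# Case 1: ADD -> DELETE -> re-ADD of the same identifier.
re_add = [Entry(ADD, "f1"), Entry(DELETE, "f1"), Entry(ADD, "f1")]
# Case 2: DELETE with no matching ADD in the inputs being merged.
unmatched = [Entry(DELETE, "f2")]

old_re_add = merge_global_delete_set(re_add)
new_re_add = merge_in_order(re_add)
old_unmatched = merge_global_delete_set(unmatched)
new_unmatched = merge_in_order(unmatched)
```

In this sketch the old approach returns an empty list for both cases, while the order-dependent merge keeps the re-added `f1` in the first case and the unmatched DELETE of `f2` in the second.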
