Skip to content

fix: return error instead of panic on schema mismatch in BatchCoalescer::push_batch#9390

Merged
alamb merged 1 commit intoapache:mainfrom
bvolpato-dd:fix-coalesce-schema-mismatch-panic
Feb 11, 2026
Merged

fix: return error instead of panic on schema mismatch in BatchCoalescer::push_batch#9390
alamb merged 1 commit intoapache:mainfrom
bvolpato-dd:fix-coalesce-schema-mismatch-panic

Conversation

@bvolpato-dd
Copy link
Copy Markdown
Contributor

@bvolpato-dd bvolpato-dd commented Feb 11, 2026

Which issue does this PR close?

Rationale for this change

BatchCoalescer::push_batch currently uses a hard assert_eq! to check that the incoming batch has the same number of columns as the coalescer's schema. If there's a mismatch, the whole process panics.

The function already returns Result<(), ArrowError>, so there's no reason this can't be an error return instead. This is also how the rest of the arrow API handles the same situation — RecordBatch::try_new returns Err(ArrowError::InvalidArgumentError) for column count mismatches, and other checks in the same struct use debug_assert!.

We ran into this in production where a connector returned batches with a different schema than the plan expected. Instead of a query-level error, the whole process went down.

What changes are included in this PR?

  • Replace assert_eq!(arrays.len(), self.in_progress_arrays.len()) with an if check that returns Err(ArrowError::InvalidArgumentError(...)).
  • Add three tests covering both directions of mismatch (fewer columns, more columns, zero-vs-two).

Are these changes tested?

Yes — three new tests:

  • test_push_batch_schema_mismatch_fewer_columns — coalescer expects 0 columns, batch has 1
  • test_push_batch_schema_mismatch_more_columns — coalescer expects 2 columns, batch has 1
  • test_push_batch_schema_mismatch_two_vs_zero — coalescer expects 0 columns, batch has 2 (matches the exact error we saw in production)

Are there any user-facing changes?

BatchCoalescer::push_batch now returns Err(ArrowError::InvalidArgumentError) on column count mismatch instead of panicking. Any caller that was relying on the panic (unlikely) would need to handle the error instead.

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Feb 11, 2026
…er::push_batch

Replace assert_eq! with an error return when the batch column count
does not match the coalescer schema. The function already returns
Result<(), ArrowError>, so callers can handle this gracefully.
@bvolpato-dd bvolpato-dd force-pushed the fix-coalesce-schema-mismatch-panic branch from e65cece to 467ad58 Compare February 11, 2026 05:00
@bvolpato-dd bvolpato-dd marked this pull request as ready for review February 11, 2026 05:00
Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good idea to remove these panics in favour of errors 👍

One query I have is that push_batch() seems to have some fast paths/early returns before this check, so is it possible that existing behaviour sometimes was to ignore this potential column miscount 🤔

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Feb 11, 2026

One query I have is that push_batch() seems to have some fast paths/early returns before this check, so is it possible that existing behaviour sometimes was to ignore this potential column miscount 🤔

It's possible, but not intended -- push_batch should be pushing the same columns each time

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank ou @bvolpato-dd and @Jefffrey -- looks good to me too

@alamb alamb merged commit bf3520d into apache:main Feb 11, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BatchCoalescer::push_batch panics on schema mismatch instead of returning error

3 participants