Cannot read encrypted Parquet file if page index reading is enabled #7629

@adamreeve

Describe the bug

Reading a Parquet file that uses modular encryption fails when the page index is enabled in ArrowReaderOptions. The error looks like:

ArrowError("Parquet argument error: External: bad data")

To Reproduce

This test reproduces the issue when added to parquet/tests/encryption/encryption_async.rs:

#[tokio::test]
async fn test_read_with_page_index() {
    // Open a uniformly encrypted file from the shared Parquet test data.
    let test_data = arrow::util::test_util::parquet_test_data();
    let path = format!("{test_data}/uniform_encryption.parquet.encrypted");
    let mut file = File::open(&path).await.unwrap();

    // Decryption key used for this test file.
    let key_code: &[u8] = "0123456789012345".as_bytes();
    let decryption_properties = FileDecryptionProperties::builder(key_code.to_vec())
        .build()
        .unwrap();

    // Enabling the page index together with decryption triggers the error.
    let options = ArrowReaderOptions::new()
        .with_file_decryption_properties(decryption_properties)
        .with_page_index(true);

    let arrow_metadata = ArrowReaderMetadata::load_async(&mut file, options)
        .await
        .unwrap();

    // Build a stream over the file using the already loaded metadata.
    let record_reader = ParquetRecordBatchStreamBuilder::new_with_metadata(
        file,
        arrow_metadata,
    )
    .build()
    .unwrap();
    let _record_batches = record_reader.try_collect::<Vec<_>>().await.unwrap();
}

Expected behavior

Data should be read successfully and give the same results as when with_page_index(false) is used.

Additional context

@corwinjoy encountered this while integrating encryption support into DataFusion, which enables page indexes when data is queried with a filter predicate.

Labels: bug, parquet (Changes to the parquet crate)
