This is a follow-up to #7878.
The variant spec states the string values in the metadata dictionary must be UTF-8 encoded strings.
We do this check here (`arrow-rs/parquet-variant/src/variant/metadata.rs`, lines 250 to 252 in 387490a):

    // Verify the string values in the dictionary are UTF-8 encoded strings.
    let value_buffer =
        string_from_slice(self.bytes, 0, self.first_value_byte as _..self.bytes.len())?;

Since we offer simdutf8 as an optional dependency in other crates, we could do the same when performing the validation above. See @Dandandan's comment.
The rough idea being:
- If `simdutf8` is supported, do: `let value_str = simdutf8::basic::from_utf8(value_buffer)?;`
- Otherwise, default to the existing implementation.
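
To make the idea above concrete, here is a minimal sketch of how the two paths could be selected at compile time. It assumes a Cargo feature named `simdutf8` that enables an optional `simdutf8` dependency; the `validate_utf8` helper and the `InvalidUtf8` error type are hypothetical names for illustration, not existing items in `parquet-variant`, which would map the failure into its own error type instead.

```rust
use std::fmt;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub struct InvalidUtf8;

impl fmt::Display for InvalidUtf8 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "metadata dictionary contains invalid UTF-8")
    }
}

impl std::error::Error for InvalidUtf8 {}

/// SIMD-accelerated path, compiled only when the (assumed) `simdutf8`
/// Cargo feature is enabled.
#[cfg(feature = "simdutf8")]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, InvalidUtf8> {
    // The `basic` API reports only success or failure, with no error position.
    simdutf8::basic::from_utf8(bytes).map_err(|_| InvalidUtf8)
}

/// Fallback path: standard-library validation, used when the feature is off.
#[cfg(not(feature = "simdutf8"))]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, InvalidUtf8> {
    std::str::from_utf8(bytes).map_err(|_| InvalidUtf8)
}
```

Both variants share one signature, so the call site would stay the same regardless of whether the feature is enabled. Note that `simdutf8::basic` does not report where the invalid byte is; if the error message needs that position, `simdutf8::compat::from_utf8` would be the variant to look at.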