Configurable Arrow representation of UTC timestamps for Avro reader #9279

@mzabaluev

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The Avro logical types that represent a UTC timestamp, e.g. "timestamp-millis" and "timestamp-micros", are mapped to Arrow datatypes that carry a timezone ID. The reader currently hardcodes the explicit offset "+00:00" for these types, so e.g. "timestamp-micros" is mapped to DataType::Timestamp(TimeUnit::Microsecond, Some("+00:00")). This may cause interoperability problems with applications that use the "UTC" timezone ID, because Arrow datatypes that differ only in these two IDs do not compare equal.
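To illustrate the mismatch, here is a minimal sketch. It does not depend on arrow-rs: the `TimeUnit` and `DataType` enums below are simplified stand-ins that only mirror the shape of the Arrow timestamp type (a unit plus an optional timezone string), and `utc_micros` is a hypothetical helper. Because equality is derived structurally, the timezone is compared as a plain string.

```rust
use std::sync::Arc;

// Simplified stand-in for Arrow's time unit; other units are omitted.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TimeUnit {
    Microsecond,
}

// Simplified stand-in for the Arrow timestamp datatype:
// a unit plus an optional timezone ID stored as a string.
#[derive(Debug, Clone, PartialEq, Eq)]
enum DataType {
    Timestamp(TimeUnit, Option<Arc<str>>),
}

// Hypothetical helper: a microsecond timestamp type with the given timezone ID.
fn utc_micros(tz: &str) -> DataType {
    DataType::Timestamp(TimeUnit::Microsecond, Some(Arc::from(tz)))
}

fn main() {
    let from_avro_reader = utc_micros("+00:00");
    let from_application = utc_micros("UTC");

    // Both IDs denote the same instant semantics, but the types differ
    // because the timezone strings differ.
    assert_ne!(from_avro_reader, from_application);
    println!("{:?} != {:?}", from_avro_reader, from_application);
}
```

A schema check that compares the two datatypes directly would therefore reject data that is semantically compatible.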

Describe the solution you'd like

Make the timezone ID used to map Avro timestamps configurable in the ReaderBuilder API.
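One possible shape for such an option is sketched below. Every name here (`UtcTimezoneId`, `ReaderOptionsSketch`, `with_utc_timezone_id`, `timestamp_tz`) is hypothetical and chosen for illustration; none of it is the existing arrow-avro API. The default keeps the current "+00:00" behavior, so existing callers would be unaffected.

```rust
use std::sync::Arc;

/// Hypothetical choice of timezone ID for Avro UTC timestamp types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum UtcTimezoneId {
    /// Explicit offset "+00:00" (the currently hardcoded value).
    #[default]
    Offset,
    /// Named zone "UTC".
    Utc,
}

impl UtcTimezoneId {
    fn as_str(self) -> &'static str {
        match self {
            UtcTimezoneId::Offset => "+00:00",
            UtcTimezoneId::Utc => "UTC",
        }
    }
}

/// Hypothetical stand-in for the relevant part of a reader builder.
#[derive(Debug, Default)]
struct ReaderOptionsSketch {
    utc_tz: UtcTimezoneId,
}

impl ReaderOptionsSketch {
    /// Builder-style setter, assumed for illustration only.
    fn with_utc_timezone_id(mut self, id: UtcTimezoneId) -> Self {
        self.utc_tz = id;
        self
    }

    /// Timezone string to attach to Arrow timestamp types
    /// produced from Avro UTC logical types.
    fn timestamp_tz(&self) -> Option<Arc<str>> {
        Some(Arc::from(self.utc_tz.as_str()))
    }
}

fn main() {
    let default_opts = ReaderOptionsSketch::default();
    assert_eq!(default_opts.timestamp_tz().as_deref(), Some("+00:00"));

    let utc_opts = ReaderOptionsSketch::default().with_utc_timezone_id(UtcTimezoneId::Utc);
    assert_eq!(utc_opts.timestamp_tz().as_deref(), Some("UTC"));
}
```

An enum is used here instead of a free-form string so that only the two known-equivalent spellings can be selected; accepting an arbitrary string would be an alternative design.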

Describe alternatives you've considered

Persuade arrow-rs developers to:

  • change the representation of time zones to remove the ambiguity;
  • make the timestamp datatypes compare equal if they only differ in "UTC" vs. "+00:00" as the timezone ID.

These changes would break either the API or the behavior of arrow-rs in general and so would take a lot of time and coordination to implement.
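For reference, the equality rule in the second alternative amounts to normalizing the timezone ID before comparing. The sketch below shows that idea on plain strings; `normalize_utc` and `same_timezone` are hypothetical helpers, not arrow-rs functions, and only the two spellings named in this issue are treated as equivalent.

```rust
/// Map the two UTC spellings discussed in this issue to one canonical ID.
/// Any other timezone ID is returned unchanged.
fn normalize_utc(tz: &str) -> &str {
    match tz {
        "UTC" | "+00:00" => "UTC",
        other => other,
    }
}

/// Compare two optional timezone IDs, treating "UTC" and "+00:00" as equal.
fn same_timezone(a: Option<&str>, b: Option<&str>) -> bool {
    a.map(normalize_utc) == b.map(normalize_utc)
}

fn main() {
    assert!(same_timezone(Some("UTC"), Some("+00:00")));
    assert!(!same_timezone(Some("UTC"), Some("+01:00")));
    // A timestamp without a timezone stays distinct from a UTC one.
    assert!(!same_timezone(None, Some("UTC")));
}
```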

Metadata

    Labels

    arrow (Changes to the arrow crate), arrow-avro (arrow-avro crate), enhancement (Any new improvement worthy of an entry in the changelog)
