assume_timezone can only handle IANA timezone names, not fixed offsets #36558

@sm-Fifteen

Describe the bug, including details regarding any error messages, version, and platform.

See pola-rs/polars#9586 (comment), where the issue was identified.

import pyarrow as pa
import pyarrow.compute as pc

from datetime import datetime

pc.assume_timezone(pa.array([datetime(2020, 1, 1)]), '+00:00')

ArrowInvalid: Cannot locate timezone '+00:00': +00:00 not found in timezone database

The Arrow format specification describes three kinds of timestamp: "naive date-time" (timezone string is null), "zoned date-time" (timezone string is a tzdb name) and "offset date-time" (timezone string is a fixed RFC 3339 num-offset, so never "Z"). The documentation for assume_timezone does not mention offsets, and the function cannot handle one: it only performs a lookup in tzdb (via LocateZone) and never tries to parse the string as an offset. This is despite documentation elsewhere saying that offsets are perfectly valid and that "+00:00" should be treated as identical to "UTC". Given all this, I would expect "+00:00" to be recognized as UTC and "+01:00" to be recognized as a fixed offset.
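
To make the expected behaviour concrete, here is a minimal standard-library sketch of the missing step: try to parse the string as an RFC 3339 num-offset before falling back to a tzdb lookup. This is not how Arrow implements anything; `parse_num_offset` and `assume_fixed_offset` are hypothetical names used only for illustration, and the sketch handles plain `datetime` values rather than Arrow arrays.

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 num-offset: a sign followed by HH:MM. "Z" is deliberately not accepted.
_NUM_OFFSET = re.compile(r"([+-])(\d{2}):(\d{2})")


def parse_num_offset(tz):
    """Return a fixed-offset tzinfo for a "+HH:MM" / "-HH:MM" string, else None.

    Hypothetical helper: a None result means "not an offset", so the caller
    would fall back to a tzdb lookup.
    """
    match = _NUM_OFFSET.fullmatch(tz)
    if match is None:
        return None
    sign, hours, minutes = match.group(1), int(match.group(2)), int(match.group(3))
    if hours > 23 or minutes > 59:
        return None
    delta = timedelta(hours=hours, minutes=minutes)
    return timezone(-delta if sign == "-" else delta)


def assume_fixed_offset(naive, tz):
    """Interpret a naive datetime as wall-clock time at the given fixed offset."""
    fixed = parse_num_offset(tz)
    if fixed is None:
        raise ValueError(f"{tz!r} is not a fixed offset; a tzdb lookup would be needed")
    return naive.replace(tzinfo=fixed)


# "+00:00" behaves like UTC; "+01:00" is one hour ahead of UTC.
as_utc = assume_fixed_offset(datetime(2020, 1, 1), "+00:00")
as_plus_one = assume_fixed_offset(datetime(2020, 1, 1), "+01:00")
```

Under this sketch, a tzdb name such as "Europe/Paris" makes `parse_num_offset` return `None` rather than raise, which is what would let a real implementation keep its existing tzdb path as the fallback.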

Component(s)

Python
