Describe the bug, including details regarding any error messages, version, and platform.
See pola-rs/polars#9586 (comment), where the issue was identified.
```python
import pyarrow as pa
import pyarrow.compute as pc
from datetime import datetime
pc.assume_timezone(pa.array([datetime(2020, 1, 1)]), '+00:00')
```
```
ArrowInvalid: Cannot locate timezone '+00:00': +00:00 not found in timezone database
```
The Arrow format specification describes three timestamp formats: "naive date-time" (the timezone string is null), "zoned date-time" (the timezone string is a tzdb name) and "offset date-time" (the timezone string is a fixed RFC 3339 num-offset, so "Z" is not allowed). The documentation for assume_timezone does not mention offsets specifically, but the function cannot handle being passed one: it only performs a lookup in tzdb (via LocateZone) and never tries to parse the string as an offset. That is despite documentation elsewhere stating that offsets are perfectly valid and that "+00:00" should be treated as identical to "UTC".
Given all this, I would expect "+00:00" to be recognized as UTC, and "+01:00" to be recognized as a fixed offset.
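For illustration, here is a minimal standard-library sketch of the expected behavior, assuming the three categories from the format spec: a null string means naive, a string matching the RFC 3339 num-offset `+HH:MM`/`-HH:MM` means a fixed offset, and anything else is looked up as a tzdb name. `resolve_arrow_timezone` and `assume_timezone_sketch` are hypothetical names, not part of pyarrow, and Python's `zoneinfo` merely stands in for Arrow's tzdb lookup (LocateZone). The point is only that offsets get parsed before falling back to the tzdb lookup.

```python
import re
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional
from zoneinfo import ZoneInfo

# RFC 3339 time-numoffset: ("+" / "-") HH ":" MM. "Z" is deliberately not accepted.
_NUM_OFFSET = re.compile(r"([+-])([0-9]{2}):([0-9]{2})")


def resolve_arrow_timezone(tz: Optional[str]) -> Optional[tzinfo]:
    """Hypothetical helper: map an Arrow timestamp timezone string to a tzinfo.

    None      -> naive date-time (no tzinfo)
    "+HH:MM"  -> offset date-time (fixed offset)
    otherwise -> zoned date-time (tz database lookup)
    """
    if tz is None:
        return None
    match = _NUM_OFFSET.fullmatch(tz)
    if match:
        sign, hh, mm = match.groups()
        hours, minutes = int(hh), int(mm)
        if hours > 23 or minutes > 59:
            raise ValueError(f"invalid offset: {tz!r}")
        delta = timedelta(hours=hours, minutes=minutes)
        return timezone(-delta if sign == "-" else delta)
    # Only strings that are not offsets reach the tz database lookup.
    return ZoneInfo(tz)


def assume_timezone_sketch(naive: datetime, tz: str) -> datetime:
    """Interpret a naive wall-clock time in `tz` and return the UTC instant."""
    zone = resolve_arrow_timezone(tz)
    return naive.replace(tzinfo=zone).astimezone(timezone.utc)


print(resolve_arrow_timezone("+00:00") == timezone.utc)
print(assume_timezone_sketch(datetime(2020, 1, 1), "+01:00"))
```

Under these assumptions the first line prints `True` (a zero fixed offset compares equal to UTC) and the second prints `2019-12-31 23:00:00+00:00`.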
Component(s)
Python