7 changes: 5 additions & 2 deletions python/pyarrow/scalar.pxi
@@ -1173,7 +1173,10 @@ cdef class MapScalar(ListScalar, Mapping):
         if not maps_as_pydicts:
             return list(self)
         result_dict = {}
-        for key, value in self:
+        if self.values is None:
+            return result_dict
+
+        for key, value in zip(self.keys(), self.values.field(self.type.item_field.name)):
Member:
What was the issue with the previous iterator? I'm not sure I understand what was wrong with the previous iteration and what we are fixing here.

Contributor:

Not the author of the MR, but the previous `for key, value in self` results in a call to `self.__iter__()`.

That method is defined just above this function and yields `(k.as_py(), v.as_py())` directly. So it is hardcoded to the default `maps_as_pydicts` behaviour, which is incompatible with what we want here: values that happen to be map types would be yielded in the non-dict form, as a list of `(key, value)` tuples.

It's also not possible to adjust the `__iter__()` function, because by definition it takes no arguments, so it has to be opinionated in some sense about how to handle maps.

So in this case, we have to loop over the keys and values manually and then make the `as_py()` call on the value type with the correct `maps_as_pydicts` parameter ourselves.
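The distinction can be sketched in plain Python (toy stand-in classes, made up for illustration — the real implementation is the Cython code in `scalar.pxi`): `__iter__` cannot accept the flag, so it is pinned to the default representation, while `as_py` can forward the flag to nested values.

```python
# Toy stand-ins for pyarrow's scalar classes (hypothetical names,
# for illustration only).

class ToyScalar:
    def __init__(self, value):
        self.value = value

    def as_py(self, maps_as_pydicts=None):
        return self.value


class ToyMapScalar(ToyScalar):
    # self.value is a list of (key_scalar, value_scalar) pairs.

    def __iter__(self):
        # __iter__ takes no arguments, so it is pinned to the default
        # (non-dict) representation for any nested map values.
        for k, v in self.value:
            yield (k.as_py(), v.as_py())

    def as_py(self, maps_as_pydicts=None):
        if not maps_as_pydicts:
            return list(self)
        # Loop over the pairs manually so the flag can be forwarded
        # to each nested value's as_py() call.
        return {k.as_py(): v.as_py(maps_as_pydicts=maps_as_pydicts)
                for k, v in self.value}


inner = ToyMapScalar([(ToyScalar('1'), ToyScalar(1))])
outer = ToyMapScalar([(ToyScalar('a'), inner)])

print(outer.as_py())                          # [('a', [('1', 1)])]
print(outer.as_py(maps_as_pydicts="strict"))  # {'a': {'1': 1}}
```

Iterating `self` in `as_py` would reuse the first code path for the nested value, which is exactly the bug being fixed.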

Member:

I see, that makes sense, thanks for the clarification. Should we validate that `self.values` is not `None`, as we currently do in `__iter__`?

        arr = self.values
        if arr is None:
            return
        for k, v in zip(arr.field(self.type.key_field.name), arr.field(self.type.item_field.name)):
            yield (k.as_py(), v.as_py())

Contributor:

I guess that would make sense. Adding a test for whether an empty map works could also make sense? @jo-migo

Contributor Author:

Sure, I have added an explicit check for `self.values is None` and a corresponding test.

Contributor Author:

As for the context, check out the PR description for a (hopefully) straightforward minimal example, but it's just like @jonded94 says. The problem with the current implementation is that round trips from Python -> Arrow -> Python are currently broken for nested map columns:

basically we want

{'x': {'a': {'1': 1}}} -> {'x': {'a': {'1': 1}}}

but the current behaviour with maps_as_pydicts is:

{'x': {'a': {'1': 1}}} -> {'x': {'a': [('1', 1)]}}

             if key in result_dict:
                 if maps_as_pydicts == "strict":
                     raise KeyError(
@@ -1183,7 +1186,7 @@ cdef class MapScalar(ListScalar, Mapping):
                 else:
                     warnings.warn(
                         f"Encountered key '{key}' which was already encountered.")
-            result_dict[key] = value
+            result_dict[key] = value.as_py(maps_as_pydicts=maps_as_pydicts)
             return result_dict

     def keys(self):
31 changes: 30 additions & 1 deletion python/pyarrow/tests/test_scalars.py
@@ -956,7 +956,7 @@ def test_map_scalar_as_py_with_custom_field_name():
     ).as_py() == [("foo", "bar")]


-def test_nested_map_types_with_maps_as_pydicts():
+def test_map_types_with_maps_as_pydicts():
     ty = pa.struct([
         pa.field('x', pa.map_(pa.string(), pa.int8())),
         pa.field('y', pa.list_(pa.map_(pa.string(), pa.int8()))),
@@ -966,3 +966,32 @@ def test_nested_map_types_with_maps_as_pydicts():
     s = pa.scalar(v, type=ty)

     assert s.as_py(maps_as_pydicts="strict") == v
+
+
+def test_nested_map_types_with_maps_as_pydicts():
+    ty = pa.struct(
+        [
+            pa.field('x', pa.map_(pa.string(), pa.map_(pa.string(), pa.int8()))),
+            pa.field(
+                'y', pa.list_(pa.map_(pa.string(), pa.map_(pa.string(), pa.int8())))
+            ),
+        ]
+    )
+
+    v = {'x': {'a': {'1': 1}}, 'y': [{'b': {'2': 2}}, {'c': {'3': 3}}]}
+    s = pa.scalar(v, type=ty)
+
+    assert s.as_py(maps_as_pydicts="strict") == v
+
+
+def test_map_scalar_with_empty_values():
+    map_type = pa.struct(
+        [
+            pa.field('x', pa.map_(pa.string(), pa.string())),
+        ]
+    )
+
+    v = {'x': {}}
+    s = pa.scalar(v, type=map_type)
+
+    assert s.as_py(maps_as_pydicts="strict") == v