-
Notifications
You must be signed in to change notification settings - Fork 4.1k
GH-47380: [Python] Apply maps_as_pydicts to Nested MapScalar Values #47454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
ffbc346
GH-47380: [Python] Apply maps_as_pydicts to Nested Map Values
jo-migo 60b963c
Simplify key iteration
jo-migo e542e37
Retain existing unit test and introduce new one
jo-migo 985c315
Handle empty values array in map struct, add corresponding test
jo-migo 45b5431
Merge branch 'apache:main' into fix-nested-maps-as-pydicts
jo-migo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the issue with the previous iterator?
I am unsure I understand what was wrong with the previous iteration and what are we fixing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the author of the MR, but the previous
for key, value in selfwould result in a call ofself.__iter__().That thing is defined above this function, and does yield
(k.as_py(), v.as_py())directly. So it's hardcoded to the defaultmaps_as_pydictsbehaviour, which is incompatible to what we want here. The values, if they happen to be map types, basically would be yielded in the non-dict way (as the list of (key, value) tuples).It's also not possible to adjust the
__iter__()function because by definition it has to have no parameters, so it has to be opinionated in some sense about how to handle maps.So in this case, we have to loop over the keys and values manually and then do the
as_py()call on the value type with the correctmaps_as_pydictsparameter ourselves.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that makes sense, thanks for the clarification, should we validate
self.valuesis notNoneas we currently do on__iter__?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that would make sense, also adding a test for whether an empty map works could make sense? @jo-migo ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I have added an explicit check for
self.values is Noneand corresponding test.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the context, check out the PR description for a (hopefully) straightforward minimal example, but it's just like @jonded94 says. The problem with the current implementation is that round trips from python -> arrow -> python are broken at the moment for nested map columns ---
basically we want
but the current behaviour with
maps_as_pydictsis: