GH-22232: [C++][Python] Introduce optional default_column_type parameter#47663
GH-22232: [C++][Python] Introduce optional default_column_type parameter#47663vladborovtsov wants to merge 12 commits intoapache:mainfrom
Conversation
|
|
|
|
4 similar comments
|
|
|
|
|
|
|
|
|
@github-actions crossbow submit preview-docs |
|
|
|
|
|
|
Thank you @vladborovtsov for the contribution. |
|
|
|
Hi @AlenkaF |
|
Happy to see a response! I will wait for an opinion from a C++ dev and in the meantime try to look at the Python part. |
|
Hi @AlenkaF |
|
There is not too much to add into I’ve added a guard test to ensure that my changes don’t override the type inference logic. Not really sure if it it makes sense to have them there. As for the reader tests, I’ve tried to keep them representative while minimizing the amount of lines with recent changes. |
raulcd
left a comment
There was a problem hiding this comment.
I am unsure about the approach as I am not an expert on the C++ side of the CSV reader. I'll investigate a little. In the meantime, could you fix the related lint failure and rebase main to fix some of the CI issues?
38e2df9 to
7476f6a
Compare
bcc377b to
bef2052
Compare
|
I think that's it! |
pitrou
left a comment
There was a problem hiding this comment.
This looks good to me, just two minor comments below.
…umn_type Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Rationale for this change
Add an optional default_column_type parameter to the CSV reading API (C++ and Python) to provide a fallback type when per-column types aren’t specified, improving schema consistency and complementing the existing column_types logic.
What changes are included in this PR?
Are these changes tested?
Yes. Existing and new tests are passing.
C++:
pyarrow:
New tests are passing.
Are there any user-facing changes?
I believe this change is backward compatible. Parameter is optional and its default value doesn't change the existing behavior; All the existing rests are passing.
Maybe relevant: #22232
Relates to #47502
GitHub Issue: [C++] CSV reader: add a default column type (or sentinel mapping) to avoid per-column enumeration #47502
GitHub Issue: [C++] CSV reader: Ability to not infer column types. #22232