[C++][Python] Allow specifying default column type for CSV columns #47897

@cottrell

Description

Problem

When converting CSV data today, pyarrow users who want to force every field to a given type must either enumerate each column name in ConvertOptions.column_types or construct a full schema. This makes simple workflows like "read everything as string" clumsy when the column names are not known ahead of time.

Proposed change

Expose a single default column type on arrow::csv::ConvertOptions and plumb it through the Python bindings so callers can write ConvertOptions(column_type=pa.string()). The default should apply to every column not listed explicitly in column_types, including columns added via include_missing_columns.
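To make the intended precedence concrete, here is a pure-Python model of how the proposed default could resolve a type for each output column. resolve_column_types is hypothetical and is not pyarrow API; types are plain strings standing in for pa.DataType objects, None stands for today's type inference, and the ordering (explicit column_types entry, then the default, then inference or the null type for a missing column) is an assumption drawn from this description, not from the branch's code.

```python
from typing import Dict, List, Optional


def resolve_column_types(
    column_names: List[str],
    column_types: Dict[str, str],
    default_type: Optional[str] = None,
    include_columns: Optional[List[str]] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical model of the proposed ConvertOptions.column_type rule.

    Returns the type each output column would get; None means "infer".
    Columns named in include_columns but absent from the file are treated
    as if include_missing_columns=True.
    """
    # Output columns: include_columns if given, otherwise every file column.
    output = include_columns if include_columns is not None else column_names
    resolved: Dict[str, Optional[str]] = {}
    for name in output:
        if name in column_types:
            # An explicit per-column type always wins.
            resolved[name] = column_types[name]
        elif default_type is not None:
            # Otherwise fall back to the single default type.
            resolved[name] = default_type
        elif name in column_names:
            # Present in the file with no default: keep type inference.
            resolved[name] = None
        else:
            # Missing column with no default: null-filled with null type.
            resolved[name] = "null"
    return resolved


print(resolve_column_types(["a", "b"], {"a": "int64"}, "string"))
```

Under this model, a column named in column_types keeps its explicit type, while every other column, whether read from the file or added via include_missing_columns, picks up the default.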

Implementation status

A local branch adds ConvertOptions::column_type, wires it through the C++ reader, exposes it in pyarrow.csv, updates the docs, and adds unit tests covering the new behavior.

Component(s)

C++, Python

Next steps

Raise a PR with the implementation and tests once this ticket is accepted.

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests