[C++] CSV reader: Ability to not infer column types. #22232

@asfimport

Description

I'm trying to read CSV files as is, with all columns as strings. I don't know the schema of these CSVs, and it will vary because the files are provided by users.

Right now I'm using pandas.read_csv(dtype=str), which works great, but since the final destination of these CSVs is Parquet files, it seems it would be much more efficient to use pyarrow.csv.read_csv in the future, as soon as this becomes available :)

I tried things like pyarrow.csv.read_csv(convert_types=ConvertOptions(columns_types=defaultdict(lambda: 'string'))) but it doesn't work.

Maybe I just didn't find something that already exists? :)
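
For reference, a possible workaround is sketched below. It is not a built-in "disable inference" switch. It assumes a pyarrow version whose pyarrow.csv.ConvertOptions accepts a column_types mapping, that the first row of the file is a header, and that the file is UTF-8. The helper names header_names and read_csv_as_strings are made up for illustration. The idea is to read the header row first, then map every column name to pa.string() so nothing is left to infer.

```python
import csv


def header_names(path, delimiter=","):
    """Return the column names found in the first row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f, delimiter=delimiter))


def read_csv_as_strings(path, delimiter=","):
    """Read a CSV into a pyarrow Table with every column typed as string."""
    # Imported here so header_names() stays usable without pyarrow installed.
    import pyarrow as pa
    from pyarrow import csv as pa_csv

    # Assumption: the first row is a header with unique column names.
    names = header_names(path, delimiter)

    # An explicit type for every column means no column is type-inferred.
    convert_options = pa_csv.ConvertOptions(
        column_types={name: pa.string() for name in names}
    )
    parse_options = pa_csv.ParseOptions(delimiter=delimiter)
    return pa_csv.read_csv(
        path,
        parse_options=parse_options,
        convert_options=convert_options,
    )
```

Calling read_csv_as_strings("data.csv") (a hypothetical file name) should return a table in which values such as 01234 stay as text instead of becoming integers. The cost is that the header row gets read twice, which is likely negligible next to parsing the whole file.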

Environment: Ubuntu Xenial
Reporter: Bogdan Klichuk

Note: This issue was originally created as ARROW-5811. Please see the migration documentation for further details.
