PERF: Optimize CSV categorical parsing when categories are known by jbrockmendel · Pull Request #65018 · pandas-dev/pandas

jbrockmendel · 2026-04-02T15:57:58Z

As I mentioned in #17743, I'm on the fence as to whether this is actually worth doing.

Summary

When read_csv receives a CategoricalDtype with pre-specified categories, map parsed values directly to category codes in a single pass using a pre-built hash table, skipping the factorize + recode_for_categories steps.
For non-string category types (datetime, float edge cases, bool), the optimization is attempted first via str() conversion and falls back gracefully to the existing _from_inferred_categories path if the string representations don't match the raw CSV tokens.
Adds ASV benchmark time_convert_known_categories to ReadCSVCategorical.
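The difference between the two paths can be sketched in pure Python. This is only an illustration of the idea described above, not the code in this PR: the function names are hypothetical, plain dicts stand in for the hash table, and the rule used to trigger the fallback (any unmatched token when the categories are not strings) is an assumption.

```python
from typing import Optional, Sequence


def codes_two_step(tokens: Sequence[str], categories: Sequence[object]) -> list:
    """Existing-style path: factorize the tokens, then recode to the target categories."""
    # Step 1 (factorize): assign each distinct token a code in order of appearance.
    uniques: dict = {}
    inferred = [uniques.setdefault(tok, len(uniques)) for tok in tokens]
    # Step 2 (recode): translate inferred codes into codes of the requested categories.
    target = {str(cat): code for code, cat in enumerate(categories)}
    remap = [target.get(tok, -1) for tok in uniques]  # dicts keep insertion order
    return [remap[code] for code in inferred]


def codes_single_pass(
    tokens: Sequence[str], categories: Sequence[object]
) -> Optional[list]:
    """Known-categories path: one lookup per token in a pre-built table.

    Returns None to signal "use the existing path" (hypothetical convention).
    """
    table = {str(cat): code for code, cat in enumerate(categories)}
    if len(table) != len(categories):
        return None  # two categories share a string form; the lookup is ambiguous
    all_strings = all(isinstance(cat, str) for cat in categories)
    codes = []
    for tok in tokens:
        code = table.get(tok, -1)
        if code == -1 and not all_strings:
            # Assumed fallback rule: str(category) did not match the raw token,
            # e.g. category 1.0 against the CSV token "1".
            return None
        codes.append(code)  # -1 marks an unexpected value (NA)
    return codes


tokens = ["b", "a", "c", "b", "z"]
print(codes_single_pass(tokens, ["a", "b", "c"]))  # [1, 0, 2, 1, -1]
print(codes_single_pass(["1", "2.5"], [1.0, 2.5]))  # None, so fall back
```

Both functions return the same codes for string categories; the single-pass version simply avoids building and then remapping an intermediate set of inferred categories.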

Test plan

Existing test_categorical.py tests pass (113 passed, 7 xfailed)
String, integer, float, datetime, timedelta, and boolean category types all produce correct results
Unexpected categories still emit Pandas4Warning and map to NA
Non-string types that fail string matching fall back correctly
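For reference, the user-facing behaviour that these checks protect looks roughly like the following. This is a small usage sketch with made-up data, not one of the tests in test_categorical.py. Warnings are suppressed because whether a warning is emitted for the unexpected value depends on the pandas version.

```python
import io
import warnings

import pandas as pd

# Hypothetical sample data: "purple" is not among the declared categories.
csv_text = "color\nred\ngreen\nblue\ngreen\npurple\n"
dtype = pd.CategoricalDtype(categories=["red", "green", "blue"])

with warnings.catch_warnings():
    # Newer pandas versions may warn about the unexpected category.
    warnings.simplefilter("ignore")
    df = pd.read_csv(io.StringIO(csv_text), dtype={"color": dtype})

codes = df["color"].cat.codes.tolist()
print(codes)                             # [0, 1, 2, 1, -1]
print(list(df["color"].cat.categories))  # ['red', 'green', 'blue']
```

The declared categories and their order are preserved, and the value outside them becomes NA (code -1).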

🤖 Generated with Claude Code

When read_csv receives a CategoricalDtype with pre-specified categories, map parsed values directly to category codes in a single pass using a pre-built hash table, avoiding the factorize-then-recode steps. For non-string category types (datetime, float edge cases, bool), the optimization is attempted first and falls back to the existing path if str() representations don't match the raw CSV tokens.

closes pandas-dev#17743

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jbrockmendel added the Performance (memory or execution speed performance) label on Apr 2, 2026

jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:perf-17743