PERF: restore hash table pre-allocation in value_count #65027

Open
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:regrs-7
@jbrockmendel (Member) commented Apr 2, 2026

NB: motivated by an asv regression vs 3.0

Summary

  • Restore kh_resize(table, n) for non-object dtypes in value_count, reverting the inadvertent change to n // 10 made in #64543 (PERF: Eliminate redundant kh_get calls in hashtable operations)
  • The n // 10 pre-allocation forces expensive hash table resizes during insertion for float64 (whose hash function uses murmur2 + NaN handling), resulting in a ~2x regression in Series.mode() for float dtype
  • Object dtype keeps n // 10 as it was in v3.0
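
To illustrate why the size hint matters, here is a minimal sketch of a toy growth model, not the khash implementation. It assumes a power-of-two bucket count that doubles once a fixed load factor is exceeded; the 0.77 bound, the helper names `prealloc_size` and `count_resizes`, and the all-distinct-keys input are assumptions made for illustration only.

```python
LOAD_FACTOR = 0.77  # assumed upper load bound; illustrative, not read from pandas


def prealloc_size(n: int, is_object: bool) -> int:
    """Hypothetical helper mirroring the sizing rule described in this PR."""
    return n // 10 if is_object else n


def _round_up_pow2(x: int) -> int:
    """Smallest power of two >= x, with a floor of 4 buckets."""
    return max(4, 1 << max(0, x - 1).bit_length())


def count_resizes(n_unique: int, size_hint: int) -> int:
    """Count growth steps while inserting n_unique distinct keys (toy model)."""
    buckets = _round_up_pow2(size_hint)
    resizes = 0
    for used in range(1, n_unique + 1):
        if used > buckets * LOAD_FACTOR:
            buckets *= 2  # in a real table, each growth step rehashes every stored key
            resizes += 1
    return resizes


n = 100_000
print(count_resizes(n, prealloc_size(n, is_object=False)))  # hint of n
print(count_resizes(n, prealloc_size(n, is_object=True)))   # hint of n // 10
```

Under these assumptions, with every key distinct, the full-size hint avoids any growth step while the n // 10 hint triggers three. Each step rehashes the keys stored so far, so the cost grows with how expensive the hash function is. Real data with many repeated values would trigger fewer resizes than this worst case.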

Benchmark (float, N=100000)

Pre-allocation        median (ms)
n // 10 (regressed)   19.97
n (this PR)           10.66

Test plan

  • pytest pandas/tests/test_algos.py — 494 passed
  • pytest pandas/tests/base/test_value_counts.py pandas/tests/frame/methods/test_value_counts.py pandas/tests/series/methods/test_value_counts.py pandas/tests/frame/methods/test_duplicated.py pandas/tests/series/methods/test_duplicated.py — 301 passed

🤖 Generated with Claude Code

…bject dtypes

The n // 10 pre-allocation introduced in pandas-dev#64543 causes expensive hash table
resizes for float64 (and other non-object types), resulting in ~2x regression
in Series.mode() for float dtype. Restore the v3.0 pre-allocation of n for
non-object types while keeping n // 10 for object dtype.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jbrockmendel added the Performance label (Memory or execution speed performance) on Apr 2, 2026