bpo-29585: optimize site.py startup time#136
Conversation
Since 'PYTHONFRAMEWORK' is in sysconfigdata, I cannot stop importing them. |
4f58b0c to
483769e
Compare
There was a problem hiding this comment.
This is a major change: by duplicating code from sysconfig.py into site.py, there would now be a implicit link between the two and a potential maintenance problem. If we were to do this, there should at least be mention of these duplications in sysconfig.py and/or tests for the duplicated behavior. Also, there should be a b.p.o issue for this proposed change.
vstinner
left a comment
There was a problem hiding this comment.
I dislike this approach. I suggest to experiment a _site extension module: see the issue.
|
I've addes |
dfeced1 to
45f5e9a
Compare
45f5e9a to
07369f1
Compare
Skip importing sysconfig when possible. Median +- std dev: [default] 15.8 ms +- 0.0 ms -> [patched] 14.7 ms +- 0.0 ms: 1.07x faster (-7%)
75b4c6c to
ff0d05c
Compare
|
@ned-deily I created bpo issue and added comment about duplicated code. |
|
A quick & dirty benchmark using my perf module: I see an improvement of -0.8 ms (1.05x faster) on Python startup time. EDIT: this benchmark was run on my Linux laptop. |
|
Hum, did I miss something? sysconfig is still imported by the site module on macOS by getsitepackages(): macbook:master haypo$ ./python.exe -c 'import sys; print("sysconfig" in sys.modules)' I suggest this additionnal change: With this additionnal change, the speedup on macOS is quite significant: -13.4 ms (1.61x faster)! cc @1st1: Yury, since you use macOS, you probably want to see this change merged :-) |
|
|
||
|
|
||
| def _getuserbase(): | ||
| # Stripped version of sysconfig._getuserbase() |
There was a problem hiding this comment.
Can you please elaborate this comment? Please explain that the function was duplicated to speedup Python startup and avoid the sysconfig import in the site module. Add a reference to the bpo 29585.
It seems like you modified this change: you moved "if env_base: return env_base" at the top. Please modify also sysconfig._getuserbase(). I would also prefer the sysconfig also uses sys._framework, instead of get_config_var("PYTHONFRAMEWORK").
|
|
||
| def _get_path(userbase): | ||
| # stripped version of sysconfig.get_path('purelib', os.name + '_user') | ||
| version = sys.version_info[:2] |
|
|
||
|
|
||
| def _get_path(userbase): | ||
| # stripped version of sysconfig.get_path('purelib', os.name + '_user') |
There was a problem hiding this comment.
Again, please elaborate the comment. (Maybe point to _getuserbase() above?)
|
Ned Deily: "This is a major change: by duplicating code from sysconfig.py into site.py, there would now be a implicit link between the two and a potential maintenance problem. If we were to do this, there should at least be mention of these duplications in sysconfig.py and/or tests for the duplicated behavior." @methane added a comment in sysconfig.py, IMHO it's enough. About testing: @methane, can you try to write a test to check that site and sysconfig return the same value for the two private functions? You may have to tag the unit test with @cpython_only, since the two new site functions are private. Ned Deily: "Also, there should be a b.p.o issue for this proposed change." Done. Since most Ned's requests are done, I dismiss his review. |
|
Except of my minor commits, I now like the overall shape of the PR. I now agree that writing a new _site module is not necessary, it's better to keep all code in the site.py file. |
|
On macOS, performance gain is more impressive than Linux: |
|
Oh, I'm sorry. I missed your above comment. |
vstinner
left a comment
There was a problem hiding this comment.
LGTM, but I would like to see an answer to the question on tests (check that site functions return the same result than sysconfig?) before approving your change :-)
vstinner
left a comment
There was a problem hiding this comment.
Thanks! The latest change is much better than the first iteration, and it now LGTM!
|
Thank you for review. |
|
Why couldn't we make 'sysconfig' to import a couple of private functions from 'site', why copying? |
It's not a pure copy/paste, one function is specialized for site use case. I dislike the idea of importing site code from sysconfig. |
Thanks for this cool and cheap speedup! |
|
Ah, I've missed build failure on Windows. |
|
Oh, Windows build is broken :-( Need to fix PC/pyconfig.h? |
|
Oh, AppVeyor didn't catch the bug simply because it wasn't run on this PR! |
Python 3 has no xrange.
…_inline_except_opcode_array_c Fixes THREE boundary-domain bugs in build_inline_except_opcode_array_c (introduced W27c #2a 7135d94), found across two HIR-diff Phase 0 cycles of the W-2B-RECONVERT investigation. CONVENTION (Python/jit/bytecode.cpp:8-14, builder.cpp:1235, phx_frame_state.h cur_instr_offs semantics): - jit_bc_instr_init expects INSTRUCTION INDEX (codeUnit[]) - jit_bc_instr_get_jump_target / next_offset / base_offset return whatever was stored at init (now INDEX after Class A fix) - phx_block_map keys are BYTE OFFSETS - OpcodeArrayEntry.base_offset is consumed downstream as BYTE OFFSET (cur_instr_offs assignment per builder_emit_c.c:3320) - BCOffset.value() (caller-passed except_body_offset) is BYTE OFFSET - BYTES = INDEX * sizeof(_Py_CODEUNIT) (= 2 in 3.12) NAMED CONVERSIONS (Python/jit/bytecode_c.h, gated by _Py_OPCODE): static inline int phx_bc_offset_to_instr_index(int byte_off); static inline int phx_bc_instr_index_to_offset(int instr_idx); Codifies the boundary-domain rule by example (per pythia python#137 python#2 + supervisor 19:01:19Z + theologian 19:01:17Z). THREE FIXES: CLASS A (line 3241): jit_bc_instr_init was passed except_body_offset (BCOffset.value() byte offset) where INSTRUCTION INDEX was expected. codeUnit(code)[byte_offset] read PAST end of co_code → garbage opcode → switch-default → Deopt with corrupt frame state. Found by Phase 0 HIR-diff (test_exc_raise_catch bb 12: correct Return -1 vs corrupt LoadConst NoneType + Deopt at offset 58). CLASS B (line 3273-3275, theologian class-of-bug audit 18:42:53Z): target = jit_bc_instr_get_jump_target returns INDEX, but phx_block_map_lookup_or_panic expects BYTE OFFSET. Without conversion, JUMP_BACKWARD-in-except-body lookup fails: JIT_CHECK_C panic OR silent wrong-block. Dormant pre-fix because no test had backward-jump-in-except-body. CLASS C (line 3260, exposed by Phase 0' HIR-diff after Class A+B fix): After Class A fix corrected the init to INDEX, jit_bc_instr_base_offset returns INDEX. But entry->base_offset is consumed downstream (line 3320) as BYTE OFFSET via match_tc.frame.cur_instr_offs assignment. Pre-fix 'correct by accident' — Class A's BYTES-as-INDEX init wrote BYTES into bci->base_offset, so jit_bc_instr_base_offset returned BYTES, matching downstream. Correct Class A exposed Class C: cur_instr_offs got INDEX (half the correct BYTE value) → interpreter Deopt resumed at wrong bytecode position → SIGSEGV in test_multiple_exceptions_in_loop (deterministic 0/20 post Class A+B fix, vs 20/20 PASS pre-W27c). DIAGNOSIS: HIR-diff for test_multiple_exceptions_in_loop revealed Deopt CurInstrOffset 124 (correct, BYTES) → 62 (wrong, INDEX = 124/2). Direct evidence of the domain mismatch. LATENT in pushed W27c #2a (e4e7507 on SonicField/cpython): all three classes present. Class A, B dormant (no test exercises emitInlineExceptionMatch or JUMP_BACKWARD-in-except-body). Class C compensated by Class A — both broken in opposite directions canceling out for downstream consumers of entry->base_offset. ALL three must fix together. INVESTIGATION CHAIN: - testkeeper bisect 17:52:30Z localized #2b regression → W27c #2b sole - pythia python#136 python#1 18:23:34Z flagged HEAP/RACE rode on absence-of-evidence - generalist 18:24Z proposed HIR-diff Phase 0 falsifier - generalist 18:32+18:34Z captured HIR_2a + HIR_2b dumps; found Class A - theologian 18:42:53Z class-of-bug audit found Class B - supervisor 18:43:17Z directed dual-fix - pythia python#137 python#2 19:00:29Z flagged inline-arithmetic violates boundary-domain rule - supervisor 19:01:19Z + theologian 19:01:17Z directed amend to named conversions - testkeeper 19:06:55Z full Phoenix gate caught NEW regression (test_multiple_exceptions_in_loop deterministic 0/20) - generalist 19:14Z HIR-diff Phase 0' on test_multiple_exceptions_in_loop revealed Class C (cur_instr_offs 124→62) - supervisor 19:16:42Z authorized Class C fix + extended audit OTHER OpcodeArrayEntry FIELDS AUDITED (per supervisor 19:16:42Z extended class-of-bug discipline): - entry->opcode: written from jit_bc_instr_opcode (no domain — opcode value); consumed in dispatch loop switch. CLEAN. - entry->oparg: written from jit_bc_instr_oparg (no domain — oparg value); consumed in dispatch loop emit calls. CLEAN. - entry->base_offset: Class C above; FIXED. - entry->const_obj: written from PyTuple_GET_ITEM (PyObject*); consumed in dispatch loop hir_type_from_object. CLEAN (no domain conversion). - entry->jump_target_block: Class B above; FIXED. VERIFICATION pending (testkeeper 4-suite extended verify): 1. 30x test_exc_raise_catch (Class A regression) 2. 30x test_exc_binary_subscr_dict_in_try (Class A latent activation) 3. 30x test_exc_continue_in_loop (Class B latent activation) 4. 30x multi-except-in-loop sentinel (Class C latent activation) 5. Full Phoenix suite (which originally caught Class C)
Skip importing sysconfig when possible.
Median +- std dev: [default] 15.8 ms +- 0.0 ms -> [patched] 14.7 ms +- 0.0 ms: 1.07x faster (-7%)
(bpo-29585)