bpo-29585: optimize site.py startup time #136

Merged
methane merged 9 commits into python:master from methane:optimize-site-startup
Jun 28, 2017

@methane
Member

@methane methane commented Feb 16, 2017

Skip importing sysconfig when possible.

Median +- std dev: [default] 15.8 ms +- 0.0 ms -> [patched] 14.7 ms +- 0.0 ms: 1.07x faster (-7%)

(bpo-29585)

@methane
Member Author

methane commented Feb 16, 2017

$ cp  build/lib.linux-x86_64-3.7/_sysconfigdata_m_linux_x86_64-linux-gnu.py sysconfigdata
$ ./python -c 'import sysconfigdata'  # create pyc file
$ ./python -m timeit -s 'import sysconfigdata, importlib' -- 'importlib.reload(sysconfigdata)'
1000 loops, best of 5: 269 usec per loop

Since 'PYTHONFRAMEWORK' is in sysconfigdata, I cannot stop importing them in that case.
But in the other cases, I can skip importing sysconfig and sysconfigdata completely.
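
Whether a given interpreter still imports sysconfig during startup can be checked from a fresh subprocess. A minimal sketch, assuming only the standard library; the helper name `startup_imports` is hypothetical and the result depends on the platform and Python version being run:

```python
import subprocess
import sys


def startup_imports(module):
    """Return True if `module` is already in sys.modules at interpreter startup."""
    code = "import sys; print({!r} in sys.modules)".format(module)
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip() == "True"


# Reports whether the site machinery pulled in sysconfig before user code ran.
print(startup_imports("sysconfig"))
```

Running the child process matters here: checking `sys.modules` in the current process would also count imports made by the calling program itself.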

@methane methane force-pushed the optimize-site-startup branch from 4f58b0c to 483769e Compare February 16, 2017 14:35
Member

@ned-deily ned-deily left a comment

This is a major change: by duplicating code from sysconfig.py into site.py, there would now be an implicit link between the two and a potential maintenance problem. If we were to do this, there should at least be a mention of these duplications in sysconfig.py and/or tests for the duplicated behavior. Also, there should be a b.p.o issue for this proposed change.

@methane methane changed the title optimize site.py [RFC] bpo-29585: optimize site.py startup time Feb 17, 2017
Member

@vstinner vstinner left a comment

I dislike this approach. I suggest experimenting with a _site extension module: see the issue.

@methane
Member Author

methane commented Feb 18, 2017

I've added sys._framework and removed the sysconfig dependency completely.

@methane methane force-pushed the optimize-site-startup branch from dfeced1 to 45f5e9a Compare February 18, 2017 05:04
@methane methane changed the title [RFC] bpo-29585: optimize site.py startup time [RFC] [bpo-29585](https://bugs.python.org/issue29585): optimize site.py startup time Feb 18, 2017
@methane methane changed the title [RFC] [bpo-29585](https://bugs.python.org/issue29585): optimize site.py startup time [RFC] bpo-29585: optimize site.py startup time Feb 18, 2017
@methane methane force-pushed the optimize-site-startup branch from 45f5e9a to 07369f1 Compare February 18, 2017 05:10
@methane methane changed the title [RFC] bpo-29585: optimize site.py startup time bpo-29585: optimize site.py startup time Feb 19, 2017
methane added 3 commits June 28, 2017 19:28
Skip importing sysconfig when possible.

Median +- std dev: [default] 15.8 ms +- 0.0 ms -> [patched] 14.7 ms +- 0.0 ms: 1.07x faster (-7%)
@methane methane force-pushed the optimize-site-startup branch from 75b4c6c to ff0d05c Compare June 28, 2017 10:29
@vstinner vstinner added the performance Performance or resource usage label Jun 28, 2017
@methane
Member Author

methane commented Jun 28, 2017

@ned-deily I created a bpo issue and added a comment about the duplicated code.

@methane
Member Author

methane commented Jun 28, 2017

@Haypo The sysconfig (and _sysconfigdata_...) module is relatively large, and
most applications (except packaging tools) don't use it at all.
This pull request now focuses on that.

The slow code path is skipped by #167, although abspath is still slow.

@vstinner
Member

vstinner commented Jun 28, 2017

A quick & dirty benchmark using my perf module: I see an improvement of -0.8 ms (1.05x faster) on Python startup time.

haypo@selma$ ./python -m perf command --inherit=PYTHONPATH -v -o pr136.json -- ./python -c pass  
...
haypo@selma$ ./python -m perf command --inherit=PYTHONPATH -v -o ref.json -- ./python -c pass
...
haypo@selma$ ./python -m perf compare_to ref.json pr136.json 
Mean +- std dev: [ref] 17.4 ms +- 0.8 ms -> [pr136] 16.6 ms +- 1.1 ms: 1.05x faster (-5%)

EDIT: this benchmark was run on my Linux laptop.
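
The "x faster" factor and the percentage in these summaries follow directly from the two means. A small sketch of that arithmetic, using a hypothetical helper `summarize`; this is not the perf module's own code, and perf works from unrounded means, so the last digit can differ when recomputing from the rounded figures printed above:

```python
def summarize(ref_ms, new_ms):
    """Return (speedup factor, percent change) for two mean timings in ms."""
    speedup = ref_ms / new_ms                   # > 1.0 means the new build is faster
    percent = (new_ms - ref_ms) / ref_ms * 100  # negative means less time spent
    return round(speedup, 2), round(percent)


print(summarize(17.4, 16.6))  # Linux means quoted above -> (1.05, -5)
print(summarize(15.8, 14.7))  # means from the PR description -> (1.07, -7)
```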

@vstinner
Member

Hum, did I miss something? sysconfig is still imported by the site module on macOS by getsitepackages():

        if sys.platform == "darwin":
            # for framework builds *only* we add the standard Apple
            # locations.
            from sysconfig import get_config_var
            framework = get_config_var("PYTHONFRAMEWORK")
            if framework:
                sitepackages.append(
                        os.path.join("/Library", framework,
                            '%d.%d' % sys.version_info[:2], "site-packages"))

macbook:master haypo$ ./python.exe -c 'import sys; print("sysconfig" in sys.modules)'
True

I suggest this additional change:

diff --git a/Lib/site.py b/Lib/site.py
index 500c59b..929252d 100644
--- a/Lib/site.py
+++ b/Lib/site.py
@@ -334,15 +334,12 @@ def getsitepackages(prefixes=None):
         else:
             sitepackages.append(prefix)
             sitepackages.append(os.path.join(prefix, "lib", "site-packages"))
-        if sys.platform == "darwin":
-            # for framework builds *only* we add the standard Apple
-            # locations.
-            from sysconfig import get_config_var
-            framework = get_config_var("PYTHONFRAMEWORK")
-            if framework:
-                sitepackages.append(
-                        os.path.join("/Library", framework,
-                            '%d.%d' % sys.version_info[:2], "site-packages"))
+        # for framework builds *only* we add the standard Apple
+        # locations.
+        if sys.platform == "darwin" and sys._framework:
+            sitepackages.append(
+                    os.path.join("/Library", sys._framework,
+                        '%d.%d' % sys.version_info[:2], "site-packages"))
     return sitepackages
 
 def addsitepackages(known_paths, prefixes=None):

With this additional change, the speedup on macOS is quite significant: -13.4 ms (1.61x faster)!

macbook:master haypo$ ./python.exe -m perf command --inherit=PYTHONPATH -v -o ref.json -- ./python.exe -c pass 
...
macbook:master haypo$ ./python.exe -m perf command --inherit=PYTHONPATH -v -o pr136.json -- ./python.exe -c pass
...
macbook:master haypo$ ./python.exe -m perf compare_to ref.json pr136.json 
Mean +- std dev: [ref] 35.4 ms +- 1.7 ms -> [pr136] 22.0 ms +- 1.9 ms: 1.61x faster (-38%)

cc @1st1: Yury, since you use macOS, you probably want to see this change merged :-)
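
The framework branch of the suggested diff can be sketched as a pure function, which makes the condition easy to exercise on any platform. The function name and its parameters are hypothetical; `framework` stands in for `sys._framework`, which this PR introduces and which is an empty string on non-framework builds. `posixpath.join` is used instead of `os.path.join` only so the output is the same everywhere:

```python
import posixpath
import sys


def framework_site_packages(platform, framework, version_info):
    """Return the extra Apple site-packages directory, or None if not applicable."""
    if platform == "darwin" and framework:
        return posixpath.join(
            "/Library", framework,
            "%d.%d" % tuple(version_info[:2]),
            "site-packages",
        )
    return None


# On the running interpreter; getattr guards versions without sys._framework.
print(framework_site_packages(sys.platform,
                              getattr(sys, "_framework", ""),
                              sys.version_info))
```

Because the check reads an attribute of `sys` rather than calling `get_config_var("PYTHONFRAMEWORK")`, nothing from sysconfig needs to be imported to evaluate it.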

Member

@vstinner vstinner left a comment

New review.

Comment thread Lib/site.py Outdated


def _getuserbase():
    # Stripped version of sysconfig._getuserbase()
Member

Can you please elaborate on this comment? Please explain that the function was duplicated to speed up Python startup and avoid the sysconfig import in the site module. Add a reference to bpo-29585.

It seems like you modified the function: you moved "if env_base: return env_base" to the top. Please also modify sysconfig._getuserbase(). I would also prefer that sysconfig use sys._framework instead of get_config_var("PYTHONFRAMEWORK").
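
For readers without the diff at hand, the early-return ordering discussed here looks roughly like the sketch below. It is not the code from the PR: the function name and parameters are hypothetical, and the per-platform defaults reflect my reading of `sysconfig._getuserbase()` from that era, so treat them as assumptions:

```python
import os


def user_base(environ, os_name, platform, framework, version_info):
    """Sketch of a stripped user-base lookup with the env override checked first."""
    env_base = environ.get("PYTHONUSERBASE")
    if env_base:
        return env_base  # early return: no platform logic needed

    def joinuser(*parts):
        return os.path.expanduser(os.path.join(*parts))

    if os_name == "nt":
        base = environ.get("APPDATA") or "~"
        return joinuser(base, "Python")
    if platform == "darwin" and framework:
        return joinuser("~", "Library", framework,
                        "%d.%d" % tuple(version_info[:2]))
    return joinuser("~", ".local")


print(user_base({"PYTHONUSERBASE": "/opt/u"}, "posix", "linux", "", (3, 7)))
```

Passing the environment and platform in as arguments is only for illustration; it lets each branch be checked without patching `os.environ` or `sys.platform`.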

Comment thread Lib/site.py Outdated

def _get_path(userbase):
    # stripped version of sysconfig.get_path('purelib', os.name + '_user')
    version = sys.version_info[:2]
Member

Is [:2] really needed?

Comment thread Lib/site.py Outdated


def _get_path(userbase):
    # stripped version of sysconfig.get_path('purelib', os.name + '_user')
Member

Again, please elaborate on the comment. (Maybe point to _getuserbase() above?)

@vstinner
Member

Ned Deily: "This is a major change: by duplicating code from sysconfig.py into site.py, there would now be an implicit link between the two and a potential maintenance problem. If we were to do this, there should at least be a mention of these duplications in sysconfig.py and/or tests for the duplicated behavior."

@methane added a comment in sysconfig.py, IMHO it's enough.

About testing: @methane, can you try to write a test to check that site and sysconfig return the same value for the two private functions? You may have to tag the unit test with @cpython_only, since the two new site functions are private.

Ned Deily: "Also, there should be a b.p.o issue for this proposed change."

Done.

Since most of Ned's requests are done, I dismiss his review.
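
The consistency check requested above could look roughly like the following sketch. It assumes the private helpers are named `site._getuserbase()` and `site._get_path()` as in the review threads, guards against their absence, and only returns the two values; in CPython's test suite the comparison would presumably be an `assertEqual` inside a unittest method tagged `@cpython_only`:

```python
import os
import site
import sys
import sysconfig


def compare_user_site_paths():
    """Return (site value, sysconfig value), or None if the helpers are unavailable."""
    get_base = getattr(site, "_getuserbase", None)
    get_path = getattr(site, "_get_path", None)
    if get_base is None or get_path is None:
        return None  # private helpers, not guaranteed to exist
    userbase = get_base()
    if userbase is None:
        return None  # no user base on this platform
    if sys.platform == "darwin" and getattr(sys, "_framework", ""):
        scheme = "osx_framework_user"
    else:
        scheme = os.name + "_user"
    try:
        expected = sysconfig.get_path("purelib", scheme)
    except KeyError:
        return None  # scheme not defined by this build
    return get_path(userbase), expected


print(compare_user_site_paths())
```

If the two returned values ever differ, the duplicated code in site.py has drifted from sysconfig.py, which is exactly the maintenance risk raised in the first review.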

@vstinner vstinner dismissed ned-deily’s stale review June 28, 2017 12:19

@methane modified this PR.

@vstinner
Member

Except for my minor comments, I now like the overall shape of the PR. I now agree that writing a new _site module is not necessary; it's better to keep all code in the site.py file.

@methane
Member Author

methane commented Jun 28, 2017

On macOS, the performance gain is more impressive than on Linux:

$ ./python.exe -m perf compare_to ref.json pr136.json
Mean +- std dev: [ref] 30.4 ms +- 0.9 ms -> [pr136] 21.6 ms +- 4.0 ms: 1.40x faster (-29%)

@methane
Member Author

methane commented Jun 28, 2017

Oh, I'm sorry. I missed your above comment.

Member

@vstinner vstinner left a comment

LGTM, but I would like to see an answer to the question on tests (check that the site functions return the same result as sysconfig?) before approving your change :-)

Member

@vstinner vstinner left a comment

Thanks! The latest change is much better than the first iteration, and it now LGTM!

@methane methane merged commit a8f8d5b into python:master Jun 28, 2017
@methane methane deleted the optimize-site-startup branch June 28, 2017 15:31
@methane
Member Author

methane commented Jun 28, 2017

Thank you for the review.

@1st1
Member

1st1 commented Jun 28, 2017

Why couldn't we make 'sysconfig' import a couple of private functions from 'site'? Why copy?

@vstinner
Member

Why couldn't we make 'sysconfig' import a couple of private functions from 'site'? Why copy?

It's not a pure copy/paste: one function is specialized for the site use case. I dislike the idea of importing site code from sysconfig.

@vstinner
Member

Thank you for the review.

Thanks for this cool and cheap speedup!

@methane
Member Author

methane commented Jun 28, 2017

Ah, I missed a build failure on Windows.

@vstinner
Member

Oh, Windows build is broken :-(

     2>..\Python\sysmodule.c(1968): error C2065: 'PYTHONFRAMEWORK': undeclared identifier [C:\buildbot.python.org\3.x.kloth-win64\build\PCbuild\pythoncore.vcxproj]

Need to fix PC/pyconfig.h?

@vstinner
Member

Oh, AppVeyor didn't catch the bug simply because it wasn't run on this PR!

@vstinner
Member

@methane proposed PR #2476 to fix PC/pyconfig.h, I proposed PR #2477.

vstinner added a commit that referenced this pull request Jun 28, 2017
akruis pushed a commit to akruis/cpython that referenced this pull request Oct 10, 2017
akruis pushed a commit to akruis/cpython that referenced this pull request Oct 29, 2017
SonicField added a commit to SonicField/cpython that referenced this pull request Apr 25, 2026
…_inline_except_opcode_array_c

Fixes THREE boundary-domain bugs in build_inline_except_opcode_array_c
(introduced W27c #2a 7135d94), found across two HIR-diff Phase 0 cycles
of the W-2B-RECONVERT investigation.

CONVENTION (Python/jit/bytecode.cpp:8-14, builder.cpp:1235,
phx_frame_state.h cur_instr_offs semantics):
- jit_bc_instr_init expects INSTRUCTION INDEX (codeUnit[])
- jit_bc_instr_get_jump_target / next_offset / base_offset return whatever
  was stored at init (now INDEX after Class A fix)
- phx_block_map keys are BYTE OFFSETS
- OpcodeArrayEntry.base_offset is consumed downstream as BYTE OFFSET
  (cur_instr_offs assignment per builder_emit_c.c:3320)
- BCOffset.value() (caller-passed except_body_offset) is BYTE OFFSET
- BYTES = INDEX * sizeof(_Py_CODEUNIT) (= 2 in 3.12)

NAMED CONVERSIONS (Python/jit/bytecode_c.h, gated by _Py_OPCODE):
  static inline int phx_bc_offset_to_instr_index(int byte_off);
  static inline int phx_bc_instr_index_to_offset(int instr_idx);

Codifies the boundary-domain rule by example (per pythia python#137 python#2 +
supervisor 19:01:19Z + theologian 19:01:17Z).

THREE FIXES:

CLASS A (line 3241): jit_bc_instr_init was passed except_body_offset
(BCOffset.value() byte offset) where INSTRUCTION INDEX was expected.
codeUnit(code)[byte_offset] read PAST end of co_code → garbage opcode →
switch-default → Deopt with corrupt frame state. Found by Phase 0
HIR-diff (test_exc_raise_catch bb 12: correct Return -1 vs corrupt
LoadConst NoneType + Deopt at offset 58).

CLASS B (line 3273-3275, theologian class-of-bug audit 18:42:53Z):
target = jit_bc_instr_get_jump_target returns INDEX, but
phx_block_map_lookup_or_panic expects BYTE OFFSET. Without conversion,
JUMP_BACKWARD-in-except-body lookup fails: JIT_CHECK_C panic OR silent
wrong-block. Dormant pre-fix because no test had backward-jump-in-except-body.

CLASS C (line 3260, exposed by Phase 0' HIR-diff after Class A+B fix):
After Class A fix corrected the init to INDEX, jit_bc_instr_base_offset
returns INDEX. But entry->base_offset is consumed downstream (line 3320)
as BYTE OFFSET via match_tc.frame.cur_instr_offs assignment. Pre-fix
'correct by accident' — Class A's BYTES-as-INDEX init wrote BYTES into
bci->base_offset, so jit_bc_instr_base_offset returned BYTES, matching
downstream. Correct Class A exposed Class C: cur_instr_offs got INDEX
(half the correct BYTE value) → interpreter Deopt resumed at wrong
bytecode position → SIGSEGV in test_multiple_exceptions_in_loop
(deterministic 0/20 post Class A+B fix, vs 20/20 PASS pre-W27c).

DIAGNOSIS:
HIR-diff for test_multiple_exceptions_in_loop revealed Deopt CurInstrOffset
124 (correct, BYTES) → 62 (wrong, INDEX = 124/2). Direct evidence of the
domain mismatch.

LATENT in pushed W27c #2a (e4e7507 on SonicField/cpython): all three
classes present. Class A, B dormant (no test exercises emitInlineExceptionMatch
or JUMP_BACKWARD-in-except-body). Class C compensated by Class A — both
broken in opposite directions canceling out for downstream consumers of
entry->base_offset. ALL three must fix together.

INVESTIGATION CHAIN:
- testkeeper bisect 17:52:30Z localized #2b regression → W27c #2b sole
- pythia python#136 python#1 18:23:34Z flagged HEAP/RACE rode on absence-of-evidence
- generalist 18:24Z proposed HIR-diff Phase 0 falsifier
- generalist 18:32+18:34Z captured HIR_2a + HIR_2b dumps; found Class A
- theologian 18:42:53Z class-of-bug audit found Class B
- supervisor 18:43:17Z directed dual-fix
- pythia python#137 python#2 19:00:29Z flagged inline-arithmetic violates boundary-domain rule
- supervisor 19:01:19Z + theologian 19:01:17Z directed amend to named conversions
- testkeeper 19:06:55Z full Phoenix gate caught NEW regression
  (test_multiple_exceptions_in_loop deterministic 0/20)
- generalist 19:14Z HIR-diff Phase 0' on test_multiple_exceptions_in_loop
  revealed Class C (cur_instr_offs 124→62)
- supervisor 19:16:42Z authorized Class C fix + extended audit

OTHER OpcodeArrayEntry FIELDS AUDITED (per supervisor 19:16:42Z extended
class-of-bug discipline):
- entry->opcode: written from jit_bc_instr_opcode (no domain — opcode value);
  consumed in dispatch loop switch. CLEAN.
- entry->oparg: written from jit_bc_instr_oparg (no domain — oparg value);
  consumed in dispatch loop emit calls. CLEAN.
- entry->base_offset: Class C above; FIXED.
- entry->const_obj: written from PyTuple_GET_ITEM (PyObject*); consumed in
  dispatch loop hir_type_from_object. CLEAN (no domain conversion).
- entry->jump_target_block: Class B above; FIXED.

VERIFICATION pending (testkeeper 4-suite extended verify):
1. 30x test_exc_raise_catch (Class A regression)
2. 30x test_exc_binary_subscr_dict_in_try (Class A latent activation)
3. 30x test_exc_continue_in_loop (Class B latent activation)
4. 30x multi-except-in-loop sentinel (Class C latent activation)
5. Full Phoenix suite (which originally caught Class C)

Labels

performance Performance or resource usage

5 participants