pyarrow: Cache the imported classes to avoid importing them each time#9439
pyarrow: Cache the imported classes to avoid importing them each time#9439alamb merged 1 commit intoapache:mainfrom
Conversation
alamb
left a comment
There was a problem hiding this comment.
Makes sense to me -- thank you @Tpt
@kylebarron / @adriangb as you have more experience with py03 could you also give this a brief look? Thank you
|
|
||
| fn array_class(py: Python<'_>) -> PyResult<&Bound<'_, PyType>> { | ||
| static TYPE: PyOnceLock<Py<PyType>> = PyOnceLock::new(); | ||
| TYPE.import(py, "pyarrow", "Array") |
There was a problem hiding this comment.
https://docs.rs/pyo3/0.28.2/pyo3/sync/struct.PyOnceLock.html#method.import
Looks like this is exactly the pattern it was designed for
There was a problem hiding this comment.
Codex also found similar uses in other well respected crates
- pydantic-core caches Python’s uuid.UUID type exactly this way:
https://github.com/pydantic/pydantic/blob/6178953d163d31004ac8834131cfcd6c2a84f7a8/pydantic-core/src/validators/uuid.rs#L28-L31 - polars-python caches imported Python types like numpy.ndarray and decimal.Decimal with PyOnceLock:
https://github.com/pola-rs/polars/blob/1bd62ebd25957ae97decc2d18b8a104acad5285d/crates/polars-python/src/conversion/any_value.rs#L544-L553
There was a problem hiding this comment.
Interesting that those crates cache the imports. I thought that Python cached the import anyways on the C side, so it was unnecessary to do it on the pyo3 side
There was a problem hiding this comment.
I thought that Python cached the import anyways on the C side, so it was unnecessary to do it on the pyo3 side
Yes, cpython does not reinitialize the module each time (I guess this what you mean by "cache the import", sorry for the dumb answer if it's not the case). However, doing py.import(my_module)?.getattr(my_class)? requires at least two map lookups, one to fetch the module object from its path and one to fetch the class from the module. The cache allows to skip these lookups and directly use the type object. If the GIL is enabled there is no synchronization cost to do that (PyOnceLock use the GIL as lock).
There was a problem hiding this comment.
I see; so it's just saving two lookups into the CPython HashMap? I guess that's not nothing, but it's not the slow module reinitialization I was worried it was.
There was a problem hiding this comment.
I'd be curious about a benchmark, but not required to merge
There was a problem hiding this comment.
Thank you! I just ran a benchmark by curiosity. Here is the result:
import_direct time: [272.33 ns 274.23 ns 276.52 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
import_intern time: [206.61 ns 207.60 ns 208.84 ns]
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
import_static time: [1.3524 ns 1.3578 ns 1.3648 ns]
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
the three benchmarks import uuid.UUID.
import_directusesPython::import()?.getattr()import_internusesPython::import(intern!())?.getattr(intern!())to avoid always allocating the strings"uuid"and"UUID"import_staticusesPyOnceLock::import
Code:
Details
```rust use std::hint::black_box;use codspeed_criterion_compat::{criterion_group, criterion_main, Bencher, Criterion};
use pyo3::prelude::*;
use pyo3::intern;
use pyo3::sync::PyOnceLock;
use pyo3::types::PyType;
fn import_direct(b: &mut Bencher<'_>) {
Python::attach(|py| {
b.iter(|| black_box(black_box(&py.import("uuid").unwrap()).getattr("UUID")).unwrap());
});
}
fn import_intern(b: &mut Bencher<'_>) {
Python::attach(|py| {
b.iter(|| {
black_box(
black_box(&py.import(intern!(py, "uuid")).unwrap()).getattr(intern!(py, "UUID")),
)
.unwrap()
});
});
}
fn import_static(b: &mut Bencher<'_>) {
Python::attach(|py| {
static TYPE: PyOnceLock<Py> = PyOnceLock::new();
b.iter(|| {
black_box(TYPE.import(py, "uuid", "UUID")).unwrap();
});
});
}
fn criterion_benchmark(c: &mut Criterion) {
c.bench_function("import_direct", import_direct);
c.bench_function("import_intern", import_intern);
c.bench_function("import_static", import_static);
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
</details>
|
Thanks again @Tpt and @kylebarron |
## Summary Was reading the arrow-rs changelog and I ran into [this](apache/arrow-rs#9439) PR, seems like `PyOnceLock` was built for this and they have some very promising benchmarks [there](https://github.com/apache/arrow-rs/pull/9439/changes#r2955818325). I've also wrapped all static strings that are passed into `pyo3` with the `intern` macro, which prevents allocating a new `PyString` on every call. --------- Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Which issue does this PR close?
Rationale for this change
Speed up conversion by only importing
pyarrowonce.What changes are included in this PR?
PyOnceLock::importto import the types..extract::<PyBackedStr>()?(theDisplayimplementation already does something similar)Are these changes tested?
Covered by existing tests. It would be nice to add benchmark but it might require to:
pyarrowfrom criterion tests (likely by runningpip/uvfrom the Rust benchmark code)Are there any user-facing changes?
No