Conversation
👋 Thanks for assigning @TheBlueMatt as a reviewer!

🔔 1st Reminder Hey @tnull! This PR has been waiting for your review.
Force-pushed from 29f47f3 to 264aa7f
Force-pushed from 264aa7f to 493dd9a
Force-pushed from a30cbfb to 1e7bdbc
Thanks for looking into this, and excuse the delay here!
I did a first pass. Generally this already looks pretty good, but it is a huge PR and could be simplified in some areas. For instance, we should drop the generic Retry logic, as the concrete implementations would already implement that if they need it. Likewise, we shouldn't fall back to the backup for now, as it's really only meant as disaster recovery (KVStore caching is out of scope for this PR now, even though we might want to explore that soon, too). There is also no need to replicate the write-ordering locks in TierStore, but we'll need them in ForeignKVStoreAdapter (which might be mergeable with DynStore?).
Generally, if you find opportunities to reduce the size of the changeset here, it would be appreciated. It would also be cool if you could try to break the PR up into more feature commits, but feel free to leave it as is if not.
src/io/tier_store.rs (Outdated)

    /// Configuration for exponential backoff retry behavior.
    #[derive(Debug, Copy, Clone)]
    pub struct RetryConfig {
Retrying is pretty specific to the particular KVStore implementation. I don't think we should expose retrying params on top of what we already do for implementations internally.
Sure thing, this has been dropped.
I considered it because users can choose their own KVStore implementation without considering retrying. I thought it would add some level of control, but the added complexity and size of the changeset might not be worth it.
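For context, the dropped config centered on exponential backoff. Below is a minimal, std-only sketch of that technique; all names, fields, and numbers are illustrative, not the removed code:

```rust
use std::time::Duration;

/// Illustrative stand-in for the dropped config; field names are assumptions.
#[derive(Debug, Copy, Clone)]
pub struct RetryConfig {
    pub max_attempts: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

/// Retry `op` on failure, doubling the delay after each attempt up to a cap.
fn retry_with_backoff<T, E>(
    cfg: RetryConfig, mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = cfg.base_delay;
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error.
            Err(e) if attempt + 1 >= cfg.max_attempts => return Err(e),
            Err(_) => {
                std::thread::sleep(delay);
                delay = (delay * 2).min(cfg.max_delay);
                attempt += 1;
            },
        }
    }
}

fn main() {
    let cfg = RetryConfig {
        max_attempts: 5,
        base_delay: Duration::from_millis(1),
        max_delay: Duration::from_millis(8),
    };
    let mut calls = 0;
    // Fails twice, then succeeds on the third attempt.
    let res: Result<u32, &str> = retry_with_backoff(cfg, || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok(42) }
    });
    assert_eq!(res, Ok(42));
    assert_eq!(calls, 3);
}
```

The point of dropping this wrapper-level config is that a concrete store (e.g. a VSS-backed one) already knows its own failure modes and can retry internally with better-informed parameters.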
src/io/tier_store.rs (Outdated)

    }

    pub struct TierStoreInner {
        /// For remote data.
bindings/ldk_node.udl (Outdated)

    [Throws=BuildError]
    Node build_with_vss_store_and_header_provider(NodeEntropy node_entropy, string vss_url, string store_id, VssHeaderProvider header_provider);
    [Throws=BuildError]
    Node build_with_tier_store(NodeEntropy node_entropy, DynStore primary_store);
Can we now also expose Builder::build_with_store?
Yes we can. This is exposed in 451a392.
src/ffi/types.rs (Outdated)

        Self { inner: Arc::new(adapter) }
    }

    pub fn from_ldk_store(store: Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>) -> Arc<Self> {
If we rather make this an impl From<Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>> for Arc<..>, we can drop the wrap_store macro and just use .into() instead.
src/builder.rs (Outdated)

    }

        let store = wrap_store!(Arc::new(tier_store));
        self.build_with_store(node_entropy, store)
I think we need to use build_with_store_internal here to make sure we're using the same Runtime etc.
src/io/tier_store.rs (Outdated)

        Arc::clone(&outer_lock.entry(locking_key).or_default())
    }

    async fn execute_locked_write<
I'm confused, why are we replicating all the write-ordering logic here? Given the implementations are required to fulfill LDK's requirements already, can't we just call the inner write and be done with it?
Again, the reasoning here is that users might not implement the KVStore trait for their own stores as required, and thus the onus of correctness would fall to us. If we can't get it from their inner stores, the wrapper TierStore should provide some guarantees. I do agree that if the inner store already satisfies LDK's requirements, the wrapper shouldn't need to duplicate that logic. The write-ordering has been removed.
src/ffi/types.rs (Outdated)

    ) -> Pin<Box<dyn Future<Output = Result<(), lightning::io::Error>> + Send>> {
        let inner = self.inner.clone();

        let primary_namespace = primary_namespace.to_string();
I think here (and in remove) we need to add the write-ordering logic to ensure we follow LDK's KVStore::write requirements.
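A per-key ordering lock in the spirit of the `outer_lock` pattern quoted earlier can be sketched std-only as below; LDK's requirement is roughly that concurrent writes to the same key must land in a well-defined order. All names here are illustrative:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// One ordering lock per key: concurrent writers to the same key are
/// serialized, so their writes are applied one at a time, in lock order.
#[derive(Default)]
struct WriteOrderer {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl WriteOrderer {
    /// Returns the shared lock for `key`, creating it on first use.
    /// Repeated calls for the same key return the same `Arc`.
    fn lock_for(&self, key: &str) -> Arc<Mutex<()>> {
        let mut locks = self.locks.lock().unwrap();
        Arc::clone(locks.entry(key.to_string()).or_default())
    }
}

fn main() {
    let orderer = Arc::new(WriteOrderer::default());
    let log = Arc::new(Mutex::new(Vec::new()));
    let mut handles = Vec::new();
    for i in 0..8u32 {
        let (orderer, log) = (Arc::clone(&orderer), Arc::clone(&log));
        handles.push(std::thread::spawn(move || {
            // All threads contend on the same key; the per-key mutex makes
            // each "write" (here: a log append) atomic and totally ordered.
            let key_lock = orderer.lock_for("channel_monitor_0");
            let _guard = key_lock.lock().unwrap();
            log.lock().unwrap().push(i);
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // Every write completed; none were lost to a race.
    assert_eq!(log.lock().unwrap().len(), 8);
}
```

In the real adapter, the guard would be held across the (async) inner write so that later writes to the same key cannot overtake earlier ones.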
src/types.rs (Outdated)

    /// A type alias for [`SyncAndAsyncKVStore`] with `Sync`/`Send` markers;
    pub type DynStore = dyn SyncAndAsyncKVStore + Sync + Send;
    #[cfg(feature = "uniffi")]
    pub(crate) use crate::DynStore;
This is odd. Why do we need this?
I had initially used the same name, and the feature gating became necessary to differentiate the types at call sites when uniffi was enabled.
src/io/test_utils.rs (Outdated)

        }
    }

    pub struct DelayedStore {
Not quite sure what coverage we gain with DelayedStore? IMO, we might be better off dropping all this boilerplate.
This has been removed in cb66b59.
I couldn't think of any better way to test that backup queue overflow does not impact primary writes. I agree that the boilerplate seems too much for just that single case.
src/ffi/types.rs (Outdated)

    ) -> Result<Vec<String>, IOError>;
    }

    pub struct ForeignKVStoreAdapter {
Hmm, any reason this needs to be separate from DynStore? Couldn't we merge them?
At the time, I believe I introduced the extra wrapper due to the limitation on implementing foreign traits for foreign types, i.e. I couldn't implement KVStore for Arc<dyn LdkSyncAndAsyncKVStore>. With the new changes this is no longer relevant, and it has been removed.
src/ffi/types.rs (Outdated)

        }
    }

    #[async_trait]
Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.
> Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.

Well, it's 'the official'/supported way to do async traits with Uniffi: https://mozilla.github.io/uniffi-rs/latest/futures.html#exporting-async-trait-methods
Does uniffi require that in some way? It's just a comically overkill way to sugar impl Future, which is kinda nuts...
Oh, worse, it looks like async_trait desugars to the Pin<Box<dyn ...>> version... which is dumb, but at least for uniffi it shouldn't matter because we'd have to do that anyway.
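For illustration, the shape that `async_trait` desugars to can be written by hand without the dependency; this self-contained sketch uses stand-in trait and type names, not the PR's actual code:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// What `#[async_trait]` roughly produces: the `async fn` becomes a method
// returning a pinned, boxed, type-erased future.
trait AsyncKVStore {
    fn write(
        &self, key: String, value: Vec<u8>,
    ) -> Pin<Box<dyn Future<Output = Result<(), String>> + Send + '_>>;
}

struct NoopStore;

impl AsyncKVStore for NoopStore {
    fn write(
        &self, _key: String, _value: Vec<u8>,
    ) -> Pin<Box<dyn Future<Output = Result<(), String>> + Send + '_>> {
        // The async body is simply boxed and pinned.
        Box::pin(async move { Ok(()) })
    }
}

// Minimal no-op waker so we can poll the future without pulling in an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        raw_waker()
    }
    fn noop(_: *const ()) {}
    fn raw_waker() -> RawWaker {
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe { Waker::from_raw(raw_waker()) }
}

fn main() {
    let store = NoopStore;
    let mut fut = store.write("key".to_string(), vec![1, 2, 3]);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // The immediately-ready future resolves on the first poll.
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(Ok(()))));
}
```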
Unfortunately this will need a substantial rebase now that #696 landed, sorry for that!
Besides some tedious rebase work, we now have DynStoreWrapper which can be used on the ffi and the non-ffi side. I do wonder if we can actually just merge that wrapper with the TierStore, as well as the ForeignKVStoreAdapter/DynStore. Seems all these wrapper structs may not be needed and we could just get away with one trait and one wrapper struct, mostly?
Force-pushed from cba29a3 to db1fe83
Thanks @tnull and @TheBlueMatt for the initial round of reviews. I've completed the rebase and integrated changes from #696. Given the scope of the restructuring, I recommend reviewing the new commit series afresh rather than diffing against the previous revision, as line numbers have shifted substantially. Summary of major changes:
For each prior review comment, I've replied inline noting whether it was addressed (with the relevant commit) or removed in the restructured series.
Thanks for the rebase! Feel free to squash the fixups.
Here are a few initial comments, but this is still a huge PR. To make it a bit more manageable, I do wonder if we can split out the FFI-related changes into a follow-up and here only add the tiered store itself? We'd probably want to make sure both land for the next release, but it might be a bit easier to review these steps separately.
src/ffi/types.rs (Outdated)

    };

    #[derive(Debug)]
    pub enum IOError {
This is a lot of code added here, let's move it to a new ffi/io.rs module.
src/io/tier_store.rs (Outdated)

    fn read(
        &self, primary_namespace: &str, secondary_namespace: &str, key: &str,
    ) -> io::Result<Vec<u8>> {
        self.runtime.block_on(self.inner.read_internal(
I think this should call through to the KVStoreSync implementation of inner (which in some cases uses a different runtime), not use the same runtime.
I will split out the FFI-related changes into a separate PR.
TheBlueMatt left a comment
Just looked at tier_store.rs, I'll let @tnull handle the rest.
src/io/tier_store.rs (Outdated)

    ///
    /// The backup operates on a best-effort basis:
    /// - Writes are queued asynchronously (non-blocking)
    /// - No retry logic (We assume local store is unlikely to have transient failures).
Hmm, this is maybe something we should consider dropping for now.
If we're imagining mostly a world where primary persistence is remote but local persistence is backup, there shouldn't be any real penalty to just requiring both writes to complete, and it makes the guarantees we can provide way better.
That said, some may want to use it in the reverse - a primary local store and a remote secondary store. In that case, we indeed don't want to block primary persistence, but we also probably can't really tolerate having no ability to recover from secondary failures.
Luckily, this would also simplify the code a decent bit here and we can come back and revisit this with a more cohesive answer for the second case. Just make sure we do things in parallel with spawns instead of waiting on both writes individually.
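The "parallel rather than sequential" point can be sketched as follows. This is a std-only illustration using threads; the actual code would spawn async tasks (e.g. on a tokio runtime), and all names here are stand-ins:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Stand-in for a store write with some latency (e.g. a remote round trip).
fn write_store(latency: Duration) -> Result<(), String> {
    thread::sleep(latency);
    Ok(())
}

fn main() {
    let start = Instant::now();
    // Kick off the primary and backup writes concurrently, then require BOTH
    // to succeed before reporting success, as suggested above.
    let primary = thread::spawn(|| write_store(Duration::from_millis(100)));
    let backup = thread::spawn(|| write_store(Duration::from_millis(100)));
    let primary_res = primary.join().expect("primary task panicked");
    let backup_res = backup.join().expect("backup task panicked");
    assert!(primary_res.is_ok() && backup_res.is_ok());
    // Issued in parallel, total time is ~max(latencies), not their sum.
    assert!(start.elapsed() < Duration::from_millis(190));
}
```

This keeps the stronger durability guarantee (both stores written before success is reported) without paying the two round trips back to back.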
    fn is_ephemeral_cached_key(pn: &str, sn: &str, key: &str) -> bool {
        matches!(
            (pn, sn, key),
            (NETWORK_GRAPH_PERSISTENCE_PRIMARY_NAMESPACE, _, NETWORK_GRAPH_PERSISTENCE_KEY)
There's also _SECONDARY_NAMESPACEs for these constants, though they're all empty anyway.
        matches!(
            (pn, sn, key),
            (NETWORK_GRAPH_PERSISTENCE_PRIMARY_NAMESPACE, _, NETWORK_GRAPH_PERSISTENCE_KEY)
                | (SCORER_PERSISTENCE_PRIMARY_NAMESPACE, _, SCORER_PERSISTENCE_KEY)
Hmm, if we're using a CombinedScorer the scorer is actually very small and only local data. In a multi-device setup, it might well make sense to sync this. I don't feel strongly about this, though, just raising it as an option. @tnull what do you think?
Hmm, I guess it's mostly a matter of latency, as every time the scorer is changed, event handling would currently be delayed by the remote store RTT, AFAIU. The question is whether we expect that people who switch devices frequently would really benefit from recent scoring data. Does that happen very often? Not sure?
Yea, probably not. In theory the background processor can run scorer persistence in parallel with other tasks, though it largely doesn't today (but maybe will soon, with Joost's work).
Force-pushed from db1fe83 to 35f9ec2
@enigbe Do you still intend to move forward here? LDK Node v0.8 beta might only be a few weeks away, would be great to get this landed in time!
Force-pushed from 35f9ec2 to 14dbb22
Apologies for the delay here. I'll be addressing it today.
Please note that by now we'll want to dedup/DRY up the code here with …
I am currently reviewing #782 to ensure this.
Force-pushed from 1cd58c0 to 342f8e6
This commit:

Adds `TierStore`, a tiered `KVStore`/`KVStoreSync` implementation that routes node persistence across three storage roles:
- a primary store for durable, authoritative data
- an optional backup store for a second durable copy of primary-backed data
- an optional ephemeral store for rebuildable cached data such as the network graph and scorer

TierStore routes ephemeral cache data to the ephemeral store when configured, while durable data remains primary+backup. Reads and lists do not consult the backup store during normal operation.

For primary+backup writes and removals, this implementation treats the backup store as part of the persistence success path rather than as a best-effort background mirror. Earlier designs used asynchronous backup queueing to avoid blocking the primary path, but that weakens the durability contract by allowing primary success to be reported before backup persistence has completed. TierStore now issues primary and backup operations together and only returns success once both complete. This gives callers a clearer persistence guarantee when a backup store is configured: acknowledged primary+backup mutations have been attempted against both durable stores. The tradeoff is that dual-store operations are not atomic across stores, so an error may still be returned after one store has already been updated.

TierStore also implements `KVStoreSync` in terms of dedicated synchronous helpers that call the wrapped stores' sync interfaces directly. This preserves the inner stores' synchronous semantics instead of routing sync operations through a previously held async runtime.

Additionally, adds unit coverage for the current contract, including:
- basic read/write/remove/list persistence
- routing of ephemeral data away from the primary store
- backup participation in the foreground success path for writes and removals
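The routing contract described in this commit message can be modeled in a few lines. The following is a toy, std-only sketch; store types, key constants, and method shapes are illustrative stand-ins, not the PR's implementation:

```rust
use std::collections::HashMap;

// Illustrative key constants for rebuildable cached data.
const NETWORK_GRAPH_KEY: &str = "network_graph";
const SCORER_KEY: &str = "scorer";

#[derive(Default)]
struct MemStore(HashMap<String, Vec<u8>>);

struct TierStore {
    primary: MemStore,
    backup: Option<MemStore>,
    ephemeral: Option<MemStore>,
}

impl TierStore {
    /// Rebuildable cached data is routed to the ephemeral tier when configured.
    fn is_ephemeral_cached_key(key: &str) -> bool {
        matches!(key, NETWORK_GRAPH_KEY | SCORER_KEY)
    }

    /// Durable writes succeed only once BOTH primary and (if configured)
    /// backup have been written; ephemeral-routed data touches neither.
    fn write(&mut self, key: &str, value: Vec<u8>) -> Result<(), String> {
        if Self::is_ephemeral_cached_key(key) {
            if let Some(eph) = self.ephemeral.as_mut() {
                eph.0.insert(key.to_string(), value);
                return Ok(());
            }
        }
        self.primary.0.insert(key.to_string(), value.clone());
        if let Some(backup) = self.backup.as_mut() {
            backup.0.insert(key.to_string(), value);
        }
        Ok(())
    }
}

fn main() {
    let mut store = TierStore {
        primary: MemStore::default(),
        backup: Some(MemStore::default()),
        ephemeral: Some(MemStore::default()),
    };
    store.write(NETWORK_GRAPH_KEY, vec![1]).unwrap();
    store.write("manager", vec![2]).unwrap();

    // Cached data went to the ephemeral tier only...
    assert!(store.ephemeral.as_ref().unwrap().0.contains_key(NETWORK_GRAPH_KEY));
    assert!(!store.primary.0.contains_key(NETWORK_GRAPH_KEY));
    // ...while durable data landed in both primary and backup.
    assert!(store.primary.0.contains_key("manager"));
    assert!(store.backup.as_ref().unwrap().0.contains_key("manager"));
}
```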
Add native builder support for tiered storage by introducing `TierStoreConfig`, backup and ephemeral store configuration methods, and a `build_with_dynstore` path that wraps the provided store as the primary tier and applies any configured secondary tiers. This makes tiered storage a builder concern while keeping the Rust `build_with_store` API ergonomic for native callers.

Note: The temporary `dead_code` allowance will be removed in a follow-up commit once the new tier-store builder APIs are exercised.
Add UniFFI-facing store abstractions and builder APIs so foreign-language callers can configure backup and ephemeral stores when constructing nodes with custom storage. This introduces `FfiDynStoreTrait` as an FFI-safe equivalent of `DynStoreTrait`, along with a Rust-side adapter that bridges foreign store implementations into the internal dynamic store abstraction used by the builder.

As part of this change:
- add UniFFI bindings for custom primary, backup, and ephemeral stores
- expose `Builder::set_backup_store`, `Builder::set_ephemeral_store`, and `Builder::build_with_store` on the FFI surface
- route FFI-backed builder construction through the native dyn-store path
- move FFI IO-related types into a dedicated module
- preserve per-key write ordering across the FFI boundary
- route Rust-side sync access through the async mutation path to avoid runtime-sensitive locking behavior
Extend the Rust test harness to support tiered store configurations and add integration coverage for routing data across primary, backup, and ephemeral stores. This introduces tier-store-aware test helpers, including support for configuring separate stores per node and inspecting persisted data across native and UniFFI-enabled test builds.

Add an integration test covering the tiered-storage channel lifecycle and verifying that:
- durable node data is persisted to both the primary and backup stores
- ephemeral-routed data is stored in the ephemeral tier
- ephemeral data is not mirrored back into the durable tiers

Also update existing Rust test call sites and benchmark helpers to match the new tier-store-aware setup APIs.
Add a Python in-memory KV store implementation for the UniFFI-backed builder APIs and extend the Python test suite with tiered-storage coverage. This adds a tier-store test that builds nodes with custom primary, backup, and ephemeral stores through the FFI surface and exercises a full channel lifecycle against them.

The test verifies that:
- primary-backed data is persisted to both the primary and backup stores
- ephemeral-routed data is stored in the ephemeral store
- backup contents converge to the primary store contents
Force-pushed from 342f8e6 to 19a6c09
What this PR does

In this PR we introduce TierStore, a three-tiered (KVStore + KVStoreSync) implementation that manages data across three distinct storage layers based on criticality.

Background

As we have moved towards supporting remote storage with VssStore, we need to recognize that not all data has the same storage requirements. Currently, all data goes to a single store, which creates some problems. This PR proposes tiered storage that provides granular control over where different data types are stored. The tiers include a primary store, an optional backup store, and an optional ephemeral store.

Additionally, we also permit the configuration of Node with tiered storage, allowing callers to build the Node with a primary store and configure the optional tiers. These configuration options also extend to our foreign interface, allowing bindings targets to build the Node with their own (KVStore + KVStoreSync) implementations. A sample Python implementation is provided and tested.

Concerns

- VssStore has built-in retry logic. Wrapping it in TierStore creates nested retries.
- KVStore to the FFI

Related Issues