-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cluster mempool: merging & postprocessing of linearizations #30285
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
09aa7a8
to
7ab4c8e
Compare
5a7b77c
to
b9ae506
Compare
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
c2acefe
to
df165d3
Compare
3174292
to
20ea5d8
Compare
20ea5d8
to
f4a183c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did not yet review implementation of PostLinearize, didn't verify benchmarks with optimizations
|
||
FUZZ_TARGET(clusterlin_postlinearize_moved_leaf) | ||
{ | ||
// Verify that taking an existing linearization, and moving a leaf to the back, potentially |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this specifically for prioritisetransaction or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it means that a particular approach for RBF will never worsen single-transaction leaf replacements that do not change the size of a transaction. I've added a comment to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my initial thought but the size isn't changing so I'm unsure how it would map to an RBF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"laboratory conditions" I guess. It's just a nice property that holds; even if the conditions for it don't exactly hold often in reality, it probably means they're not far off. I'll look into whether I can generalize it a bit to support changing size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add on to that -- at one point in an earlier draft of #28676, we were seeing examples of RBFs where a single transaction was being replaced with one that had identical size, identical parents, but higher fee, yet due to quirks in how linearization worked the replacement was being rejected because the diagram wasn't improving.
We've since generalized this concern around accidentally discovering a better linearization for a cluster while processing a potential replacement, and I think we will be addressing this in a more robust way, but it's nice that PostLinearization solves the specific problem we observed in the wild.
f4a183c
to
157464f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code review ACK 157464f, will fuzz
} | ||
if (parents.Any()) depgraph_tree.AddDependency(parents.First(), i); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to test my understanding: at this point, the graph we've constructed may not be connected, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly, for two reasons:
depgraph_gen
may not be connected to begin with (there is noMakeConnected(depgraph_gen)
; perhaps there should be?)- Even if
depgraph_gen
is connected (imagine it being a trellis for example), removing all but the first parent, or all but the first child, may split it up.
|
||
FUZZ_TARGET(clusterlin_postlinearize_moved_leaf) | ||
{ | ||
// Verify that taking an existing linearization, and moving a leaf to the back, potentially |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add on to that -- at one point in an earlier draft of #28676, we were seeing examples of RBFs where a single transaction was being replaced with one that had identical size, identical parents, but higher fee, yet due to quirks in how linearization worked the replacement was being rejected because the diagram wasn't improving.
We've since generalized this concern around accidentally discovering a better linearization for a cluster while processing a potential replacement, and I think we will be addressing this in a more robust way, but it's nice that PostLinearization solves the specific problem we observed in the wild.
ACK 157464f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 157464f
I didn't verify benchmark timings but opts seemed ok
assert(cmp >= 0); | ||
|
||
// The chunks that come out of postlinearizing are always connected. | ||
LinearizationChunking linchunking(depgraph, post_linearization); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds like a tasty cantonese restaurant
src/cluster_linearize.h
Outdated
* Specifically, this finds the connected component which contains the first transaction of | ||
* todo (if any). | ||
* | ||
* Two transactions are considered connected if there is a path from one to the other inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Two transactions are considered connected if there is a path from one to the other inside | |
* Two transactions are considered connected if there is a path from one to the other and both are inside |
?
in general this description and rewrite was very helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not just the two transactions, but all transactions in the path need to be in todo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rewritten this. Please have a look if it's better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely clearer on what the "path" is 👍 (and matches my understanding of the code now)
depgraph.FeeRate(lin_leaf.back()).fee += fee_inc; | ||
auto new_chunking = ChunkLinearization(depgraph, lin_moved); | ||
auto cmp = CompareChunks(new_chunking, old_chunking); | ||
assert(cmp >= 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice to have a non-tree example of things strictly improving, also interesting to see that improvements still happen even if no fees are added
edit: hm, still not quite showing strict improvement since fees are after the fact; it's a different "cluster". Oh well.
assert(cmp >= 0); | |
if (fee_inc > 0) { | |
// It's more fees; should be superior | |
assert(cmp > 0); | |
} else { | |
assert(cmp >= 0); | |
} |
Here are my linearization benchmarks (ryzen 7995wx) after this PR (compare with #30126 (comment)):
|
This makes it clearer what the function does.
Add utility functions to DepGraph for finding connected components.
When the transactions being marked done exactly match the first chunk of what remains of the linearization, we can just remember to skip that chunk instead of computing a full rechunking. Further, chop off prefixes of the input linearization that are already done, so they don't need to be reconsidered for further rechunkings.
157464f
to
bbcee5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code review ACK bbcee5a
// During an even pass, the diagram above would correspond to linearization [2,3,0,1], with | ||
// groups [2] and [3,0,1]. | ||
|
||
std::vector<TxEntry> entries(linearization.size() + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we persisting entries
between passes just so we don't need to reallocate this vector? I don't see that we need to keep any of the information from a previous pass. To check, added a entries.clear()
which didn't seem to break anything for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the only reuse is the vector allocation.
* | ||
* Postlinearization guarantees: | ||
* - The resulting chunks are connected. | ||
* - If the input has a tree shape (either all transactions have at most one child, or all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a subset of the "there exists a maximum of 1 path from any node to another" definition of a tree, right? This definition seems to be like a... botanical tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://en.wikipedia.org/wiki/Tree_(graph_theory), a transaction graph where each transaction has at most one child would be an arborescence or out-tree, and the opposite an anti-arborescence or in-tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK bbcee5a
// - Each direction corresponds to one shape of tree being linearized optimally (forward passes | ||
// guarantee this for graphs where each transaction has at most one child; backward passes | ||
// guarantee this for graphs where each transaction has at most one parent). | ||
// - Starting with a backward pass guarantees the moved-tree property. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved-leaf? (or point to another usage of moved tree)
ACK bbcee5a |
Part of cluster mempool: #30289
Depends on #30126, and was split off from it. #28676 depends on this.
This adds the algorithms for merging & postprocessing linearizations.
The
PostLinearize(depgraph, linearization)
function performs an in-place improvement oflinearization
, using two iterations of the Linearization post-processing algorithm. The first running from back to front, the second from front to back.The
MergeLinearizations(depgraph, linearization1, linearization2)
function computes a new linearization for the provided cluster, given two existing linearizations for that cluster, which is at least as good as both inputs. The algorithm is described at a high level in merging incomparable linearizations.For background and references, see Introduction to cluster linearization.