cluster mempool: merging & postprocessing of linearizations #30285

sipa · 2024-06-14T02:12:38Z

Part of cluster mempool: #30289

Depends on #30126, and was split off from it. #28676 depends on this.

This adds the algorithms for merging & postprocessing linearizations.

The PostLinearize(depgraph, linearization) function performs an in-place improvement of linearization, using two iterations of the Linearization post-processing algorithm: the first runs from back to front, the second from front to back.

The MergeLinearizations(depgraph, linearization1, linearization2) function computes a new linearization for the provided cluster, given two existing linearizations for that cluster, which is at least as good as both inputs. The algorithm is described at a high level in merging incomparable linearizations.
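To make "at least as good" concrete, here is a minimal sketch of chunking a linearization and comparing two chunkings by their fee-size diagrams. It is a simplified illustration, not the PR's code: `FeeSize`, `Chunk`, `AtOrAbove` and `AtLeastAsGood` are hypothetical names, a linearization is given as plain fee/size pairs in order, and both chunkings are assumed to cover the same set of transactions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fee/size pair; a stand-in for illustration, not the PR's own type.
struct FeeSize {
    int64_t fee;
    int64_t size;
};

// True if a has a strictly higher feerate than b (cross-multiplied, no division).
bool HigherFeerate(const FeeSize& a, const FeeSize& b)
{
    return a.fee * b.size > b.fee * a.size;
}

// Chunk a linearization given as per-transaction fee/size in linearization order:
// each transaction starts its own chunk, which absorbs the preceding chunk for as
// long as it has a strictly higher feerate than that chunk.
std::vector<FeeSize> Chunk(const std::vector<FeeSize>& lin)
{
    std::vector<FeeSize> chunks;
    for (FeeSize cur : lin) {
        assert(cur.size > 0);
        while (!chunks.empty() && HigherFeerate(cur, chunks.back())) {
            cur.fee += chunks.back().fee;
            cur.size += chunks.back().size;
            chunks.pop_back();
        }
        chunks.push_back(cur);
    }
    return chunks;
}

// True if the fee-size diagram of chunking a is at or above the point (size, fee).
bool AtOrAbove(const std::vector<FeeSize>& a, int64_t size, int64_t fee)
{
    int64_t acc_fee = 0, acc_size = 0;
    for (const FeeSize& c : a) {
        if (size <= acc_size + c.size) {
            // Linear interpolation inside chunk c, multiplied through by c.size.
            return (acc_fee - fee) * c.size + c.fee * (size - acc_size) >= 0;
        }
        acc_fee += c.fee;
        acc_size += c.size;
    }
    return acc_fee >= fee; // the point lies beyond the end of a's diagram
}

// True if a's diagram is nowhere below b's. Because a's diagram is concave
// (chunk feerates never increase), checking b's breakpoints is sufficient.
bool AtLeastAsGood(const std::vector<FeeSize>& a, const std::vector<FeeSize>& b)
{
    int64_t fee = 0, size = 0;
    for (const FeeSize& c : b) {
        fee += c.fee;
        size += c.size;
        if (!AtOrAbove(a, size, fee)) return false;
    }
    return true;
}
```

For example, the fee/size sequence (1,1), (5,1), (2,1) chunks into (6,2), (2,1), while (1,1), (2,1), (5,1) chunks into a single (8,3). The first diagram is nowhere below the second, but not the other way around, so the first linearization is strictly better under this comparison.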

For background and references, see Introduction to cluster linearization.

DrahtBot · 2024-06-14T02:12:41Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	glozow, instagibbs, sdaftuar

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#28676 ([WIP] Cluster mempool implementation by sdaftuar)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot · 2024-07-10T22:20:14Z

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/27291659212

instagibbs

f4a183c

did not yet review implementation of PostLinearize, didn't verify benchmarks with optimizations

instagibbs · 2024-07-30T14:34:45Z

src/test/fuzz/cluster_linearize.cpp

+
+FUZZ_TARGET(clusterlin_postlinearize_moved_leaf)
+{
+    // Verify that taking an existing linearization, and moving a leaf to the back, potentially


is this specifically for prioritisetransaction or something?

No, it means that a particular approach for RBF will never worsen single-transaction leaf replacements that do not change the size of a transaction. I've added a comment to clarify.

That was my initial thought but the size isn't changing so I'm unsure how it would map to an RBF?

"laboratory conditions" I guess. It's just a nice property that holds; even if the conditions for it don't exactly hold often in reality, it probably means they're not far off. I'll look into whether I can generalize it a bit to support changing size.

To add on to that -- at one point in an earlier draft of #28676, we were seeing examples of RBFs where a single transaction was being replaced with one that had identical size, identical parents, but higher fee, yet due to quirks in how linearization worked the replacement was being rejected because the diagram wasn't improving.

We've since generalized this concern around accidentally discovering a better linearization for a cluster while processing a potential replacement, and I think we will be addressing this in a more robust way, but it's nice that PostLinearization solves the specific problem we observed in the wild.

sdaftuar

code review ACK 157464f, will fuzz

sdaftuar · 2024-07-31T15:06:55Z

src/test/fuzz/cluster_linearize.cpp

+            }
+            if (parents.Any()) depgraph_tree.AddDependency(parents.First(), i);
+        }
+    }


Just to test my understanding: at this point, the graph we've constructed may not be connected, right?

Certainly, for two reasons:

- depgraph_gen may not be connected to begin with (there is no MakeConnected(depgraph_gen); perhaps there should be?)
- Even if depgraph_gen is connected (imagine it being a trellis for example), removing all but the first parent, or all but the first child, may split it up.

sdaftuar · 2024-07-31T20:24:53Z

ACK 157464f

instagibbs

ACK 157464f

I didn't verify benchmark timings but opts seemed ok

instagibbs · 2024-07-31T16:41:37Z

src/test/fuzz/cluster_linearize.cpp

+    assert(cmp >= 0);
+
+    // The chunks that come out of postlinearizing are always connected.
+    LinearizationChunking linchunking(depgraph, post_linearization);


sounds like a tasty cantonese restaurant

instagibbs · 2024-07-31T16:51:29Z

src/cluster_linearize.h

+     * Specifically, this finds the connected component which contains the first transaction of
+     * todo (if any).
+     *
+     * Two transactions are considered connected if there is a path from one to the other inside


Suggested change

-     * Two transactions are considered connected if there is a path from one to the other inside
+     * Two transactions are considered connected if there is a path from one to the other and both are inside

?

in general this description and rewrite was very helpful

Not just the two transactions, but all transactions in the path need to be in todo.

I have rewritten this. Please have a look if it's better.

definitely clearer on what the "path" is 👍 (and matches my understanding of the code now)
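The point that every transaction on the path must be in todo can be illustrated with a small sketch. This is a simplified stand-in rather than the PR's DepGraph code: `TxSet`, `AddDependency` and `ComponentOf` are hypothetical names, and the graph is stored as a per-transaction set of direct parents and children.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size transaction set, for illustration only.
using TxSet = std::bitset<64>;

// Record a direct dependency in both directions; neighbors[i] holds the direct
// parents and children of transaction i.
void AddDependency(std::vector<TxSet>& neighbors, size_t parent, size_t child)
{
    assert(parent < neighbors.size() && child < neighbors.size());
    neighbors[parent].set(child);
    neighbors[child].set(parent);
}

// Return the connected component of `start` within `todo`. A neighbor is only
// followed if it is itself in todo, so two transactions end up in the same
// component only if some path between them stays entirely inside todo.
TxSet ComponentOf(const std::vector<TxSet>& neighbors, const TxSet& todo, size_t start)
{
    assert(start < neighbors.size() && todo.test(start));
    TxSet found, to_visit;
    to_visit.set(start);
    while (to_visit.any()) {
        // Pick the lowest-numbered transaction that still needs visiting.
        size_t cur = 0;
        while (!to_visit.test(cur)) ++cur;
        to_visit.reset(cur);
        found.set(cur);
        // Queue neighbors that are in todo and have not been reached yet.
        to_visit |= neighbors[cur] & todo & ~found;
    }
    return found;
}
```

With a chain 0 -> 1 -> 2, the component of 0 within {0, 1, 2} is all three transactions, but within {0, 2} it is just {0}: the only path from 0 to 2 passes through 1, which is not in the set.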

instagibbs · 2024-07-31T20:32:17Z

src/test/fuzz/cluster_linearize.cpp

+    depgraph.FeeRate(lin_leaf.back()).fee += fee_inc;
+    auto new_chunking = ChunkLinearization(depgraph, lin_moved);
+    auto cmp = CompareChunks(new_chunking, old_chunking);
+    assert(cmp >= 0);


nice to have a non-tree example of things strictly improving, also interesting to see that improvements still happen even if no fees are added
edit: hm, still not quite showing strict improvement since fees are after the fact; it's a different "cluster". Oh well.

Suggested change

-    assert(cmp >= 0);
+    if (fee_inc > 0) {
+        // It's more fees; should be superior
+        assert(cmp > 0);
+    } else {
+        assert(cmp >= 0);
+    }

sdaftuar · 2024-08-01T15:37:36Z

Here are my linearization benchmarks (ryzen 7995wx) after this PR (compare with #30126 (comment)):

ns/op	op/s	err%	total	benchmark
973.43	1,027,297.78	0.5%	0.01	`LinearizeNoIters16TxWorstCaseAnc`
1,464.85	682,664.84	0.2%	0.01	`LinearizeNoIters16TxWorstCaseLIMO`
2,503.93	399,372.47	0.3%	0.01	`LinearizeNoIters32TxWorstCaseAnc`
5,091.86	196,392.05	0.2%	0.01	`LinearizeNoIters32TxWorstCaseLIMO`
4,881.34	204,861.99	0.1%	0.01	`LinearizeNoIters48TxWorstCaseAnc`
10,880.48	91,907.74	0.0%	0.01	`LinearizeNoIters48TxWorstCaseLIMO`
7,843.67	127,491.30	0.0%	0.01	`LinearizeNoIters64TxWorstCaseAnc`
18,803.07	53,182.80	0.1%	0.01	`LinearizeNoIters64TxWorstCaseLIMO`
11,747.48	85,124.62	0.1%	0.01	`LinearizeNoIters75TxWorstCaseAnc`
28,902.79	34,598.74	0.0%	0.01	`LinearizeNoIters75TxWorstCaseLIMO`
19,030.02	52,548.55	0.1%	0.01	`LinearizeNoIters99TxWorstCaseAnc`
49,663.43	20,135.54	0.1%	0.01	`LinearizeNoIters99TxWorstCaseLIMO`

ns/iters	iters/s	err%	total	benchmark
13.66	73,231,349.71	0.1%	0.01	`LinearizePerIter16TxWorstCase`
9.63	103,869,988.01	0.3%	0.01	`LinearizePerIter32TxWorstCase`
9.38	106,626,518.23	0.3%	0.01	`LinearizePerIter48TxWorstCase`
9.44	105,920,646.37	0.3%	0.01	`LinearizePerIter64TxWorstCase`
10.38	96,365,108.12	0.3%	0.01	`LinearizePerIter75TxWorstCase`
10.38	96,374,023.73	0.2%	0.01	`LinearizePerIter99TxWorstCase`

ns/op	op/s	err%	total	benchmark
690.49	1,448,239.70	0.4%	0.01	`MergeLinearizations16TxWorstCase`
2,703.90	369,835.96	0.1%	0.01	`MergeLinearizations32TxWorstCase`
6,143.66	162,769.31	0.0%	0.01	`MergeLinearizations48TxWorstCase`
11,300.76	88,489.66	0.1%	0.01	`MergeLinearizations64TxWorstCase`
17,576.48	56,894.21	0.0%	0.01	`MergeLinearizations75TxWorstCase`
31,100.42	32,153.91	0.0%	0.01	`MergeLinearizations99TxWorstCase`
278.14	3,595,258.27	0.0%	0.01	`PostLinearize16TxWorstCase`
863.81	1,157,659.26	0.0%	0.01	`PostLinearize32TxWorstCase`
2,657.66	376,271.23	0.2%	0.01	`PostLinearize48TxWorstCase`
4,652.11	214,956.22	0.0%	0.01	`PostLinearize64TxWorstCase`
5,392.31	185,449.25	0.2%	0.01	`PostLinearize75TxWorstCase`
9,431.09	106,032.27	0.1%	0.01	`PostLinearize99TxWorstCase`

When the transactions being marked done exactly match the first chunk of what remains of the linearization, we can just remember to skip that chunk instead of computing a full rechunking. Further, chop off prefixes of the input linearization that are already done, so they don't need to be reconsidered for further rechunkings.
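A rough sketch of the skip-instead-of-rechunk idea described above, under simplifying assumptions: `SimpleChunking`, `Tx`, `ChunkInfo` and the rechunk counter are hypothetical names for illustration and do not mirror the PR's LinearizationChunking class, and done transactions are simply erased from the stored linearization rather than only chopped off as a prefix.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using TxSet = std::bitset<64>;
struct Tx { int64_t fee; int64_t size; };
struct ChunkInfo { TxSet txs; int64_t fee; int64_t size; };

class SimpleChunking
{
    std::vector<Tx> m_txs;           // fee/size per transaction index
    std::vector<size_t> m_lin;       // what remains of the linearization
    std::vector<ChunkInfo> m_chunks; // chunks from the last full rechunking
    size_t m_skip = 0;               // leading chunks of m_chunks already done
    int m_rechunks = 0;              // number of full rechunkings performed

    void Rechunk()
    {
        m_chunks.clear();
        m_skip = 0;
        ++m_rechunks;
        for (size_t idx : m_lin) {
            ChunkInfo cur{TxSet{}.set(idx), m_txs[idx].fee, m_txs[idx].size};
            // Absorb preceding chunks while cur has a strictly higher feerate.
            while (!m_chunks.empty() &&
                   cur.fee * m_chunks.back().size > m_chunks.back().fee * cur.size) {
                cur.txs |= m_chunks.back().txs;
                cur.fee += m_chunks.back().fee;
                cur.size += m_chunks.back().size;
                m_chunks.pop_back();
            }
            m_chunks.push_back(cur);
        }
    }

public:
    SimpleChunking(std::vector<Tx> txs, std::vector<size_t> lin)
        : m_txs(std::move(txs)), m_lin(std::move(lin)) { Rechunk(); }

    void MarkDone(const TxSet& done)
    {
        // Drop done transactions so later rechunkings never reconsider them.
        m_lin.erase(std::remove_if(m_lin.begin(), m_lin.end(),
                                   [&](size_t i) { return done.test(i); }),
                    m_lin.end());
        // Fast path: done is exactly the first remaining chunk, so skip it.
        if (m_skip < m_chunks.size() && m_chunks[m_skip].txs == done) {
            ++m_skip;
            return;
        }
        Rechunk();
    }

    size_t NumChunksLeft() const { return m_chunks.size() - m_skip; }
    const ChunkInfo& GetChunk(size_t n) const
    {
        assert(n < NumChunksLeft());
        return m_chunks[m_skip + n];
    }
    int RechunkCount() const { return m_rechunks; }
};
```

The fast path is valid because removing exactly the first chunk leaves the chunking of the rest unchanged; any other done set can alter chunk boundaries, so the sketch falls back to a full rechunking.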

glozow

code review ACK bbcee5a

glozow · 2024-08-02T10:27:28Z

src/cluster_linearize.h

+    // During an even pass, the diagram above would correspond to linearization [2,3,0,1], with
+    // groups [2] and [3,0,1].
+
+    std::vector<TxEntry> entries(linearization.size() + 1);


Are we persisting entries between passes just so we don't need to reallocate this vector? I don't see that we need to keep any of the information from a previous pass. To check, I added an entries.clear(), which didn't seem to break anything for me.

Indeed, the only reuse is the vector allocation.

glozow · 2024-08-02T10:37:07Z

src/cluster_linearize.h

+ *
+ * Postlinearization guarantees:
+ * - The resulting chunks are connected.
+ * - If the input has a tree shape (either all transactions have at most one child, or all


This is a subset of the "there exists a maximum of 1 path from any node to another" definition of a tree, right? This definition seems to be like a... botanical tree?

According to https://en.wikipedia.org/wiki/Tree_(graph_theory), a transaction graph where each transaction has at most one child would be an arborescence or out-tree, and the opposite an anti-arborescence or in-tree.
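As a small illustration of the two "tree shapes" discussed here (every transaction has at most one child, or every transaction has at most one parent), the following sketch checks both conditions on an edge list. The function names and the (parent, child) edge-list representation are assumptions for illustration, and the list is assumed to contain only direct dependencies without duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical edge list: each entry is a (parent, child) pair.
using Deps = std::vector<std::pair<size_t, size_t>>;

// True if no transaction among 0..n-1 has more than one direct child.
bool AtMostOneChildEach(size_t n, const Deps& deps)
{
    std::vector<int> children(n, 0);
    for (const auto& dep : deps) {
        assert(dep.first < n && dep.second < n);
        if (++children[dep.first] > 1) return false;
    }
    return true;
}

// True if no transaction among 0..n-1 has more than one direct parent.
bool AtMostOneParentEach(size_t n, const Deps& deps)
{
    std::vector<int> parents(n, 0);
    for (const auto& dep : deps) {
        assert(dep.first < n && dep.second < n);
        if (++parents[dep.second] > 1) return false;
    }
    return true;
}

// The shape condition quoted in the review: one of the two holds for all transactions.
bool IsTreeShaped(size_t n, const Deps& deps)
{
    return AtMostOneChildEach(n, deps) || AtMostOneParentEach(n, deps);
}
```

Two parents spending into one child (0 -> 2, 1 -> 2) satisfy the at-most-one-child condition but not the at-most-one-parent one; a diamond (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3) satisfies neither.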

instagibbs

ACK bbcee5a

instagibbs · 2024-08-01T20:24:16Z

src/cluster_linearize.h

+    // - Each direction corresponds to one shape of tree being linearized optimally (forward passes
+    //   guarantee this for graphs where each transaction has at most one child; backward passes
+    //   guarantee this for graphs where each transaction has at most one parent).
+    // - Starting with a backward pass guarantees the moved-tree property.


moved-leaf? (or point to another usage of moved tree)

sdaftuar · 2024-08-02T18:40:12Z

ACK bbcee5a

sipa changed the title ~~Low-level cluster linearization code: merging & postprocessing~~ cluster mempool: merging and postprocessing for linearizations Jun 14, 2024

sipa changed the title ~~cluster mempool: merging and postprocessing for linearizations~~ cluster mempool: merging & postprocessing of linearizations Jun 14, 2024

sipa added the Mempool label Jun 14, 2024

This was referenced Jun 14, 2024

cluster mempool: cluster linearization algorithm #30126

Merged

Cluster mempool tracking issue #30289

Open

sipa force-pushed the 202406_clusterlin_meta branch from 09aa7a8 to 7ab4c8e Compare July 2, 2024 20:51

DrahtBot mentioned this pull request Jul 9, 2024

cluster mempool: optimized candidate search #30286

Merged

sipa force-pushed the 202406_clusterlin_meta branch 2 times, most recently from 5a7b77c to b9ae506 Compare July 10, 2024 20:49

DrahtBot added the CI failed label Jul 10, 2024

sipa force-pushed the 202406_clusterlin_meta branch 2 times, most recently from c2acefe to df165d3 Compare July 11, 2024 12:37

DrahtBot removed the CI failed label Jul 11, 2024

DrahtBot mentioned this pull request Jul 11, 2024

[WIP] Cluster mempool implementation #28676

Draft

8 tasks

sipa force-pushed the 202406_clusterlin_meta branch 2 times, most recently from 3174292 to 20ea5d8 Compare July 19, 2024 19:38

DrahtBot added the Needs rebase label Jul 26, 2024

sipa force-pushed the 202406_clusterlin_meta branch from 20ea5d8 to f4a183c Compare July 26, 2024 12:39

DrahtBot removed the Needs rebase label Jul 26, 2024

instagibbs reviewed Jul 30, 2024

sipa force-pushed the 202406_clusterlin_meta branch from f4a183c to 157464f Compare July 30, 2024 22:08

DrahtBot added CI failed and removed CI failed labels Jul 30, 2024

sdaftuar reviewed Jul 31, 2024

instagibbs approved these changes Jul 31, 2024

glozow reviewed Aug 1, 2024

sipa added 5 commits August 1, 2024 14:07

clusterlin: rename Intersect -> IntersectPrefixes

0e52728

This makes it clearer what the function does.

clusterlin: add algorithms for connectedness/connected components

0e2812d

Add utility functions to DepGraph for finding connected components.

clusterlin: add PostLinearize + benchmarks + fuzz tests

4f8958d

clusterlin: add MergeLinearizations function + fuzz test + benchmark

04d7a04

sipa force-pushed the 202406_clusterlin_meta branch from 157464f to bbcee5a Compare August 1, 2024 20:13

glozow reviewed Aug 2, 2024

DrahtBot requested review from sdaftuar and instagibbs August 2, 2024 11:14

instagibbs approved these changes Aug 2, 2024

glozow merged commit bba01ba into bitcoin:master Aug 5, 2024
16 checks passed
