Description
We've always kinda winged our gossip protocols, so they're both heavier than required and maybe do not deliver the desired resilience.
We have different gossip in different places, but an important one in use looks like a grid plus a spammy layer. If I recall, the spammy layer always fires, even when the grid works, which sounds wasteful.
At an intuitive theory level @chenda-w3f and I have now reached some understanding of what the relevant properties should be. In brief:
- randomized topologies improve resistance against eclipse attacks and similar, with
- a locally randomized topology being somewhat better than a globally randomized one,
- but local randomization gives a random regular graph with overlaps which make the gossip less efficient,
- it's supposedly not much less efficient, but every time Chen quotes me numbers they sound quite noticeable, and
- if you rerandomize a topology using global randomness, then rerandomizing faster presumably helps more.
In asymptotics, if we have n nodes then a locally randomized topology with valency O(n^{1/k}) for k = 2, 3 should have "small" constant diameter d, but small is not d = 2, 3 like you get from k = 2, 3 respectively with a deterministic 2d or 3d grid topology. I've now forgotten what Chen said valency O(log n) yields, but maybe diameter O(log n).
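As a sanity check on those numbers, here is a tiny Rust snippet (illustrative only; n = 1000 validators is an assumed figure, not a real count) working out the side length, valency, and diameter of the deterministic k-dimensional grids for k = 2, 3. The diameters of the randomized variants are exactly what we'd want the simulations to pin down, so they are not computed here:

```rust
// Rough arithmetic for a deterministic k-dimensional grid on n validators.
// Illustrative only; n = 1000 is an assumed validator count.
fn grid_numbers(n: u64, k: u32) -> (u64, u64, u32) {
    // Grid side length, roughly n^{1/k}.
    let side = (n as f64).powf(1.0 / k as f64).round() as u64;
    // Each validator talks along every axis, so valency ≈ k * (side - 1),
    // which the text rounds to k * n^{1/k} (≈ 64 for k = 2, ≈ 30 for k = 3).
    let valency = k as u64 * (side - 1);
    // A deterministic k-d grid has diameter k: fix one coordinate per hop.
    (side, valency, k)
}

fn main() {
    for k in [2u32, 3u32] {
        let (side, valency, diameter) = grid_numbers(1000, k);
        println!("k = {k}: side ≈ {side}, valency ≈ {valency}, diameter = {diameter}");
    }
}
```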
We're largely lacking concrete numbers here, although Chen has occasionally reported some from the literature.
We suspect the diameter two given by the 2d grid topology winds up being overkill, so we'd suggest Polkadot pay some latency in exchange for a less spammy topology. We propose an experiment in which the gossip topology is defined by the union of two topologies (a rough sketch follows the list):
- We define a 3d grid in which every validator sends to 3 n^{1/3} = 30 other validators. We deterministically re-sort all validators into this grid based upon a context-entropy under which the messages get sent. Although deterministic, we'll update the context-entropy faster for more sensitive protocols, so this helps more against eclipse attacks there:
  - Approval system messages would set context-entropy to be the relay chain VRF output for the relay chain block being discussed.
  - Initially GRANDPA would set the context-entropy to be the epoch randomness, but maybe we could select something faster, perhaps post-BEEFY.
- We send to n^{1/2} ≈ 32 random other validators selected using system randomness.
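To make the proposal concrete, here is a rough Rust sketch of how a validator might compute its peer set as the union of the two layers. It is only an illustration under assumptions: `ValidatorIndex`, `grid3d_neighbours`, `random_neighbours`, the use of `DefaultHasher`, and the `rand` crate are all placeholders for whatever the real networking code would use, and a production version would need a proper cryptographic hash of the context-entropy.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

use rand::seq::SliceRandom; // assumed dependency: the `rand` crate

type ValidatorIndex = u32; // placeholder for the real validator index type

/// Deterministic layer: re-sort all validators by hashing them together with the
/// context-entropy, lay them out on a side x side x side cube, and send to everyone
/// who shares two of my three coordinates (roughly 3 * n^{1/3} peers).
fn grid3d_neighbours(
    me: ValidatorIndex,
    n: u32,
    context_entropy: &[u8; 32],
) -> Vec<ValidatorIndex> {
    let side = (n as f64).cbrt().ceil() as u32;

    // Deterministic pseudo-random ordering; it changes whenever the context-entropy
    // does (per relay-chain block for approvals, per epoch for GRANDPA).
    let mut order: Vec<ValidatorIndex> = (0..n).collect();
    order.sort_by_key(|v| {
        let mut h = DefaultHasher::new(); // placeholder: use a real cryptographic hash
        context_entropy.hash(&mut h);
        v.hash(&mut h);
        h.finish()
    });

    // Map a slot in the sorted order to (x, y, z) coordinates on the cube.
    let coords = |slot: u32| (slot % side, (slot / side) % side, slot / (side * side));
    let my_slot = order.iter().position(|&v| v == me).expect("me is a validator") as u32;
    let (mx, my, mz) = coords(my_slot);

    order
        .iter()
        .enumerate()
        .filter(|&(slot, &v)| {
            let (x, y, z) = coords(slot as u32);
            // Keep validators sharing exactly two coordinates with me, i.e. those
            // lying on one of my three axis lines (sharing all three would be me).
            v != me && ((x == mx) as u8 + (y == my) as u8 + (z == mz) as u8) == 2
        })
        .map(|(_, &v)| v)
        .collect()
}

/// Random layer: roughly n^{1/2} peers chosen with local system randomness.
fn random_neighbours(me: ValidatorIndex, n: u32) -> Vec<ValidatorIndex> {
    let k = (n as f64).sqrt().round() as usize;
    let others: Vec<ValidatorIndex> = (0..n).filter(|&v| v != me).collect();
    others
        .choose_multiple(&mut rand::thread_rng(), k)
        .cloned()
        .collect()
}

/// Union of the two layers: roughly 30 + 32 send targets per validator at n = 1000.
fn gossip_peers(me: ValidatorIndex, n: u32, context_entropy: &[u8; 32]) -> Vec<ValidatorIndex> {
    let mut peers = grid3d_neighbours(me, n, context_entropy);
    peers.extend(random_neighbours(me, n));
    peers.sort_unstable();
    peers.dedup();
    peers
}
```

The intent of hashing each validator index together with the context-entropy is that the whole grid re-sorts whenever the context-entropy changes, which is where the extra eclipse resistance for faster-updating protocols comes from.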
At present, our 2d grid sends to 2 n^{1/2} = 64 validators and our spammy layer sends to even more, while the proposal sends to roughly 30 + 32 = 62, so the reduction in total message attempts is roughly whatever the spammy layer currently adds.
We'll be adding hops here though, so TTL will matter more, and we should discuss how TTL works in our gossip now. I dunno yet..
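One naive possibility, purely to anchor that TTL discussion and not a claim about how the existing gossip code behaves: carry an explicit remaining-hops counter that the originator sets from the expected diameter, and have every relayer decrement it before forwarding. A minimal sketch, with made-up names:

```rust
// Naive hop-count TTL, purely illustrative; field and function names are made up.
struct GossipMessage {
    payload: Vec<u8>,
    hops_remaining: u8, // originator sets this from the expected diameter
}

fn maybe_forward(mut msg: GossipMessage, send_to_peers: impl Fn(&GossipMessage)) {
    if msg.hops_remaining == 0 {
        return; // expired: accept locally, but do not re-gossip
    }
    msg.hops_remaining -= 1;
    send_to_peers(&msg);
}
```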
I'd expect two development costs here: any TTL changes, required even for experimental work, and all the plumbing required by the new context-entropy input. I think a constant context-entropy makes sense for testing though, which simplifies this considerably.
We'll have @chenda-w3f run some theory simulations of this topology before anyone starts implementation work. It'd be helpful if someone could comment here on what the existing topologies really look like so Chen can compare. We should think more about how much the context-entropy input helps too.