Description
Use-cases of a node from an external application's point of view fall into two categories: inspection and notification. Light clients, which sync using only the chain of headers and do not generally validate extrinsic data within the block, must have special considerations to ensure they are able to provide both use-cases efficiently.
Inspection (of storage or the chain) is pretty easily provided following the design of Ethereum and Bitcoin before it: full nodes (or even light-nodes that already have the requisite information) may provide proofs on the value of a particular key in storage using only the storage's Merkle-trie root as a priori assumption (which is provided for through a header-sync).
However, notification is more difficult to provide as a low-trust service since proof that a given block does not have an extrinsic which causes a state-change of interest is not efficiently derivable from the storage's Merkle-trie roots. In Ethereum this was addressed through collating a large (2KB) Bloom filter in each block and embedding it in the header. This bloated the header and was ultimately fruitless as the usage of Ethereum ballooned and the Bloom became saturated.
Instead, I propose three mechanisms for addressing state-change notification on Substrate, two of which (the latter two) it makes sense to combine:
- Track last-modification-time entry in storage;
- Provide a Merkle trie root of all modified storage entries, ordered and indexed;
- Provide a hierarchy of Merkle trie roots of all modified storage entries in a series of blocks, ordered and indexed.
Track last-modification-time entry in storage
At present, the storage database is a set of key-value pairs. These pairs are arranged into a Merkle trie and baked into a single "root" hash. This proposal would simply prefix the value with the block number at which the value was last modified.
Synced light-clients could easily query proof-servers on when a storage item of interest last changed. Proof-servers could prove the most recent block (compared to either the head or some block before the head that the light-client knows about) at which the change happened. Light-clients could request a change-log between some begin and end block of one or more storage keys and the proof-server would return a chain of these proofs as irrefutable evidence of all blocks in which one or a number of storage entries changed.
There is one issue with this approach: deleted storage entries would still have a footprint in the database, necessary for recording the block at which it was "last modified" (i.e. deleted) - without this light-clients would lose their ability to query for its historical change log. The (slightly inelegant) workaround to this would be to have special "garbage-collection" blocks in which these zombie entries would be purged from the database (and thus the trie). Light-clients would ensure that they always made at least one change-log request within each of these periods.
This would increase every storage entry in the database by around 32-bytes (for the block number). It wouldn't have much of an effect on the disk i/o and the header size would remain the same. However, for storage with a lot of changes, building and executing these garbage-collection blocks may become a serious efficiently issue.
Ordered, indexed, Merklised per-block change-trie
This proposal creates a new structure that encodes all changes, not dissimilar in spirit to the Bloom filter. This structure takes the form of a trie root build as the mapping of indices to storage keys. The indices are sequential and ordered by storage key. Like the Bloom filter this gives a cryptographic digest of what has changed in the block. Unlike the Bloom filter, a proof that any given key has not changed is not only possible, but also compact.
To provide the proof that a given key didn't change, the proof-server provides the two (Index, Key)
entries on either side of the key to be proven was not changed. (A null sentinel index of (ChangedKeyCount, null)
would denote the upper end in order to provide a proof should the queried key be greater than the upper limit of modified keys.)
In principle, this trie could also contain a second mapping of (Key, [ ExtrinsicIndex_1, ExtrinsicIndex_2, ... ])
to denote which extrinsic data in the block actually caused the key to be changed.
Extending over ranges
While this allows for efficient proofs that one or more keys were not changed (or, if they were changed, can give the specific extrinsics which caused the change) in any given block, use-cases typically want to ascertain this for a range of blocks.
This structure, however, lends itself to a hierarchical approach: event N
blocks, the trie would contain an additional entry ('digest', DigestChangeTrieRoot)
. DigestChangeTrieRoot
would be the root of a similar trie structure, except that it would contain the accumulated modified accounts of the previous N
blocks. Rather than containing the series of ExtrinsicIndex
s that caused the change of any given key, it would contain the block numbers that caused the change, allowing for the efficient identification of the exact extrinsic through a logarithmic number of queries/proofs.
This structure can be nested and recursed arbitrarily; N might reasonably be either 16, 32 or 256 and be recursed 4, 3, or 2 times accordingly to get a maximum block range that the top-level trie covered to be 32768 or 65536.
Light clients would query proof-servers in batches, hopping over the blocks by the top-level range at a time. Proof-servers would either return with proofs that nothing of interest changed or would return with the sub-ranges where something did change (along with the key that changed there). Light-client would re-query within that subrange, drilling down until they determined the exact set of extrinsics. In principle, this entire query could be prepared on the server-side and a compiled proof of everything built and sent back to the light-client with minimal bandwidth/latency used.