Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Well-defined CAddress disk serialization, and addrv2 anchors.dat #20516

Merged
merged 3 commits into from
Jun 17, 2021

Conversation

sipa
Copy link
Member

@sipa sipa commented Nov 27, 2020

Alternative to #20509.

This makes the CAddress disk serialization format well defined, and uses it to enable addrv2 support in anchors.dat (in a way that's compatible with older software). The new format is:

  • The first 4 bytes store a format version number. Its low 19 bits are ignored (as those historically stored the CLIENT_VERSION), but its high 13 bits specify the actual serialization:
    • 0x00000000: LE64 encoding for nServices, V1 encoding for CService (like pre-BIP155 network serialization).
    • 0x20000000: CompactSize encoding for nServices, V2 encoding for CService (like BIP155 network serialization).
    • Any other value triggers an unsupported format error on deserialization, and can be used for future format changes.
  • The ADDRV2_FORMAT flag in the stream's version does not determine the actual serialization format; it only sets whether or not V2 encoding is permitted.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 27, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko maflcko added this to the 22.0 milestone Nov 27, 2020
@sipa sipa changed the title Introduce well-defined CAddress disk serialization Well-defined CAddress disk serialization, and addrv2 anchors.dat support Nov 27, 2020
@sipa sipa changed the title Well-defined CAddress disk serialization, and addrv2 anchors.dat support Well-defined CAddress disk serialization, and addrv2 anchors.dat Nov 27, 2020
@sipa
Copy link
Member Author

sipa commented Nov 27, 2020

I merged the anchors.dat addrv2 support from #20514 into this PR, as doing it correctly requires changes the Cservice serialization from stream version based to stored version based.

@hebasto
Copy link
Member

hebasto commented Nov 27, 2020

Concept ACK.

From the PR description and quick code reading it follows that there is no hurry to backport these changes into 0.21, right?

@sipa
Copy link
Member Author

sipa commented Nov 27, 2020

Indeed, there is probably no hurry, unless we want to support torv3 anchors in 0.21.

src/addrdb.cpp Outdated Show resolved Hide resolved
Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit 6d16903 message:

These do no introduce incompatibilities, as no code versions exist that write
any value other than 0 or 0x20000000 in the top 19 bits, and no code paths
where the stream's version differs from the stored version.

s/top 19/high 13/ ?

src/protocol.h Outdated Show resolved Hide resolved
src/protocol.h Outdated Show resolved Hide resolved
src/protocol.h Show resolved Hide resolved
@sipa sipa force-pushed the 202011_disk_addr branch 2 times, most recently from 22b8c55 to 80f5c54 Compare November 30, 2020 21:40
@sipa
Copy link
Member Author

sipa commented Nov 30, 2020

Addressed comments.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK 80f5c54

src/protocol.h Outdated Show resolved Hide resolved
src/protocol.h Outdated Show resolved Hide resolved
src/protocol.h Outdated Show resolved Hide resolved
src/protocol.h Outdated Show resolved Hide resolved
src/protocol.h Outdated Show resolved Hide resolved
src/protocol.h Outdated Show resolved Hide resolved
src/addrdb.cpp Show resolved Hide resolved
src/protocol.h Outdated Show resolved Hide resolved
src/protocol.h Show resolved Hide resolved
sipa added 3 commits May 24, 2021 18:06
Before this commit, CAddress disk serialization was messy. It stored
CLIENT_VERSION in the first 4 bytes, optionally OR'ed with ADDRV2_FORMAT.
 - All bits except ADDRV2_FORMAT were ignored, making it hard to use for actual
   future format changes.
 - ADDRV2_FORMAT determines whether or not nServices is serialized in LE64
   format or in CompactSize format.
 - Whether or not the embedded CService is serialized in V1 or V2 format is
   determined by the stream's version having ADDRV2_FORMAT (as opposed to the
   nServices encoding, which is determined by the disk version).

To improve the situation, this commit introduces the following disk
serialization format, compatible with earlier versions, but better defined for
future changes:
 - The first 4 bytes store a format version number. Its low 19 bits are ignored
   (as it historically stored the CLIENT_VERSION), but its high 13 bits specify
   the serialization exactly:
   - 0x00000000: LE64 encoding for nServices, V1 encoding for CService
   - 0x20000000: CompactSize encoding for nServices, V2 encoding for CService
   - Any other value triggers an unsupported format error on deserialization,
     and can be used for future format changes.
 - The ADDRV2_FORMAT flag in the stream's version does not impact the actual
   serialization format; it only determines whether V2 encoding is permitted;
   whether it's actually enabled depends solely on the disk version number.

Operationally the changes to the deserializer are:
 - Failure when the stored format version number is unexpected.
 - The embedded CService's format is determined by the stored format version
   number rather than the stream's version number.

These do no introduce incompatibilities, as no code versions exist that write
any value other than 0 or 0x20000000 in the top 13 bits, and no code paths
where the stream's version differs from the stored version.
@sipa sipa force-pushed the 202011_disk_addr branch from 80f5c54 to f8866e8 Compare May 25, 2021 01:06
@sipa
Copy link
Member Author

sipa commented May 25, 2021

Rebased, and addressed comments.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK f8866e8

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK f8866e8 tested rebased to master and built/run/restarted with DEBUG_ADDRMAN, peers.dat and anchors ser/deser seems fine

src/protocol.h Show resolved Hide resolved
Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK

@sipa
Copy link
Member Author

sipa commented Jun 12, 2021

@hebasto I think you need to leave a new review.

@hebasto hebasto self-requested a review June 12, 2021 21:46
@hebasto hebasto dismissed their stale review June 12, 2021 21:49

comment resolved

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK f8866e8, tested on Linux Mint 20.1 (x86_64).

Checked that changes in anchors.dat are forward compatible. They are not backward compatible -- getting "ERROR: DeserializeDB: Deserialize or I/O error - CAutoFile::read: end of file: iostream error", but this is expected.

uint32_t stored_format_version = DISK_VERSION_INIT;
if (s.GetVersion() & ADDRV2_FORMAT) stored_format_version |= DISK_VERSION_ADDRV2;
READWRITE(stored_format_version);
stored_format_version &= ~DISK_VERSION_IGNORE_MASK; // ignore low bits
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

This is the only place (besides two assertions above) where DISK_VERSION_IGNORE_MASK is used, and it is inverted. Isn't it more clear:

Suggested change
stored_format_version &= ~DISK_VERSION_IGNORE_MASK; // ignore low bits
stored_format_version &= DISK_VERSION_MASK; // ignore low bits

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do if I retouch.

@achow101
Copy link
Member

ACK f8866e8

@laanwj
Copy link
Member

laanwj commented Jun 17, 2021

Code review ACK f8866e8

@laanwj laanwj merged commit 7b45c5e into bitcoin:master Jun 17, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 18, 2021
…drv2 anchors.dat

f8866e8 Add roundtrip fuzz tests for CAddress serialization (Pieter Wuille)
e2f0548 Use addrv2 serialization in anchors.dat (Pieter Wuille)
8cd8f37 Introduce well-defined CAddress disk serialization (Pieter Wuille)

Pull request description:

  Alternative to bitcoin#20509.

  This makes the `CAddress` disk serialization format well defined, and uses it to enable addrv2 support in anchors.dat (in a way that's compatible with older software). The new format is:
  - The first 4 bytes store a format version number. Its low 19 bits are ignored (as those historically stored the `CLIENT_VERSION`), but its high 13 bits specify the actual serialization:
    - 0x00000000: LE64 encoding for `nServices`, V1 encoding for `CService` (like pre-BIP155 network serialization).
    - 0x20000000: CompactSize encoding for `nServices`, V2 encoding for `CService` (like BIP155 network serialization).
    - Any other value triggers an unsupported format error on deserialization, and can be used for future format changes.
  - The `ADDRV2_FORMAT` flag in the stream's version does not determine the actual serialization format; it only sets whether or not V2 encoding is permitted.

ACKs for top commit:
  achow101:
    ACK f8866e8
  laanwj:
    Code review ACK f8866e8
  vasild:
    ACK f8866e8
  jonatack:
    ACK f8866e8 tested rebased to master and built/run/restarted with DEBUG_ADDRMAN, peers.dat and anchors ser/deser seems fine
  hebasto:
    ACK f8866e8, tested on Linux Mint 20.1 (x86_64).

Tree-SHA512: 3898f8a8c51783a46dd0aae03fa10060521f5dd6e79315fe95ba807689e78f202388ffa28c40bf156c6f7b1fc2ce806b155dcbe56027df73d039a55331723796
@maflcko
Copy link
Member

maflcko commented Jun 21, 2021

review ACK f8866e8 🕑

Show signature and timestamp

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

review ACK f8866e8c324be3322fa507c2ceb1de35d148d0f1 🕑
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUipdwwAnD4R+92A+ogj6ZHCw+VUa2T+wnnhhGhfQ54cW2Hldl9pS1qTe7zPaNy8
XDn9OzCwU8LHNOWlqZo+e+lRiL09CVW+E7dwwe13ioGrk7wkXw/yO+XfmWNvx/np
X3sOCI8sNq+p1UHVhJLaLtPyuTa3/3/s5Vi157KUjqeBRaQZfTuFXbs6QgWzzjFb
K7KIhN+LINgI0zPmJU2OBrkTvkg+HDNCDW4DWhFFR3AbAdszEKZtYzz9flZIG3X0
LOP/iRBAud+o62YYazGo86iGla7LR9cToVKkOOQdTgPRT5diFmFnHxvRVizobLPb
TfmA9BqB/Ws7wg6h9A6kqAN/B1O9S0uAyPoJhdsTZaQTNt7ouB+q/20jwaVwX61r
0DZ5jhU0VgAkh4II1CH65oYTespndkvDvejjggyK7qFNFfUhKY7zePaMbmUu5eub
NKRkX6EN6GrKqP8dD26yMkPSXov0WjIZ89ebTI8joac8brNmX1TZMe70JYUdhgl/
IFgK8gnv
=Ldtq
-----END PGP SIGNATURE-----

Timestamp of file with hash 71c25ee05aade76e5311b5e5ffa61e3c8b280b51a49049dba2025a6d50dfe77b -

@@ -251,9 +251,37 @@ FUZZ_TARGET_DESERIALIZE(messageheader_deserialize, {
DeserializeFromFuzzingInput(buffer, mh);
(void)mh.IsCommandValid();
})
FUZZ_TARGET_DESERIALIZE(address_deserialize, {
FUZZ_TARGET_DESERIALIZE(address_deserialize_v1_notime, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied the fuzz inputs in commit bitcoin-core/qa-assets@836513a, so that the targets have something nice to start with.

gwillen pushed a commit to ElementsProject/elements that referenced this pull request Jun 1, 2022
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants