The Zero Allocations project #18849

jb55 · 2020-05-02T07:40:30Z

This is an octomerge staging PR for tracking various heap-allocation reductions during IBD and regular use. Reducing heap allocations improves cache coherency and should improve performance. Maybe. You can use this PR to bench IBD.

ZAP1 - compressor: use a prevector in CompressScript serialization [ZAP1] #18847 - compressor: use prevector in CompressScript serialization
~~ZAP2 - threadnames: don't allocate memory in ThreadRename #18848 - threadnames: don't allocate memory in ThreadRename~~
ZAP3 - bloom: use Span instead of std::vector for insert and contains [ZAP3] #18985 - bloom: use Span instead of std::vector for insert and contains
ZAP4 - no pr yet - netaddress: return a prevector from CService::GetKey()

DrahtBot · 2020-05-02T12:30:54Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

refactor: Use uint16_t instead of unsigned short #19314 (refactor: Use uint16_t instead of unsigned short by renepickhardt)
Overhaul transaction request logic #19184 (Overhaul transaction request logic by sipa)
p2p: Move all header verification into the network layer, extend logging #19107 (p2p: Refactor, move all header verification into the network layer, without changing behavior by troygiorshev)
Implement ADDRv2 support (part of BIP155) #19031 (Implement ADDRv2 support (part of BIP155) by vasild)
Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 #18071 (Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 by JeremyRubin)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

practicalswift · 2020-05-02T15:38:41Z

Would be interesting to see this benchmarked to be able to reason about risk vs reward (where the ideal reward would be a significant measurable and user-perceivable performance improvement) :)

elichai · 2020-05-04T11:09:06Z

I'd be very interested to see the benchmarks of this :)
I've tried doing similar things in the past, but C++ makes this somewhat scary for a big project with lots of contributors because it requires more careful tracking of lifetimes.
Also a few full IBD profiles I've done in the past show IIRC ~10-15% of local IBD time is spent on new and delete.

src/netaddress.cpp

jb55 · 2020-05-04T15:26:18Z

One of the hottest allocations that I found that might be the "easiest" to take a stab at is:

cacheCoins unordered_map in CCoinsViewCache::AddCoin:

typedef std::unordered_map<COutPoint, CCoinsCacheEntry, SaltedOutpointHasher> CCoinsMap

This picture shows that (looks like the cacheCoins.emplace call in AddCoin) is doing 60 million dynamic allocations for this small heaptrack snapshot of IBD (I think I ran this one for 5-10 minutes...?). With peak memory usage of 202.5MB.

One thing I'm trying now is a custom arena allocator for this map to see if that helps.

jb55 · 2020-05-04T15:38:50Z

Also a few full IBD profiles I've done in the past show IIRC ~10-15% of local IBD time is spent on new and delete.

This is a bit higher than I expected, but yes this is why gamedevs typically ban dynamic allocations from their main game loop, it absolutely kills performance: heap fragmentation, cache incoherency, etc, etc. Most high performing games arena-allocate or use custom allocators. I'm going into this from a gamedev mindset, I see IBD as our main loop. Let's minimize time spent per frame and maximize FPS (in our case, blocks/txs per second 😅 ). If we can arena/custom allocate some of these hotspots it might help a bunch.

sipa · 2020-05-04T22:17:34Z

@jb55 The CCoinsMap is where the actual UTXO cache is stored - it should be one allocation per UTXO. That's a huge number, because we cache a huge number of UTXOs, but I don't expect it will be easy to reduce. IBD performance heavily relies on being able to delete cache entries if they're created and spent without a flush to disk in between, which seems incompatible with arena allocation.

jb55 · 2020-05-05T09:21:32Z

@jb55 The CCoinsMap is where the actual UTXO cache is stored - it should be one allocation per UTXO. That's a huge number, because we cache a huge number of UTXOs, but I don't expect it will be easy to reduce.

yes in this case the only thing we could do better is a custom allocator that maintains an efficient memory layout for cache friendliness, This is something I wanted to try as well in addition to reducing allocations where possible.

elichai · 2020-05-05T10:10:04Z

We could also replace the hashmap with one that preserves cache locality, like SwissTable (CC #16718)

jb55 · 2020-05-05T18:23:55Z

re: allocator, looks like there is some prior art here: #16801

This matches a change in the C++20 std::span proposal.

This prevents constructing a Span<A> given two pointers into an array of B (where B is a subclass of A), at least without explicit cast to pointers to A.

jamesob · 2020-05-20T19:23:31Z

I'm Concept ACK and generally in favor of the thought behind this project, but my reindex benchmarks show that runtime is indistinguishable from this branch's master merge-base. These changes still may be worth pursuing for other reasons, but just wanted to add some data.

sipa · 2020-05-20T19:28:38Z

@jamesob Unless you're running with -par=1, I believe there is no change in behaviour (and even then, it's probably trivial).

jb55 · 2020-05-20T20:22:53Z

@jamesob thanks for running that, is there an easy way to run these myself as I hack on this branch? I'm guessing most perf would be dominated by IO factors/secp? Another motivating factor for this PR was to investigate heap usage in general. right now my core node uses a gig of ram and I'm looking at ways to reduce that outside of IBD.

jb55 · 2020-05-20T20:45:36Z

actually, I'm just going to try to reproduce this locally: pruned IBD in-memory filesystem, and I'll just run it a bunch of times up to some height and then compare the time-to-height.

jb55 · 2020-05-20T21:40:16Z

looks like the current changes don't affect IBD much at all:

-datadir=/run/bitcoin -reindex

prune=550
printtoconsole=1
connect=127.0.0.2
noassumevalid=1

reindex master to height 100,000 (ms)

10700
10768
11017
11016
10852

reindex zeroalloc to height 100,000 (ms)

10846
10796
10826
10817
10764

looks like I'll have to try harder!

but now I have a pretty good testbed with this bpftrace script + linux userspace tracepoints:

https://jb55.com/s/ibd.bt.txt
https://jb55.com/s/tracepoints.patch.txt

jamesob · 2020-05-21T00:48:15Z

reindex zeroalloc to height 100,000 (ms)

Consider that testing reindex or IBD up to height 100,000 is uncharacteristic because it happens almost instantaneously relative to the rest of the chain. You may find my previous work in bitcoinperf helpful, if not a little hard to set up (happy to try and improve this with you). Here's an example and accompanying output of benching a subset of IBD within a meaningful chain region.

jb55 · 2020-05-21T01:26:16Z

@jamesob I tried a bit higher and started to see what might be a performance improvement: #18847 (comment)

The only IBD focused commit so far is ZAP1 linked above, I'm still looking into utxo heap improvements which might be more interesting, as it is the # 1 allocator during IBD.

…key-prevector' into zeroalloc

DrahtBot · 2020-06-18T13:19:18Z

🐙 This pull request conflicts with the target branch and needs rebase.

…tion [ZAP1] 83a425d compressor: use a prevector in compressed script serialization (William Casarin) Pull request description: This function was doing millions of unnecessary heap allocations during IBD. I'm start to catalog unnecessary heap allocations as a pet project of mine: as-zero-as-possible-alloc IBD. This is one small step. before: ![May01-174536](https://user-images.githubusercontent.com/45598/80850964-9a38de80-8bd3-11ea-8eec-08cd38ee1fa1.png) after: ![May01-174610](https://user-images.githubusercontent.com/45598/80850974-a91f9100-8bd3-11ea-94a1-e2077391f6f4.png) ~should I type alias this?~ *I type aliased it* This is a part of the Zero Allocations Project #18849 (ZAP1). This code came up as a place where many allocations occur. ACKs for top commit: Empact: ACK 83a425d elichai: tACK 83a425d sipa: utACK 83a425d Tree-SHA512: f0ffa6ab0ea1632715b0b76362753f9f6935f05cdcc80d85566774401155a3c57ad45a687942a1806d3503858f0bb698da9243746c8e2edb8fdf13611235b0e0

jamesob · 2021-09-20T22:26:20Z

Sad to see this closed; I love the idea of exploring how we can reduce allocations in general since it seems like (in addition to any incidental performance benefits) fewer allocations probably makes behavior more predictable... although I guess while we might be able to reasonably preallocate certain things (coinsCache come to mind) others (the block index) might not be possible/desirable, so maybe "Zero Allocations" is somewhat of a misnomer.

JeremyRubin · 2021-09-21T02:11:21Z

it's generally really hard to get changes like this through; e.g. personally my epoch mempool stuff also has an aim to eliminate short lived allocations in many of the mempool algorithms but it is (from my perspective) stalled out for lack of sufficient review.

If big refactorings like ZAP or the Mempool Project are going to make the kind of headway you'd want to see they probably would need to see prioritization during a specific release cycle in order for contributors like myself or @jb55 (who I can't speak for as to why he closed it) in order for it to feel worthwhile investing effort on rebasing/keeping alive... I know that there's also a difficult balance, contributors with full-time or part-time jobs outside of Bitcoin Core are probably often more writing code than reviewing other's code (i.e., to scratch one's itch or fix a specific problem), which leads to less 'review karma' or whatever...

DrahtBot added Consensus P2P RPC/REST/ZMQ Tests Utils/log/libs Validation labels May 2, 2020

DrahtBot mentioned this pull request May 2, 2020

refactor: Replace thread_local use with a mutex-protected map #18851

Closed

elichai reviewed May 4, 2020

View reviewed changes

src/netaddress.cpp Outdated Show resolved Hide resolved

DrahtBot mentioned this pull request May 6, 2020

More thread safety annotation coverage #16127

Merged

DrahtBot added the Needs rebase label May 8, 2020

sipa added 2 commits May 12, 2020 14:12

Make Span size type unsigned

1f790a1

This matches a change in the C++20 std::span proposal.

Make pointer-based Span construction safer

bb3d38f

This prevents constructing a Span<A> given two pointers into an array of B (where B is a subclass of A), at least without explicit cast to pointers to A.

jb55 force-pushed the zeroalloc branch from 26ae7d1 to f4ddb10 Compare May 16, 2020 16:57

DrahtBot mentioned this pull request May 20, 2020

Implement ADDRv2 support (part of BIP155) #19031

Closed

11 tasks

Merge branches '2020-05-compresscript-prevector' and '2020-05-service…

8b88fcb

…key-prevector' into zeroalloc

jb55 force-pushed the zeroalloc branch from f4ddb10 to 8b88fcb Compare May 21, 2020 04:20

DrahtBot mentioned this pull request May 29, 2020

p2p: Move all header verification into the network layer, extend logging #19107

Merged

DrahtBot mentioned this pull request Jun 6, 2020

Overhaul transaction request logic #19184

Closed

DrahtBot mentioned this pull request Jun 17, 2020

refactor: Use uint16_t instead of unsigned short #19314

Merged

DrahtBot added the Needs rebase label Jun 18, 2020

jb55 mentioned this pull request Apr 12, 2021

Discussion: Potential USDTs (User Static Defined Traces) for Core #20981

Closed

jb55 closed this Sep 20, 2021

maflcko added Refactoring Up for grabs and removed Tests RPC/REST/ZMQ P2P Validation Consensus Utils/log/libs labels Sep 21, 2021

bitcoin locked and limited conversation to collaborators Oct 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The Zero Allocations project #18849

The Zero Allocations project #18849

jb55 commented May 2, 2020 •

edited

Loading

DrahtBot commented May 2, 2020 •

edited

Loading

practicalswift commented May 2, 2020

elichai commented May 4, 2020

jb55 commented May 4, 2020 •

edited

Loading

jb55 commented May 4, 2020

sipa commented May 4, 2020 •

edited

Loading

jb55 commented May 5, 2020 via email

elichai commented May 5, 2020 •

edited

Loading

jb55 commented May 5, 2020

jamesob commented May 20, 2020

sipa commented May 20, 2020

jb55 commented May 20, 2020

jb55 commented May 20, 2020 •

edited

Loading

jb55 commented May 20, 2020 •

edited

Loading

jamesob commented May 21, 2020

jb55 commented May 21, 2020 •

edited

Loading

DrahtBot commented Jun 18, 2020

jamesob commented Sep 20, 2021

JeremyRubin commented Sep 21, 2021

The Zero Allocations project #18849

The Zero Allocations project #18849

Conversation

jb55 commented May 2, 2020 • edited Loading

DrahtBot commented May 2, 2020 • edited Loading

Conflicts

practicalswift commented May 2, 2020

elichai commented May 4, 2020

jb55 commented May 4, 2020 • edited Loading

jb55 commented May 4, 2020

sipa commented May 4, 2020 • edited Loading

jb55 commented May 5, 2020 via email

elichai commented May 5, 2020 • edited Loading

jb55 commented May 5, 2020

jamesob commented May 20, 2020

sipa commented May 20, 2020

jb55 commented May 20, 2020

jb55 commented May 20, 2020 • edited Loading

jb55 commented May 20, 2020 • edited Loading

jamesob commented May 21, 2020

jb55 commented May 21, 2020 • edited Loading

DrahtBot commented Jun 18, 2020

jamesob commented Sep 20, 2021

JeremyRubin commented Sep 21, 2021

jb55 commented May 2, 2020 •

edited

Loading

DrahtBot commented May 2, 2020 •

edited

Loading

jb55 commented May 4, 2020 •

edited

Loading

sipa commented May 4, 2020 •

edited

Loading

elichai commented May 5, 2020 •

edited

Loading

jb55 commented May 20, 2020 •

edited

Loading

jb55 commented May 20, 2020 •

edited

Loading

jb55 commented May 21, 2020 •

edited

Loading