-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP txHandler: do not rebroadcast to peers sent duplicate messages #5424
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5424 +/- ##
==========================================
- Coverage 55.60% 50.43% -5.18%
==========================================
Files 447 447
Lines 63395 63422 +27
==========================================
- Hits 35253 31986 -3267
- Misses 25760 28931 +3171
- Partials 2382 2505 +123 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One opportunity to use Eventually
, but up to you if you take it.
Fixed outstanding review comments and merged master in |
data/txDupCache.go
Outdated
vals, found = c.cur[*d] | ||
if found { | ||
if _, senderFound = vals.Load(sender); senderFound { | ||
return d, vals, true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing a test for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is hard to test since the value need to appear in a current page between read write locks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Many more locks, map lookups, code complexity and memory footprint added for possibly very limited gain.
Will be nice to evaluate this tradeoff thoroughly and be prepare to reverse this if it is not sufficiently helpful.
A current delay between accepting and and relaying is:
I.e. there is some delay but might not be enough to collect enough duplicates to make re-broadcasting filtering effective. |
Do not merge - the benchmark showed 6.9k tps vs 7.7k tps on master, so networking code appears to be much slower now. |
Given the performance impact would it be possible to have this behavior be optional? Like keep the original txSaltedCache with the old struct{} value type, and then a new txPeerTrackingCache type that could be optionally enabled? |
@@ -315,7 +315,7 @@ func (p *digestCachePusher) push() { | |||
func (p *saltedCachePusher) push() { | |||
var d [crypto.DigestSize]byte | |||
crypto.RandBytes(d[:]) | |||
p.c.CheckAndPut(d[:]) // saltedCache hashes inside | |||
p.c.CheckAndPut(d[:], struct{}{}) // saltedCache hashes inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize this before, but your benchmark is only checking the raw performance of totally unique txns, with no duplicates, which is a synthetic scenario good for performance comparison but the real workload will probably have at least a few duplicates (and peers) per digest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, updated the benchmark and posted results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe give sync.Map a pointer to an object rather than struct{}{}? I can't imagine how it will convert that to a map key
data/txDupCache.go
Outdated
if found { | ||
return d, found | ||
if _, senderFound = vals.Load(sender); senderFound { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to have all these checks for whether you've seen the sender already? Or can we just optimistically call Store again for that case? it seems like this would simplify the code a bit, e.g.
func (c *txSaltedCache) CheckAndPut(msg []byte, sender network.Peer) (d *crypto.Digest, vals *sync.Map, found bool) {
c.mu.RLock()
d, vals, _, found = c.innerCheck(msg)
c.mu.RUnlock()
salt := c.curSalt
// fast read-only path: assuming most messages are duplicates, hash msg and check cache
if found {
vals.Store(sender, struct{}{}) // record the sender for this txn
return d, vals, true
}
// not found: acquire write lock to add this msg hash to cache
c.mu.Lock()
defer c.mu.Unlock()
// salt may have changed between RUnlock() and Lock(), rehash if needed
if salt != c.curSalt {
d, vals, _, found = c.innerCheck(msg)
if found {
// already added to cache between RUnlock() and Lock(), return
vals.Store(sender, struct{}{}) // record the sender for this txn
return d, vals, true
}
} else { // not found or found in cur page
// Do another check to see if another copy of the transaction won the race to write it to the cache
// Only check current to save a lookup since swap is handled in the first branch
vals, found = c.cur[*d]
if found {
vals.Store(sender, struct{}{}) // record the sender for this txn
return d, vals, true
}
}
if len(c.cur) >= c.maxSize {
c.innerSwap()
ptr := saltedPool.Get()
defer saltedPool.Put(ptr)
buf := ptr.([]byte)
toBeHashed := append(buf[:0], msg...)
toBeHashed = append(toBeHashed, c.curSalt[:]...)
toBeHashed = toBeHashed[:len(msg)+len(c.curSalt)]
dn := crypto.Digest(blake2b.Sum256(toBeHashed))
d = &dn
}
vals = &sync.Map{}
vals.Store(sender, struct{}{}) // record the sender for this txn
c.cur[*d] = vals
return d, vals, false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll benchmark this version vs Load + Store I have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BenchmarkDigestCaches/data.digestCacheMaker/threads=1-8 944719 1435 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-8 585506 2342 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=1-8 568954 2254 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-8 1327412 893.9 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-8 495157 2733 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=4-8 627554 2041 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-8 1287717 1023 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-8 382932 3128 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=16-8 981458 1280 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-8 1307422 988.9 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-8 406440 3055 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=128-8 1475484 787.5 ns/op
idk, kind of the same.
The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.
It appears to be a sharded map hashmap, and according to (1) it is better to read rather then rewrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was, the only reason to do the Load() before Store() is to optimize for the case where the same peer gives you same the transaction multiple times, which seems unlikely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely not the case that "multiple goroutines read, write, and overwrite entries for disjoint sets of keys" in your benchmarks.
However in practice, assuming every peer sends the same txn once, and there are 20 handlers that are randomly assigned to the txns from different peers, this is more likely to be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync.Map.Store
actually calls Load
at the very beginning so in makes sense to call Store directly.
But anyway, allocation new sync.Map per new txn appears to be a main contributor to the TPS slowdown.
Summary
When the transaction deduplication was added it had a side effect of re-sending transactions to peers sent a message the handler has seen before. This PR fixes this.
Before dedup if A sent T, all but A get T, then, if B sent us T, noone gets T again
With dedup, if A sent T, all but A get T, then, if B sent us T, noone gets T again.
With this PR: if A and B sent T almost simultaneously all but A and B get T
Implementation overview:
txSaltedCache
how stores map of seen peers for a particular messageCheckAndPut
returns this map to to get rid additional lookups.RelayArray
to let networking know whom to skip.Implementation Details
CheckAndPut
is not more complex because it needs to update the map value even if there is a match.The original idea with fast check under a read lock is preserved but
innerCheck
also returns a current value and a "page" (cur/prev map) where it was found in order to update. NoteinnerCheck
can return prev page and this also need to be considered when running an update with a write lock taken.The
found && senderFound
denotes a new fast path without modification underlying maps.Note, a reference data struct (
*sync.Map
in this case) is crucial to have the implementation to work because of the following scenario:txBacklogMsg
(wi
) of this transaction.wi
is updated automatically.Test Plan
Added a new unit test
Benchmarks
master
feature