metrics: split out /Transaction/AssembleBlock metrics #4795
Looks reasonable to me
Additionally after releasing this we should probably remove |
Codecov Report
@@ Coverage Diff @@
## master #4795 +/- ##
=======================================
Coverage 54.68% 54.69%
=======================================
Files 414 414
Lines 53550 53560 +10
=======================================
+ Hits 29286 29292 +6
- Misses 21836 21843 +7
+ Partials 2428 2425 -3
I added a separate check for max-life transactions but am happy to remove it. One thing that would definitely make this counter unnecessary is if the majority of transactions coming from all sources had LastValid shorter than MaxTxnLife
Looks fine. Is it possible to consolidate some of the metrics, since two sets of metrics are collected for the same conditions?
The code is getting crowded with metrics. At the very least, it is important to add comments explaining the distinctions, so that if cleanup is needed, it can be done safely and easily.
data/pools/transactionPool.go
Outdated
case *ledgercore.TransactionInLedgerError:
	asmStats.CommittedCount++
	stats.RemovedInvalidCount++
case transactions.TxnDeadError:
	asmStats.InvalidCount++
	if proto.MaxTxnLife == uint64(err.LastValid-err.FirstValid) {
		asmStats.ExpiredMaxLifeCount++
This can be misleading, since we don't know if err.LastValid-err.FirstValid is small enough to make an impact, say less than 10, or if it is proto.MaxTxnLife-1.
I think it would be better to check whether it is less than 50 than to compare it to proto.MaxTxnLife.
I agree that this is odd; I just didn't want to introduce a magic number, since any number is really arbitrary. Perhaps an average transaction lifespan would be more useful? It could be implemented as a running average by keeping a running sum of the lifetimes of expired txns.
Running average is too complex, and I don't suggest that. Also, it is not very helpful.
The "magic" number is the one that makes sense for the collected metrics.
I think we are interested in learning if the txn life is 5 or 900. But we don't care if it is 900 or 990. I think anything more than 20 will fall into a long-term bucket. So, 20 is not a magic number, but helps characterize a certain behavior of transactions.
I made the change to 20. We can see what stats this comes out to and further tune when removing old stats.
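As a rough illustration of the bucket discussed in this thread, here is a minimal sketch. The type and field names (`txnDeadError`, `assembleStats`, `ExpiredLongLivedCount`, `longLivedCutoff`) are hypothetical stand-ins rather than the names used in the PR, and the direction of the comparison (counting lifetimes above 20 as long-lived) is an assumption based on the comment above.

```go
package main

import "fmt"

// longLivedCutoff is the cutoff (in rounds) discussed in the thread.
// The name is hypothetical.
const longLivedCutoff = 20

// txnDeadError is a simplified, hypothetical stand-in for
// transactions.TxnDeadError, keeping only the validity window.
type txnDeadError struct {
	FirstValid uint64
	LastValid  uint64
}

// assembleStats is a hypothetical stand-in for the AssembleBlock stats.
type assembleStats struct {
	ExpiredCount          int
	ExpiredLongLivedCount int
}

// recordExpired counts every expired transaction and additionally counts
// those whose validity window was longer than longLivedCutoff rounds.
func (s *assembleStats) recordExpired(err txnDeadError) {
	s.ExpiredCount++
	if err.LastValid < err.FirstValid {
		// Malformed window; skip the subtraction to avoid uint64 underflow.
		return
	}
	if err.LastValid-err.FirstValid > longLivedCutoff {
		s.ExpiredLongLivedCount++
	}
}

func main() {
	var stats assembleStats
	stats.recordExpired(txnDeadError{FirstValid: 100, LastValid: 105})  // lifetime 5
	stats.recordExpired(txnDeadError{FirstValid: 100, LastValid: 1000}) // lifetime 900
	stats.recordExpired(txnDeadError{FirstValid: 100, LastValid: 1090}) // lifetime 990
	fmt.Printf("expired=%d longLived=%d\n", stats.ExpiredCount, stats.ExpiredLongLivedCount)
	// prints "expired=3 longLived=2"
}
```

With this shape, lifetimes of 900 and 990 land in the same bucket, while a lifetime of 5 is counted only as expired.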
57e08ac
ledger/ledgercore/error.go
Outdated
@@ -55,7 +55,7 @@ func MakeLeaseInLedgerError(txid transactions.Txid, lease Txlease) *LeaseInLedge
 func (lile *LeaseInLedgerError) Error() string {
 	// format the lease as address.
 	addr := basics.Address(lile.lease.Lease)
-	return fmt.Sprintf("transaction %v using an overlapping lease %s", lile.txid, addr.String())
+	return fmt.Sprintf("transaction %v using an overlapping lease (%s, %s)", lile.txid, lile.lease.Sender.String(), addr.String())
This is very confusing :) Maybe rename addr to leaseData.
What do you mean? The "lease" in the actual string changed to "(sender, lease)".
Yes, but in addr := basics.Address(lile.lease.Lease), addr is not an address but the lease value. I'm referring to addr.String(), which prints the lease data, not an address as one might think.
done, I think
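For reference, the rename suggested in this thread might look roughly like the sketch below. The types here are simplified stand-ins, not the real ledgercore or basics types, and hex encoding is used only as an assumption to keep the example self-contained; the code under review formats both values through basics.Address.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Txlease is a simplified, hypothetical stand-in for ledgercore.Txlease.
type Txlease struct {
	Sender [32]byte
	Lease  [32]byte
}

// LeaseInLedgerError is a simplified stand-in; txid is a plain string here.
type LeaseInLedgerError struct {
	txid  string
	lease Txlease
}

// Error reports the (sender, lease) pair. The lease bytes are held in a
// variable named leaseData so they are not mistaken for an account address.
func (lile *LeaseInLedgerError) Error() string {
	sender := hex.EncodeToString(lile.lease.Sender[:])
	leaseData := hex.EncodeToString(lile.lease.Lease[:])
	return fmt.Sprintf("transaction %v using an overlapping lease (%s, %s)", lile.txid, sender, leaseData)
}

func main() {
	err := &LeaseInLedgerError{txid: "TX1"}
	err.lease.Sender[0] = 0xab
	err.lease.Lease[0] = 0x01
	fmt.Println(err.Error())
}
```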
Summary
This splits out reasons for removing transactions from the pool during AssembleBlock with more granularity. The main reason is to provide visibility into the number of expired transactions across the network.
Questions for discussion
- MinTxnFeeError. We could remove them, leave them or potentially downsample them if we think that having some logs would be useful
- TxnDeadErrors that have expired and had the LastValid set to FirstValid + MaxTxnLife. This can be implemented by adding a bool field indicating this to the TxnDeadError struct but accessing the MaxTxnLife would require passing down the ConsensusParams all the way down from recomputeBlockEvaluator and it might be messy if we don't think it's worth it
Test Plan
Existing tests should pass
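The bool-field option raised under "Questions for discussion" could be sketched as follows. This is only an illustration under assumptions: `txnDeadError`, `MaxLife`, and `makeTxnDeadError` are hypothetical names, and `maxTxnLife` is passed in directly as a stand-in for the value that would otherwise have to be threaded down through ConsensusParams.

```go
package main

import "fmt"

// txnDeadError is a hypothetical stand-in for transactions.TxnDeadError,
// extended with a flag recording whether the transaction used the maximum
// allowed validity window.
type txnDeadError struct {
	Round      uint64
	FirstValid uint64
	LastValid  uint64
	MaxLife    bool // hypothetical: true when LastValid == FirstValid + maxTxnLife
}

func (e txnDeadError) Error() string {
	return fmt.Sprintf("txn dead: round %d outside of %d--%d", e.Round, e.FirstValid, e.LastValid)
}

// makeTxnDeadError is a hypothetical constructor. The caller must supply
// maxTxnLife, which is the plumbing cost mentioned in the PR description.
func makeTxnDeadError(round, firstValid, lastValid, maxTxnLife uint64) txnDeadError {
	return txnDeadError{
		Round:      round,
		FirstValid: firstValid,
		LastValid:  lastValid,
		MaxLife:    lastValid >= firstValid && lastValid-firstValid == maxTxnLife,
	}
}

func main() {
	const maxTxnLife = 1000 // assumed value, for illustration only
	err := makeTxnDeadError(2101, 1000, 2000, maxTxnLife)
	fmt.Println(err.MaxLife) // prints "true"
}
```

The pool could then increment a counter based on the flag without needing access to the consensus parameters itself.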