catchpoints: more support for EnableOnlineAccountCatchpoints #6214

Open
wants to merge 9 commits into base: master

Contributor

@cce cce commented Jan 3, 2025

Summary

Follow-on to #6177.

When writing catchpoint files, the catchpointFileWriter currently has no access to the consensus parameters, so it does not know whether EnableOnlineAccountCatchpoints is set. This means catchpoint files may contain chunks with OnlineAccountRecordV6 and OnlineRoundParamsRecordV6 even when EnableOnlineAccountCatchpoints is not set. However, these objects are ignored when calculating the label, because the catchpoint label hash calculation is conditioned on EnableOnlineAccountCatchpoints.

This adds an argument to makeCatchpointFileWriter so that catchpointFileWriter knows what the current consensus version is.
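
As a rough illustration of the idea, the sketch below gates the online-account portions of a chunk on the consensus flag. The types `consensusParams` and `chunk` and the function `buildChunk` are hypothetical simplifications made up for this example, not the actual go-algorand definitions; only the flag name EnableOnlineAccountCatchpoints comes from this PR.

```go
package main

import "fmt"

// consensusParams is a hypothetical stand-in for the real consensus
// parameters; only the flag discussed in this PR is modeled.
type consensusParams struct {
	EnableOnlineAccountCatchpoints bool
}

// chunk is a hypothetical, simplified catchpoint chunk. The real chunk
// holds encoded records, not strings.
type chunk struct {
	Balances          []string
	OnlineAccounts    []string
	OnlineRoundParams []string
}

// buildChunk always includes balances, but includes online-account data
// only when the consensus flag is set, so the file contents stay
// consistent with what the label hash covers.
func buildChunk(params consensusParams, balances, onlineAccounts, onlineRoundParams []string) chunk {
	c := chunk{Balances: balances}
	if params.EnableOnlineAccountCatchpoints {
		c.OnlineAccounts = onlineAccounts
		c.OnlineRoundParams = onlineRoundParams
	}
	return c
}

func main() {
	balances := []string{"acct1"}
	online := []string{"online1"}
	roundParams := []string{"params1"}

	off := buildChunk(consensusParams{}, balances, online, roundParams)
	on := buildChunk(consensusParams{EnableOnlineAccountCatchpoints: true}, balances, online, roundParams)

	fmt.Println("flag off, online records written:", len(off.OnlineAccounts))
	fmt.Println("flag on, online records written:", len(on.OnlineAccounts))
}
```

The point of passing the parameters into the writer, rather than filtering later, is that the writer is the only place that decides which record types end up in a chunk.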

This also adds support to catchpointdump for analyzing and dumping the onlineaccount and onlineroundparams records in catchpoint files, and calculating the labels.

This also addresses a corner case in which the state proof recoverability system (from #4803) tells the onlineaccounts tracker to retain more than 320 rounds of history. The retention bound is set by votersTracker.lowestRound() and used here:

    maxOnlineLookback := basics.Round(ao.maxBalLookback())
    dcc.onlineAccountsForgetBefore = (dcc.newBase() + 1).SubSaturate(maxOnlineLookback)
    if dcc.lowestRound > 0 && dcc.lowestRound < dcc.onlineAccountsForgetBefore {
        // extend history as needed
        dcc.onlineAccountsForgetBefore = dcc.lowestRound
    }

In this case, catchpoint files will contain more than the expected 320 rows, which leads to a catchpoint label hash mismatch if catchpoint-generating nodes have differing opinions of when the last state proof was verified. In practice, this can only really occur when a node is catching up quickly (after being stopped and restarted, or starting from 0) and flushing large batches of rounds: it might not have verified the most recent state proof when it hits the catchpoint first stage snapshot round.

Test Plan

Updated TestExactAccountChunk to exercise the new consensus params argument; other similar tests could be updated as well.

Needs a new test where dcc.lowestRound is set to something older than (dcc.newBase()+1).SubSaturate(320) and which verifies that the excludeBefore argument works.


codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 17.94872% with 128 lines in your changes missing coverage. Please review.

Project coverage is 51.87%. Comparing base (a1137a2) to head (0097856).
Report is 5 commits behind head on master.

Files with missing lines                                Patch %   Lines
cmd/catchpointdump/file.go                              0.00%     56 Missing ⚠️
ledger/store/trackerdb/sqlitedriver/kvsIter.go          0.00%     36 Missing ⚠️
cmd/catchpointdump/database.go                          0.00%     6 Missing ⚠️
cmd/catchpointdump/net.go                               0.00%     6 Missing ⚠️
ledger/store/trackerdb/sqlitedriver/accountsV2.go       0.00%     6 Missing ⚠️
ledger/catchpointfilewriter.go                          73.33%    2 Missing and 2 partials ⚠️
ledger/catchpointtracker.go                             80.00%    3 Missing and 1 partial ⚠️
...edger/store/trackerdb/sqlitedriver/sqlitedriver.go   0.00%     4 Missing ⚠️
ledger/catchupaccessor.go                               33.33%    2 Missing ⚠️
ledger/store/trackerdb/dualdriver/dualdriver.go         0.00%     2 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6214      +/-   ##
==========================================
- Coverage   51.91%   51.87%   -0.04%     
==========================================
  Files         643      643              
  Lines       86234    86349     +115     
==========================================
+ Hits        44769    44797      +28     
- Misses      38599    38689      +90     
+ Partials     2866     2863       -3     


@cce cce requested a review from algorandskiy January 3, 2025 06:39
@cce cce added the Bug-Fix label Jan 3, 2025
@gmalouf gmalouf requested review from jannotti and gmalouf January 3, 2025 13:38
// pass dbRound+1-maxBalLookback as the onlineExcludeBefore parameter: since we can't be sure whether
// there are more than 320 rounds of history in the online accounts tables, this ensures the catchpoint
// will only contain the most recent 320 rounds.
onlineExcludeBefore := (dbRound + 1).SubSaturate(basics.Round(config.Consensus[blockProto].MaxBalLookback))
Contributor

NIT: I'd probably make this name match the parameter name.

Contributor Author

Updated

balanceHash, spverHash, onlineAccountsHash, onlineRoundParamsHash)

fmt.Printf("Catchpoint label: %s\n", fileHeader.Catchpoint)
Contributor

Is this intended to persist, or is it leftover debugging output?

Contributor Author

Yes, I wanted catchpointdump to print the catchpoint label that's in the header. When all you have is a file, you don't know the label otherwise.

@@ -560,3 +581,69 @@ func printKeyValueStore(databaseName string, stagingTables bool, outFile *os.Fil
        return nil
    })
}

func printOnlineAccounts(databaseName string, stagingTables bool, outFile *os.File) error {
    fmt.Printf("\n")
Contributor

stdout?

Suggested change
-fmt.Printf("\n")
+fmt.Fprint(fileWriter, "\n")

Contributor Author

@cce cce Jan 7, 2025

I copied this from printKeyValueStore just above, which also has a fmt.Printf("\n"), but I guess it's pointless since the purpose is to append to the catchpoint fileWriter text file. Removed it.

}

func printOnlineRoundParams(databaseName string, stagingTables bool, outFile *os.File) error {
    fmt.Printf("\n")
Contributor

same as above

Contributor Author

removed

table := "onlineaccounts"
if useStaging {
    table = "catchpointonlineaccounts"
}

var onClose func()
if excludeBefore != 0 {
    // cheat: use Rdb to make a temporary table that we will delete later
Contributor

I suggest explaining in greater detail why you create a temp table, maybe by copying part of this PR description. IMO this kind of documentation must live in the codebase.

Contributor

+1, a brief description of what's going on here would be great.

Contributor Author

Added a lengthy comment that explains the issue, and also links back to this PR.

@@ -213,7 +213,7 @@ func (ct *catchpointTracker) getSPVerificationData() (encodedData []byte, spVeri
     return encodedData, spVerificationHash, nil
 }

-func (ct *catchpointTracker) finishFirstStage(ctx context.Context, dbRound basics.Round, blockProto protocol.ConsensusVersion, updatingBalancesDuration time.Duration) error {
+func (ct *catchpointTracker) finishFirstStage(ctx context.Context, dbRound basics.Round, onlineAccountsForgetBefore basics.Round, blockProto protocol.ConsensusVersion, updatingBalancesDuration time.Duration) error {
Contributor

Found where onlineAccountsForgetBefore gets set in prepareCommitInternal; makes sense.

    proto := protocol.ConsensusV33
    testExactAccountChunk(t, proto, 1)
})
t.Run("v34", func(t *testing.T) {
Contributor

Is this v6 to v7? Sanity-checking why we're testing v33 and v34.

Contributor Author

Oh, good point, I meant v39 and v40. Fixing.

Contributor Author

OK, updated, and fixing it found an issue. I got confused with EnableOnlineAccountCatchpoints, which was v34.

Contributor

@gmalouf gmalouf left a comment

Shared some questions

3 participants