Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: reenable TestVotersReloadFromDiskPassRecoveryPeriod #5496

Merged
merged 5 commits into from
Jul 5, 2023

Conversation

algorandskiy
Copy link
Contributor

@algorandskiy algorandskiy commented Jun 23, 2023

Summary

Fix onlineAcctsExpiredByRound and allow it to lookup DB for round params data (proto + totals) similarly to how it works for online accounts: this info is available in DB for cases when trackers DB advanced but voters preserve longer online accounts history.

Reenable TestVotersReloadFromDiskPassRecoveryPeriod test after this fix.

Test Plan

Locally the test appears to be working fine.
Repro:

go test ./ledger -race -c  -o tt
./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 & ./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 & ./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 & ./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 & ./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 & ./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 & ./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 & ./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 & ./tt    -test.run TestVotersReloadFromDiskPassRecoveryPeriod   test.count=30 &

ahangsu
ahangsu previously approved these changes Jun 23, 2023
@codecov
Copy link

codecov bot commented Jun 23, 2023

Codecov Report

Merging #5496 (9747cbe) into master (c547097) will increase coverage by 0.04%.
The diff coverage is 65.21%.

@@            Coverage Diff             @@
##           master    #5496      +/-   ##
==========================================
+ Coverage   55.66%   55.70%   +0.04%     
==========================================
  Files         446      446              
  Lines       63128    63144      +16     
==========================================
+ Hits        35138    35174      +36     
+ Misses      25612    25603       -9     
+ Partials     2378     2367      -11     
Impacted Files Coverage Δ
ledger/store/trackerdb/sqlitedriver/sql.go 8.81% <0.00%> (-0.05%) ⬇️
ledger/acctonline.go 79.55% <78.94%> (+0.16%) ⬆️

... and 9 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@algonautshant
Copy link
Contributor

Failed against the current master : c223dbc

            	Error Trace:	/Users/shantkarakashian/go/src/github.com/algorand/4/go-algorand/ledger/ledger_test.go:2790
            	            				/Users/shantkarakashian/go/src/github.com/algorand/4/go-algorand/ledger/ledger_test.go:2981
            	            				/Users/shantkarakashian/go/src/github.com/algorand/4/go-algorand/ledger/withAndWithoutCache.go:33
            	Error:      	Received unexpected error:
            	            	round 752 before dbRound 3014
            	Test:       	TestVotersReloadFromDiskPassRecoveryPeriod/test_without_lru_cache

Copy link
Contributor

@algonautshant algonautshant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still failing against the master. Are the other fixes already in the master?

@algorandskiy
Copy link
Contributor Author

This is error tells voters failed to build a tree at round 752 = 256 * 3 - 16 because trackers advanced. This should not happen and appears to be a real bug. Good catch.

@algorandskiy
Copy link
Contributor Author

The test is using in-memory sqlite

	const inMem = true
	l, err := OpenLedger(log, dbName, inMem, genesisInitState, cfg)

and the issue most likely caused by db access congestion

        testingLogger.go:38: time="2023-06-26T10:16:31.402883 -0400" level=warning msg="db.atomic: 15 retries (last err: database table is locked: blocks)" callee="github.com/algorand/go-algorand/ledger.(*blockQueue).syncer.func2" caller="/go-algorand/ledger/blockqueue.go:169" file=dbutil.go function="github.com/algorand/go-algorand/util/db.(*Accessor).AtomicContext" line=277 readonly=false

Kind of weird, voters should prevent trackers from advancing if lagging behind.

@ahangsu ahangsu self-requested a review June 26, 2023 15:43
@algorandskiy
Copy link
Contributor Author

Narrowed the issue down to onlineAcctsExpiredByRound and provided a fix.

runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x72
github.com/algorand/go-algorand/logging.logger.Errorf({0xc0000fa1c0?, 0xc00012a0a8?}, {0x10171301a, 0x36}, {0xc000247300, 0x4, 0x4})
	/go-algorand/logging/log.go:231 +0x7e
github.com/algorand/go-algorand/ledger.(*onlineAccounts).roundParamsOffset(0xc0007c6590, 0x4f0)
	/go-algorand/ledger/acctonline.go:657 +0x324
github.com/algorand/go-algorand/ledger.(*onlineAccounts).onlineAcctsExpiredByRound(0xc0007c6590, 0x4f0, 0x600)
	/go-algorand/ledger/acctonline.go:1028 +0x28a
github.com/algorand/go-algorand/ledger.(*onlineAccounts).ExpiredOnlineCirculation(0xc0007c6590?, 0xc00e58eea0?, 0x8?)
	/go-algorand/ledger/acctonline.go:1098 +0x5c
github.com/algorand/go-algorand/ledger.(*onlineAccounts).TopOnlineAccounts(0xc0007c6590, 0x4f0, 0x600, 0x400, 0xc00a0423a8, 0x19?)
	/go-algorand/ledger/acctonline.go:974 +0xf19
github.com/algorand/go-algorand/ledger/ledgercore.(*VotersForRound).LoadTree(_, {_, _}, {{}, 0x4f0, {0x1b, 0x7d, 0x28, 0xe, 0xab, ...}, ...})
	/go-algorand/ledger/ledgercore/votersForRound.go:129 +0x11d 

@algorandskiy
Copy link
Contributor Author

algorandskiy commented Jun 28, 2023

@icorderi new method in accts reader AccountsOnlineRoundParamsRound
LookupOnlineRoundParams added instead of LookupOnlineTotalsHistory

}
var roundOffsetError *RoundOffsetError
if !errors.As(err, &roundOffsetError) {
return ledgercore.OnlineRoundParamsData{}, err
Copy link
Contributor

@cce cce Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The onlineTotalsEx version of this logs the error but doesn't return, then falls through to doing LookupOnlineRoundParams.. slightly inconsistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I noticed that as well, if I remember right, it was some temporal thing while we do not have stateproof recoverability. I guess it is safe to remove, but I'll double check

@@ -569,8 +589,12 @@ func (ao *onlineAccounts) onlineTotalsEx(rnd basics.Round) (basics.MicroAlgos, e
ao.log.Errorf("onlineTotals error: %v", err)
}

totalsOnline, err = ao.accountsq.LookupOnlineTotalsHistory(rnd)
return totalsOnline, err
roundParams, err := ao.accountsq.LookupOnlineRoundParams(rnd)
Copy link
Contributor

@cce cce Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an argument that onlineTotalsEx can just be a wrapper around roundParamsEx now and not have this RoundOffsetError-checking logic implemented twice, but I don't feel strongly about it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes you wonder, do we even need the onlineTotalsEx() method at all — it is only called from TopOnlineAccounts.

Copy link
Contributor

@cce cce Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the difference is onlineTotalsEx() calls onlineTotals() which acquires the accountsMu.RLock itself, but roundParamsEx does not. But TopOnlineAccounts could acquire the RLock too or pass a synchronized=false parameter when calling roundParamsEx or something.

Copy link
Contributor

@cce cce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some potential to deduplicate a little bit of code but overall looks good and so glad you found this and caught it!

@@ -556,6 +556,26 @@ func (ao *onlineAccounts) onlineCirculation(rnd basics.Round, voteRnd basics.Rou
return totalStake, nil
}

// roundsParamsEx return the round params for the given round for extended rounds range
// by looking into DB as needed
// locking semantics: requires accountsMu.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is ambiguous: does it lock accountsMu or requires accountsMu to be held before calling this function?

ledger/acctonline.go Show resolved Hide resolved
Copy link
Contributor

@algonautshant algonautshant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants