-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Algod: support state proofs recoverability #4803
Conversation
* [WIP] Migrate stateproof database to store builders and add PersistentBuilder type * [WIP] Refactor builder and remove PersistentBuilder and dependency on VotersForRound * [WIP] Add persistBuilders flag to stateproof.Worker, and delete old builders from database * [WIP] . * Finalize the builder struct, fix deleteOldBuilders and make msgpack * Finalize implementation and fix tests * Add and update unit tests to verify database is consistent with builders in memory * rename stateproof top voters alloc bound name * fx: removed comment * refactor: fetchBuilder to treat errors from decoding. * using sql.ErrNoRows * rm: unused fields in stateproof/Worker * fx: avoiding sql access for sigs in the future * refactor: moving persisted fields into BuilderPersistingFields struct * fx: stating that the worker is not a persistentBuilder in numerous unit-tests * inspect sig is in db prior to signing in func signStateProof * improved exists claus * reviewdog-fix * refactor: getBuilder func to not fill a ptr * fx: fetchBuilderForRound behaviour * fix: shutdown to worker after db-init in test * fx: warning instead of error log * rm: error var * fx: separate SP DB schema upgrade into two functions * adding unit test for db schema upgrade * rm: close db inside makeStateProofDB * improved unit-test * renamed b as var name * fx: comment * fix: unneeded if statement in the sigExistsInDB * fix: small bug in sigExistsInDv verify * adding unit-test to ensure fetching from disk is performed * adding unit-tests * adding unit-test that catchs an issue * fx: when fetching builder from DB - fill with sigs * fx: fetchBuilderFromDB refactor * fx:race * reusing code * fx: race * fx: watch-dog issue * fx: comment * rename insertBuilder func * improvement: reloading sigs-per-round * removing unneeded field from func fillBuilder * fix CR * fixing edge cases with when no signatures are in the database * support old databases having only signatures * fix bug + work on tests * adjusting tests * close the query * verify that builder can be loaded with no signatures * remove the const flag for now * fmt Co-authored-by: Jonathan Weiss <jonathan.weiss@algorand.com> Co-authored-by: algoidan <idan@algorand.com>
* added state proof verification tracker skeleton * now saving data regarding state proofs on new block * added test for state proof verification tracker * Removed comments from interface functions * added sp verification tracker to the ledger * added license, returning the appropriate value from produceCommittingTask * now handling 0 in StateProofInterval * tracker test now test removal from tracker * removal now works * added db version and created sp verification table * moved verification data type to ledgercode and generated encode functions * added state proof round to verification data for db indexing * added code to prune old state proofs * added code to retrieve verification data from the tracker * commented out some code, now also saving generated round in verification data * now testing addition and committing * import order fix * now using deltas to decide whether a state proof has occurred in a block * basic addition and removal work and are tested * some test refactoring * added test for chain in happy flow * renamed and refactored a lot of code * fixed emergent index usage bug * added test with disabled state proofs * changed variable usage for more readable code * changed error reporting in tracker lookup * changed lookup, added more commit tests * increased confidence in produceCommittingTask * added db pruning tests * reordered tests, finished pruning tests * handled initialization slice size * fixed v6 migration test to not impact global state * added state proof verification table creation to TestLedgerReloadTxTailHistoryAccess * added missing table in ledger test * all tests except interval change work * fixed bug in receiving deletion data, added test for interval change * changed tracker and refactored tests to not touch the dcr * generated round no longer saved in db * renamed delete data, split newBlock to functions * reordered functions, now locking to protect internal data structures * some lock reordering * added necessary comment for compilation * added comments where relevant * closed todos in ledger * some renaming * fixed bug in deletion index check * added tracker ledger test * removed todo that is now a ticket * wrapped verification data insert with db.retry * added nil check on close * updated dump_genesis.sh * added lookup tests * removed redundant passing of slice by reference * CR: now testing data len to return * CR: Renamed ProvenWeight to OnlineTotalWeight * CR: separated imports * CR: avoiding looking into future protocols in loadfromdisk * CR: Now breaking to avoid redundant comparisons in an already sorted array * CR: Removed redundant optimization from test. * CR: added checks to verify that new blocks indeed arrive in order * now panicking after logging invalid new block * CR: No longer passing entire data structure where round would suffice * removed usage of proto in loadfromdisk * CR: now wrapping errors in lookup * some lookup refactor * improved error message * missing parantheses * error check fix * CR: some refactors around verification data lookup * CR: removed accountdb prefix from creation function * CR: error refactoring * ledger test now asserts more specific situations * CR: added unit test for panic * CR: moved locking for performance improvement * CR: added load from disk test * CR: moved reload test to ledger_tests * CR: Removed redundant error check
* fix: stateprf deletes keys after stateproof advances. * fixed unit test to match the current code of key deletion * fix: all unit tests pass, added TODO * fx: comment * fx: deletion of previous rounds * fx: unit-test * fx: prevStateproof init value * fx: golangcli * fix merge issues. * refactoring * add test * fix deletion bug * fix tests * refactoring tests * reuse code on tests * fix keys removal to use the next state proof * builder will not remove signatures and builder to support recoverability * refactoring + avoid remove if stateproof was not changed * remove double test * checking that the deletion round is not higher than expected * minor refactor * we now remove the last ephemeral key * nits * add test for deleting all keys in account manger * move warning message to error Co-authored-by: algoidan <idan@algorand.com>
Merge recoverability master
* add version to the verification structure and new consensus param * refactor state proof verifcaion * apply state proof using trakcer * export stateproof verification data in LedgerForCowBase interface * using tracker on tests. * fix ledger test * fix borken interface * choose to use the tracker based on atRound proto version * fix tests * testing v34 * refactoring * fix comments
* adding one memory slot for latest sp-verification-data * refactor: changed if-else to one if clause * loadFromDisk should fill lastSeenStateProofVerificationData from the disk * added unit tests * fix: caching latt lookup for future uses * sql-fx * fix: returning copy of the lastLookedUpVerificationData * fix: checks cache location moved removed unneeded sql lookup on reload from disk fixed comments fix: zero-cache is ignored * removed comment from function
Fix for issue from nightly tests where tryBroadcast is holding the lock and preventing invokeBuilder from running.
85fc808
to
8bf459b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I cannot find any actionable problems at this time.
Some pending tasks will be addressed in future PRs.
The complexity of this feature remains high, despite many simplifications.
These are mainly due to architectural interdependencies, such as the ldeger/ledgercore dependency, but this is out of the scope of this work.
@algoidan please remerge master. Someone just merged internal -> eval renaming :( |
…aster Merge sp recoverability from master
Nodes aggregate and save state proof information to be able to recover a stalled state proof subchain. In case of a problem nodes will not lose old state proof data so it will be possible to recover the stalled state proof chain. Data being saved: * state proof builder state is persisted * a new state proof verification tracker also persists some state proof related data from block headers Co-authored-by: Or Aharonee <17099688+Aharonee@users.noreply.github.com> Co-authored-by: Almog Tal <107349997+almog-t@users.noreply.github.com> Co-authored-by: Jonathan Weiss <85506383+algonathan@users.noreply.github.com> Co-authored-by: Pavel Zbitskiy <65323360+algorandskiy@users.noreply.github.com>
Summary
The goal of this PR is to change the way nodes aggregate state proof information. In case of a problem with the state proof chain, nodes will not lose old state proof data so it will be possible to recover the stalled state proof chain.
High-level changes:
Test Plan