Add comment about telemetry #820
Merged
algobolson approved these changes Feb 12, 2020
btoll pushed a commit to btoll/go-algorand that referenced this pull request Mar 3, 2020
btoll pushed a commit to btoll/go-algorand that referenced this pull request Mar 6, 2020
tsachiherman added a commit that referenced this pull request Mar 17, 2020
* shellchecked `build_deb.sh` * Test pre-packaged executable on variety of linux platforms (#651) * Add platform testing using docker for generated binaries. * Fix path. * Apply reviewer's requested changes. * Reduce e2e_go_tests execution time twice (#645) There are seven major contributors to integration tests running time TestOnlineOfflineRewards (1248.64s) TestAssetConfig (364.71s) TestRewardRateRecalculation (226.78s) TestStartAndEndAuctionTenUsersOneBidEach (196.34s) TestNoDepositAssociatedWithBid (189.74s) TestDeadbeatBid (188.70s) TestStartAndCancelAuctionNoBids (183.35s) This commit considers only first three. 1. Fixing rewards interval in config for TestRewardRateRecalculation from 25 to 10 reduces time twice: TestRewardRateRecalculation (119.34s) 2. Fixing initialRound in TestOnlineOfflineRewards test from 301 to 11 reduces time 15 times: TestOnlineOfflineRewards (73.80s) 3. TestAssetConfig looks long by design - commits and waits max allowed assets 4. Address TODO in run_integration_tests.sh. Now e2e_client_runner calls 'goal network delete' to reflect this removal Refers #508 * Promote test_release.sh so that it won't conflict with release testing. (#655) * Fix concurrent access to wallet handles cache in goal (#654) * Fix concurrent access to wallet handles cache in goal * In rare cases (i.e. e2e tests run in parallel on the same network) a race cond happens when accessing goal.cache/walletHandles.json file * Introduce advisory locking on the mentioned file * Implementation is extendable by implementing *locker* interface for specific platform and providing a new *newLockedFile* constructor. * Address PR review notes * Do no truncate before obtaining the lock * Increase waiting interval to 10 ms * Simplify newLockedFile constructor * Allow upgrades to specify the delay before their execution. (#650) This replaces UpgradeWaitRounds with MinUpgradeWaitRounds and MaxUpgradeWaitRounds. 
Proposers specify an upgrade's delay given their own ApprovedUpgrades, encoding the proposed delay in the UpgradeVote. Verifiers check that the delay sits between MinUpgradeWaitRounds and MaxUpgradeWaitRounds (inclusive). This commit adds this functionality but does not change current behavior. * Set explicit 30 sec timeout for AlgorandGoal::RawSend in expect test (#658) * Should help with sporadic failures when we send and TEAL in groups * Support variable-delay protocol upgrades in ConsensusFuture. (#659) Also add some unit tests for variable-delay protocol upgrades. * Shant/catchup stop on unapproved (#660) * A fix for arm64 failures One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts when called from go test using CombinedOutput is behaving strange (slow). Replacing CombinedOutput with Run. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests, that failes sporadically on mac, on ARM as well. Adding a utility to controll test skips. adding missing file and change. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests, that failes sporadically on mac, on ARM as well. Adding a utility to controll test skips. adding missing file and change. Fixing errors and adding comments. * Fixing merge and comment. * added comment * Stop catchup on unapproved protocol round Catchup to stop before fetching the next round if the round protocol is not approved by the node * Some fixex. Review comments from Tsachi. * File accidentally added here. removing. * Reverting changes mistakenly added to this branch. * Adding comment changes. * Partially working test * Adding test to catchup stop on unsupported block Using s.cancel we are droppng the last block. * More tests and development to the catchup service * Stop the catchup before fetching the round with un-approved protocol. 
The catchup service will save the round when an an-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an an-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an an-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an an-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments. * Combine condition blocks * Fixing an error in the log info statement. * Compile linux/amd64 binaries with static linking (#625) * Test static compilation. * remove -fPIC * Try with ubuntu 18.04, since it has newer GCC. * exclude buildmode from test builds. * Fixed missed buildmode. * Refactor. * Add logging for the telemetry server connections (#661) * Add logging for the telemetry server connections. * Revert unintended change. * Improve error message. * add bool support to algocfg (#667) e.g. `algocfg set -p EnableProcessBlockStats -v true` * Reduce execution time of expect tests (#665) * CombinedOutput blocks on copying empty stderr stream from expect that causes at least 60 sec timeout for most of the tests * This implementation uses a temp time for stderr accumulation. In this case exec.Cmd does not run goroutines for reading child's actual stderr. 
* 655 sec (before) vs 205 sec (after) * Avoid upgrading boost on travis Mac builds (#669) * specify a boost version for the mac build. * try to prevent boost update on travis mac builds. * Abort algod startup if logging.config file has bad permissions (#662) * This should prevent telemetry event loses on systems with invalid permissions on ~/.algorand/logging.config file * Another possible workaround is to relax default config path mask in **cmd/goal/commands.go:ensureCacheDir** from 700 to 744. This is not implemented because of possible security risk. * Add error logging for getting a cached wallet handle (#663) Needed to debug 'Couldn't read password: inappropriate ioctl for device' error message in tests * Update license date 2019 -> 2020 (#674) * Change 2019 -> 2020 * Update readme. * Update copyright to use date range. (#676) * Tee existing tests so we can review output before piping it forward. (#677) * Make gracefull exit of a node that is waiting for WaitForBlock call (#679) * Make gracefull exit of a node that is waiting for WaitForBlock call. * Add comment. * Remove tput where not supported by terminal (#682) * Remove tput where not supported by terminal. * send tput errors to dev/null * Fix bad constants. * Avoid waiting for block that won't be reached due to unsupported protocol upgrade. (#681) * Fix - Indexer now shows received transactions (#684) -- Adding receiver function to transaction that returns the receiver of a transaction -- Fix indexer to show received transactions * Undo teeing to dev/tty as it doesn't work well in terminal free environments. (#689) * Improve lockFile error handling (#687) * Better lockFile error handling. * Make blocking locker. * Fix F_OFD_GETLK constant. * bugfix. * Try platform specific code. * use unix package to include F_OFD_SETLKW * remove unused imports. * Rename files. 
* Catchup service stop on unsupported and e2e test (#685) * A fix for arm64 failures One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts when called from go test using CombinedOutput is behaving strange (slow). Replacing CombinedOutput with Run. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests, that failes sporadically on mac, on ARM as well. Adding a utility to controll test skips. adding missing file and change. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests, that failes sporadically on mac, on ARM as well. Adding a utility to controll test skips. adding missing file and change. Fixing errors and adding comments. * Fixing merge and comment. * added comment * Stop catchup on unapproved protocol round Catchup to stop before fetching the next round if the round protocol is not approved by the node * Some fixex. Review comments from Tsachi. * File accidentally added here. removing. * Reverting changes mistakenly added to this branch. * Adding comment changes. * Partially working test * Adding test to catchup stop on unsupported block Using s.cancel we are droppng the last block. * More tests and development to the catchup service * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an an-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an an-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an an-approved protocol update will take place. 
Then, before creating a task to fetch a round, will check if the next round is when an an-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments. * Combine condition blocks * Fixing an error in the log info statement. * Draft: Test for upgrading a node while keeping another node not upgradable goal node status field for informing if the node is upgradable * Catchup service stop on unsupported, ode status message, and e2e test In this change: Updated catchup service to stop on unsupported and not unupgradable. Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing last synced information. Added e2e test for stopped catchup service on unsupported protocol. * Separating goal changes from this PR. Separating goal changes from this PR. goal changes are in PR: https://github.com/algorand/go-algorand/pull/686 * review comment: use NotEqual instead of True * Make ARM64 build mandatory. (#694) * Updates to the goal node status (#686) * Updates to the goal node status This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field. * Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport * Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values. * Review comments from Tsachi: using the timeout in select * Updating the test to reflect the removal of: has synced since startup. 
* telemetry recorded locally as info log (#666) config.json: {"TelemetryToLog":true} logging.config: {"Enable":false,"SendToLog":true} * Relax StartNetwork regex (#696) * relax StartNetwork regex. * Another attempt. * Two fixes to basicCatchup_test: cloned node not stopped and env var conflict (#697) * Updates to the goal node status This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field. * Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport * Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values. * Review comments from Tsachi: using the timeout in select * Two fixes to basicCatchup_test: cloned node not terminated and env var collision 1) TestBasicCatchup and newly added TestStoppedCatchupOnUnsupported create a new node by cloning one of the network nodes. When fixture.Shutdown() stops the original network nodes, leaves the cloned node running. This change adds function shutDownClonedNode to stop the cloned nodes. 2) In TestStoppedCatchupOnUnsupported, an env variable is used to delete ConsensusCurrentVersion, so that the cloned node behaves as if its binary does not support the consensus version. However, when the TestBasicCatchup runs in parallel, it also picks up the env variable, and consequently deletes ConsensusCurrentVersion from the Consensus map. When this happens, TestBasicCatchup sporadically fails. In this change, instead of having ConsensusTestUnupgradedProtocol upgrade to ConsensusCurrentVersion, or deleting ConsensusCurrentVersion so it cannot be upgraded, it sets up ConsensusTestUnupgradedProtocol to upgrade to ConsensusTestUnupgradedToProtocol. Hence, the env variable is used to delete ConsensusTestUnupgradedToProtocol. This way the conflict with other tests is eliminated. 
* Fixing golint by addint comment. * Tsachi's review comment: unsetting the env var. * Make scripts executable. (#702) * More reliable fetcher unit tests. (#708) * Avoid starting the Telemetry service when logging is disabled (#703) if remote telemetry is not enabled, do not start uri update service add a nil check * Shutdown kmd when test fixture is going down. (#709) * Fix unit test. (#711) * Execute e2e tests one at a time on arm64 (#701) * Test changes. * Better error reporting on goalFixture * Add version query for kmd startup. * Few more test cases to cover. * try to wait. * changes * Update. * Move KMD shutdown to network. * Add some debug messages to figure out what's going on. * Fix script bug. * Fix proper KMD shutdown via the KMDFixture * Run the tests one at a time only on arm64 * Updating according to review. * Disable pprof endpoints by default (#693) * enable go profiler for netdeploy * add EnableProfiler to ConfigJSONOverride * Update the makefile to skip the static linking when compiling on centos. (#713) * Fail e2e-go tests when node panics (#699) * Fail test on panic * few more touchups. * sync * bugfix. * Update few more usecases. * Refactoring * Simplify. * undo network referencing. * undo few func-ptr. * undo some more stuff. * Update method names * Few more touchups. * Build release job (#698) * Initial commit * Added Jenkinsfile * Updated Jenkinsfile * Works until GPG IPC * Move build files into new release/ dir Also, renamed files {build_,}release.sh and {build_,}setup.sh * Path issues * Use t2.xlarge instance type (4 vCPUs, 16GB ram) * Restructuring * shellchecked * fix bug * Added new `socket.sh` file * Trying to build rpm * Bump up disk size of ec2 instance * more attempts to make rpm * more fixes * move /stuff -> /root/stuff * wip * moved to correct paths * Have `release` have its own start and kill ec2 instance scripts * use buildhost scripts after all * Make sure the gpg key name matches!!!!! 
-%_gpg_name Algorand RPM <rpm@algorand.com> +%_gpg_name rpm algorand <rpm@algorand.com> * fixes * Add upload stage to pipeline * Add tag stage to pipeline * more fixes * Move start/stop ec2 instance scripts back into release/ * Add ability to dynamically set branch * Added controller/ subdir * Some cleanup * Adding tag support Moved `Jenkinsfile` into controller/ subdir. * Move build_env build.sh -> setup.sh Moved socket.sh -> controller/socket.sh * Revert buildhost changes * some cleanup * fix build * test packages locally * upload packages to s3 test bucket * restructure * misc * fix build * Add Jenkins parameters * fix build * Move commands into Jenkinsfile into stages/ * fix build * Make test stage more explicit * fix build * Implementing reviewer suggestions * Added debug info * fix build * Merge into master * implement reviewer suggestions * turn off test stage * fix build * fix build * fix build * Update readme * removed unneeded archive/ dir * Use service-wide logger instead of logging.Base() in agreement (#714) * Switch from default logger to pre-configured logger in some components of agreement service * Mark some of the slow e2e tests as such (#719) * Mark some of the slow e2e tests as such. * Move shorttest flag to be set at top level. * Wait test less restrictive. (#718) * Move slow test to get executed on nightly builds (#721) * Move some more test to be "slow tests", and modify short test condition so that we will run the long tests on nightly builds only. * Fix elif -> else * Faster upgrade tests. (#722) * Disable failing test. (#724) * Generate docs for algokey. * s/goal/algokey * Improve algons error logging (#733) * Write body when erroring on SRV/DNS records update. * Few more error messages. 
* ledger/eval refactor (#700) refactor ledger/eval block validation don't do crypto+lsig validation in eval fix sync in backlog executer queue clean up lots of logging to make tests quieter * Fix a bug in Credential.lowestOutput caused by improper domain separation (#716) * Fix a bug in Credential.lowestOutput caused by improper domain separation The bug causes larger accounts to be block proposers more often than should happen based on their fraction of online stake. This patch will cause nodes to vote for a protocol upgrade that fixes the buggy behavior. After the protocol upgrade goes through, all the upgrade-related code in this commit should be removed, as it's not necessary to retain the old buggy behavior for catchup. (For convenience code to be removed is marked with a "TODO(upgrade)" comment.) * Typofix; fix merge issue * Fix test * Add a comment to make the linter happy * Typo fixes * Goal docs tweaks (#731) * test all `goal ... -h` (#730) * test all `goal ... -h` ensures no conflicting subcommand options adds less than 2 seconds to test time * review tweak, rearrange to sub test script * actually pass args * grr, arg * Move EnsureDigest logic into the catchup service (#726) * Move EnsureDigest logic into the catchup service. * update unit tests. * Add unit testing for new catchup feature. * updating per review. * Add handing for concurrently updated round. * Add comment. * typo * Correct the quit semantics. * Faster stringer implementation for Address (#736) * Faster stringer implementation. * Optimize UnmarshalChecksumAddress as well. * Add comment. * Interconnect relays on a locally deployed network (#742) * static codegen for msgpack encode/decode (#578) Implement static code generation for msgpack encoding and decoding of blocks and transactions. The existing functions `protocol.Encode` and `protocol.Decode` invoke the generated encoders and decoders if present. 
Benchmarking block encode/decode suggests this is about 4x faster than go-codec (which we were using previously). When changing existing data structures to be encoded, or adding new ones, run `make msgp`. Some code is still using go-codec (notably agreement). If we convert all code to use this static code generation plan, we could get rid of the dynamic check and dispatch in `protocol.Encode` and `protocol.Decode`. Having fast encoding/decoding is not only good for performance, but allows us to remove complex optimizations (like caching txid values or encoding lengths, removed in this commit), and might allow us to perform checks that we previously thought would be too expensive (like making sure that an encoding is canonical, by re-encoding). Having explicitly generated code also makes it easier to understand performance and tweak it further. Results from pprof should be much less opaque (no reflection) and more actionable. Explicit codegen also makes it clear when we make a change that affects encoding/decoding of network messages. The code generation is done using a modified version of github.com/tinylib/msgp, forked as github.com/algorand/msgp. * Use cobra for the kmd command to allow for documentation automation. * Limit client side connection rate, part 1 * Draft of the solution * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * Addressing review comments. * fixing test failure * fixing test failure2 * Adding a unit test * txsync now will go through http request connection limit. * Addressing review comments. Changing phonebookEntries duration type from uint to time.Duration * fixint test failure. * splitting wait for connection time and add connection time. Addressing some review comments. * recording provisional time before connect, updating after. 
* minor fixes * Embedding MockNetwork in mock structs which implment GossipNode to avoid the implmentation of dummy functions to satisfy the interface. * not embedding by reference. * A few more review comment fixes. * Fix checkdep message. (#745) * Fix equal stake distribution in generated networks (#749) * Use math.big.Rat rational numbers to get rid of summation error * Root cause although in JSON serialization of float64 data type so that some values are rounded and others are not. Correct fix seems to be in using the same accuracy in distribution code and float64 marshaling. * Update with PR feedback. * Change a player test to use either old buggy behavior or new correct behavior depending on ConsensusCurrentVersion. (#748) This allows agreement tests to pass whether ConsensusCurrentVersion is the old V20 or the new V21 * Bugfix: Fix last relevant proposal period in agreement protocol. (#746) When retrieving the last relevant period corresponding to a proposal-value, the proposal store inside the agreement protocol does not properly check that the particular period returned actually matches the passed-in proposal-value. Instead, the proposal store returns the last period seen for *any* proposal-value. When the agreement state machine receives a proposal payload, the proposal store checks whether this payload matches any proposal-value known to be relevant in the current round. If it does, the state machine tells the crypto verifier to verify the new payload. As an optimization, the proposal store in the state machine also tags the payload with the last period in which it is relevant (and whether the matching proposal-value is pinned). The crypto verifier halts concurrent verification of any payload from that period. Separately, the proposal store does not attempt to verify payloads more than once, caching past payloads it has pipelined. 
For this optimization to be correct, the last relevant period must be correct; otherwise, the network will permanently stall if the following occurs: - In period p, the network observes a best proposal value of v, but it sees neither the payload B corresponding to v nor a threshold of soft-votes for B (seeing such a threshold pins B, preventing the crypto verifier from cancelling). - An attacker is able to see B. - In period p+1, the network attempts to agree on a new proposal value v' corresponding to the payload B'. - After half of the network has received B' but has _not_ finished verifying it, the attacker sends this half the payload B. This half will cancel verification of B' (since it erroneously associates B with period p+1) and will permanently ignore any future broadcasts of B' (which was cached in the proposal store). - If the other half has already staged B', the network will stall permanently, since it will be unable to commit B'. Fixes #710. Thanks to @xixisese for reporting this bug. * Format numbers using number specifier (#735) * Use %d to print numbers, which is abit safer as it prevent potential recursion. * Few more changes to the fuzzer. * Two more updates. * Implement local net template generation with netgoal (#762) * Usage: netgoal generate -n 1 -R 1 -w 100 -o mynettemplate.json -r . -t goalnet goal network create -t mynettemplate.json -r mynet -n mynet * Remove duplicate definitions from netdeploy/networkTemplate * Improve net templates support (#766) * Fix file descriptors leak in 'goal account'. Now goal can import more than maxfiles keys * Fix uint overflow in stake distribution validation. 
Details: values 10 and -110 were casted to uint and sum up to 100 pct with 32 bits overflow * Allow pct fraction of stake in goal net templates * Fix stake distribution in netgoal.generate: it always produces pcts and not values in algos as was incorrectly thought before * Add tests for netdeploy.Validate() * Release build pipeline step 1: Build, package, sign, deploy to staging (#763) * Reorganize * more restructuring * cleanup * removing test bits * changing upload destination * remove test dir * remove cruft * Moved Jenkinsfile -> jenkinsfile/Build * replace {RSTAMP,FULLVERSION} * fix bugs * remove temp dir location * remove buildnumber.dat * Implement automation for release notes generator (#761) The cicd.yaml config file in this branch can be consumed by our cicd cli to create a draft for release notes for a given version. * back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb (#764) * back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb * remove IsFull * bring back txn liveness check. buffer up to all payset groups in chan * no chan close * Implement dummy telemetry hook to safely perform operations on it when telemetry is disabled (#768) * The idea is have telemetry.hook always set. For telemetry disabled case this is a simple noop stub. * Prevents crashes when calling hook.Close/Flush on private networks in case of errors * Remove instances of tagging in our build process (#770) We don't want to be making tags anywhere in our automation. Our release process will take care of that. * Configurable consensus protocol (#750) * Create consensus.json * some changes.. * remove deadcode. * update constant. * Update fixture. * migrate fast upgrade protocols. * move catchup test protocol. * push staged changes. * bugfix. * Remove last test consensus param. * rollback block.go * cleanup : map[protocol.ConsensusVersion]ConsensusParams -> ConsensusProtocols * udpate. * Fix unit test. 
* Release build pipeline step 2: Test (#773) * Reorganize * more restructuring * begin test stuff * restructure * fix deb test * fix rpm test * fix build * restructure * fix bug * remove temporary feature branch * added new gpg.sh * removed buildnumber.dat * When locally installing, take the binaries from the first-gopath-bin directory. (#776) * Remove temporary build test location (#777) * Make sure to default to Consensus if consensus.json is missing. (#779) * Make util.ExecAndCaptureOutput able to process large output (#771) * In case of large amount of data written to stdout/stderr from the wrapped command the process is blocked until stdout/stderr buffers cleared. * Old implementation waited until cmd return and then read stdout/stderr. * New implementation reads stdout/stderr pipes in goroutines. * Make goal node state change commands systemd aware (#769) * Make goal node state change commands systemd aware I added a property to libgoal/system.go where we can set whether or not our algod process is managed by systemd. * Write expect test for goal node with systemd scenarios This tests that the message from our cli on goal node start, stop and restarts is correct for systemd_managed data_dirs. * Write expect test for goal node start, stop and restart This tests that the message from our cli on goal node start, stop and restarts is correct for data_dirs that are not managed by systemd. * Add systemd_managed: true as a default in system.json Since all linux installs currently use systemd, I added this to the base system.json file. * Restructure release/ dir (#782) * Restructure release/ dir for each build release pipeline stage First step is the `build` pipeline. * More restructuring Removed `release/ci/`. Every dir under `release/` will now be a pipeline. 
* Added "test" pipeline * update readme * Remove temp location and remove code cruft * removed outdated readme * more cleanup * implement reviewer changes * Allow asset creation transactions to be created while catching up. (#790) * Tunnel outgoing connection via a rate limiting dialer (#780) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Addressing Pavel's comments. * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Addressing Pavel's comments. * changes. * adding dialer. * DRAFT: using channel to offload the mutex. 
* Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Allow asset creation transactions to be created while catching up. (#790) * Addressing Pavel's comments. * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Addressing Pavel's comments. * rebasing master Co-authored-by: Tsachi Herman <tsachi.herman@algorand.com> Co-authored-by: Will Winder <wwinder.unh@gmail.com> * Release build pipeline step 3: Added "prod" pipeline to `release/` (#788) * Release build pipeline step 3: Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Implement reviewer suggestion * better algons error messages. 
(#794) * Create a rate limiting transport (#795) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Rate limiting transport. * remove comment. * Unify dialing path. * Removing ForceAttemptHTTP2 which isn't available on go 1.12 Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Some release abstraction (#796) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * Remove temp github location * Change agreement message encoder to msgp. (#786) * Upgrade to new version of msgp. - omitemptyarray and omitempty are correctly distinguished between in equivocationVoteAuthenticator. - The embedded Block is correctly handled in proposal, unauthenticatedProposal, and transmittedPayload. * Randomize anonymous (embedded) fields when testing codec. Co-authored-by: Nickolai Zeldovich <nickolai@csail.mit.edu> * Move fetcher client into catchup (#774) * changes. * adding dialer. * Move fetcher client into catchup, step 1. ( most unit tests are still broken ) * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. 
* Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * update. * fix few more unit tests. * fix syncer tests. * undo change. * Add a comment. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Fix gpg keygrip code and remove old code (#797) * bugfix : compile correctly teal program that includes a base64 string which starts with double slash (#787) * update. * Improve test. * Add support for multiple network protocol versions (#799) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Add a version-accept header to support multiple network protocol versions. * update. * Remove comments. * Addresing reviewer concerns. * Add a unit test for checkProtocolVersionMatch logic. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Include comment about something that looks like a vulnerability, but isn't. (#820) * Skip logging and telemetry when not needed. (#737) * Added utils for testing release packages (#819) * Added utils for testing release packages check_sig: Verify gpg signatures of build artifacts. 
test_package: Verifies the packages were built from the correct branch with the correct hash and verifies the test version release number. * Implement reviewer feedback * Update docker build script to be more flexible with its naming (#822) * Deleting out-of-date wallet folder in go-algorand. (#821) * Some build fixes (#818) * Some build fixes Most importantly, move the `fullversion.dat` file to the $HOME directory and use it for the name of the upload directory on s3. It should have been doing this before, but it was copying it to the wrong location on the ec2 instance. * Implement reviewer suggestions * Completely remove temp dir before re-creating it * Move `dsign` functionality to goal (#800) * Deferred persistent crash data validation (#823) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Perform the crash-decoding after responding to the event, so that the new vote won't be blocked. * undo unintended changes. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Update Dockerfile for our official docker image (#826) * fix incorrect comments (#825) * Reduce the log verbosity on scenario 3 deployed network (#828) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. 
* minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Reduce the amount of logs on s3 network. When running s3, our performnace is negatively impacted by high amount of logging. This change reduces the logging to warning and above. * undo Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Trigger test build (#831) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * fix build * test * Remove build parameters * wip * remove test dir * still trying to fix random build errors * updating test phase * extract build_env values * add trigger for test phase * test * removed test location * More release build fixes (#836) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * fix build * test * Remove build parameters * wip * remove test dir * still trying to fix random build errors * updating test phase * extract build_env values * add trigger for test phase * derp * remove test location * Split consensus from config (#832) * Split consensus from config. * few more changes. * netgoal: create accounts in parallel (#827) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. 
* Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Make parallel accounts. * undo change. * handle data race. * use atomics. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Updated job name to match on the Jenkins server (#837) * Brice/refactor make (#835) * Refactor makefile I refactored how we build libsodium to support multiple os and cpu architectures from the crypto dir. Also I added some make targets that work the way our ci pipeline needs them to. * Add flags for other linux architectures in crypto/vrf.go * Remove yum commands from configure_dev script I decided we don't need these here. I just left the which apt-get so that this script works the same but doesn't break on centos. * Add multi platform support to cicd yaml Now we have stages to do builds on different platforms utilizing docker and qemu cpu virtualization. * Refactor libsodium dep management Before the libsodium dep paths were hardcoded under cgo tags, now they're being passed in through env vars. Also throwing in a dockerfile for our cicd process. * Revert change to configure_dev.sh These changes actually aren't necessary since our build process doesn't use this script. * Switch back to using cgo tags for CFLAGS and LDFLAGS This way LDFLAGS aren't used all over the place unecessarily which could cause problems in the future. * Fix names of things in Makefile Fixed the name of crypto/lib/libsodium.a to crypto/libs/$(OS_TYPE)/$(ARCH)/lib/libsodium.a so that it reflects the updated project structure. Also changed VARIATIONS=literally_anything in ci-build to VARIATIONS=$(OS_TYPE)/$(ARCH) so that it looks like it's useful. 
* Update cicd.yaml to use the new shell.docker.Ensure task This task makes sure that the docker image(s) our tasks depend on are avaiable during stage executions. It either pulls the docker image or builds it from scratch when it's not available. * Fix references to crypto/lib/libsodium.a make target A travis script was referencing this directly so I fixed the target. Also, I removed an unnecessary reference in our rpm build script. * Remove ci-deps from docker build make targets Those were there by mistake, and having them kind of defeated the purpose packing those deps with the images. Also I moved ci-deps to the shell.Make target in build-local since those are necessary there. * Run build and test jobs in a docker container (#840) * Brice/fix deploy linux (#767) * Make dockerignore file This file will prevent docker build contexts from loading certain files when creating docker build contexts. I just made it a copy of .gitignore since those files don't seem to be necessary for any current Dockerfile for go-algorand. * Fix unnecessary cd into parent directory of project root This was causing huge docker contexts for no apparent reason. * Change dockerignore to include some necessary files I switched tmp to tmp/dev_pkg and tmp/out to ignore large folders that seem unnecessary for any docker build today and removed ignores for the network gen files * Limit msgp tool warning message scope (#834) * Try to reduce msgp verbosity. * update * update msgp version in go.mod * update go.sum * Remove old entries from go.sum * Refactoring peer unicast implementation (#841) * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * checking in current draft. 
* complete the test * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * checking in current draft. * complete the test * some cleanup * fixes, lint, format. * Addressing Tsachi's comments * Addressing Tsachi's comments. getNonce() fixed, and a new test added for it. * Addressing few more comments. * Unifying getResponseChannel and removeResponseChaneel * addressing Pavel's comment: correcting a comment. * Actively scan for ledgers, normalize names cross platform (#842) Make ledger wallet names more canonical, check that sending a command doesn't return an error, only run active ledger for new devices. * require Encode() and Decode() to take msgp'ed types (#798) This ensures that calls to protocol.Encode() and protocol.Decode() are going to hit fast msgp-generated encoders and decoders. There are some places where we can't use msgp-generated code yet, for whatever reason, and those still invoke the reflection-based go-codec, using protocol.EncodeRefect() and protocol.DecodeReflect(). The main intent of this commit is to clearly identify places where we still invoke go-codec, and fix some trivial cases (like passing a struct to protocol.Encode by value instead of by pointer). Later on, we can go through the calls to protocol.EncodeReflect() and protocol.DecodeReflect() to see if we can get rid of the harder cases, to reduce or eliminate the use of go-codec altogether. * Change EnsureDigest to be asynchronous. (#754) This allows nodes which have received a threshold of cert-votes but not the corresponding block to continue to relay messages as normal. This prevents nodes in this state from inadvertently partitioning the network, which can cause stalls in very rare cases. 
- certThresholds now stage values in the proposal hierarchy, and essentially act like softThresholds (for the event.period) - Note: we can receive certThresholds for the previous period (but not softs, which aren't the freshest bundle). So now we can stage a value for the previous period, which is a side effect. - certThresholds fast forward periods and prevents subsequent period changes in the current round. - Do not cancel cryptographic verification of cert-bundles from old periods and continue to relay them. - Adds stageDigestAction, distinct from ensureAction, to signal the ledger that it should attempt to fetch the block given a certificate. It is not a blocking operation. - certThreshold without payloads now trigger stageDigestAction - If we receive a payload, check if cert is freshest bundle; if so, finish round. Co-authored-by: ben <me@vervious.com> * Strip any defined remote repo from branch name when building (#850) When using a wildcard (*) character to watch multiple branches when polling in Jenkins, the GIT_BRANCH environment variable will be "origin/rel/beta" instead of just "rel/beta". This breaks our tooling, but a simple fix is this util which simply strips any matched remote repo from the env var string value. * Implement DNSSEC resolving library (#830) * Implement DNSSEC resolving library * A, AAAA, SRV, CNAME lookup with sig verification * Recursive ip address lookup from CNAME with sig verification * Cached trust chain that is updated on DNSKEY cached sig expiration or zone signing key (ZSK) miss needed for end-user request's sig verification or DS-record confirmation on the chain update * Test harness includes a mock NS implementation for DNS-aware NS server * Closes #251 RFCs used: 1. DNS https://tools.ietf.org/html/rfc1035 2. DNS clarifications https://tools.ietf.org/html/rfc2181 3. DNSSEC proto change https://tools.ietf.org/html/rfc4035 4. DNSSEC RR change https://tools.ietf.org/html/rfc4034 5. 
DNSSEC clarifications https://tools.ietf.org/html/rfc6840 6. DNSSEC keys management https://tools.ietf.org/html/rfc6781 7. DNS SRV https://tools.ietf.org/html/rfc2782 * Utility to check relays' DNSSEC support * Make DNSSEC resolver interface compatible with net.Resolver * Use context * Change LookupCNAME: fail only if no A/AAAA record, do not fail if no CNAME * Change LookupSRV: sort records by priority and randomize by weight * Change LookupIPAddr: always make recursive lookup * Implement missing functions like LookupTXT * Use DNSSEC for SRV retrieval * Make DNSSEC thread safe * Add deadlock.Mutex to protect cached trust chain * Always use a new instance of dns.Client to work around a race in ExchangeContext * Address review comments * Get rid of pointers to arrays * Add time param to verify* and makeTrustedZone functions to make tests against real DNSKEY/RRSIG snapshot robust * Rewrite UDP/TCP retries * Renames * Disable failed attempts to retrieve SRV in agreement gossip tests * Implement DNSSecurityFlags config variable * New config version and migration * Implement DNSSEC-aware DialContext * Closes #253 * Implement LookupTLSA * Tests for LookupTXT, NS, MX, TLSA * Minor comments and code fixes * Code review fixes * disable the concurrent wallet generation. (#848) * Force docker to use `root` as the user when running the instance (#849) By default, docker will use the root user, but the jenkins pipeline docker plugin implicitly runs the instance under the permissions of the user that launched the script that contains the docker command. * Improve some error checking and logging for build process (#851) * Fix comment in agreement. (#856) * Add MoI to network (#853) * Implement message of interest * Add missing file. * Make the ping handler optional. * fix typo. * Improve unit testing. * update return variable name. * Add comment.
* Better error case handling in database utils (#857) * Fix a few error handling edge cases * Fix bug in setupAgreementWithValidator * Better fix. * Explicitly curl go.1.12.9 and archive `get_latest_go.py` (#855) The golang download page was changed and our pinned version of golang is no longer referenced on it. This was breaking our build. Instead, for now we'll explicitly download the tarball via `curl`. https://golang.org/dl/?mode=json * Trap errors and remove ec2 instance (#854) Add error handling for the release build pipeline. * Update the update script. (#670) * Faster external_build_printlog by using curl instead of aws cli (#847) * Fix concurrent SQLite initialization (#872) * SQLite init is not thread safe and mattn/go-sqlite3 does not care * When opening any db for the first time, do it synchronously so that the nested sqlite3_initialize() call first runs non-concurrently * Re-enable multi-threaded account generation * Closes #846 * change _tx_lock -> _txlock (#871) * Redirect stdout of build log file to build release upload directory (#873) * Install boto3 as a build dependency for docker (#875) * Enable some skipped tests on macOS (#876) * Asset tests * Rest client test * Send-Receive test (TestAccountsCanSendMoney) - takes 16 minutes * Set root as explicit docker user for test phase (#874) * Refactor and combine the phonebook implementations (#870) Merge the three phonebook implementations into one.
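The SQLite fix in #872 above serializes only the very first database open, so library initialization never runs concurrently. One way to sketch that idea in Go is with `sync.Once`. This is an illustration under stated assumptions, not the code from the PR: `guardedOpener` and `openFunc` are hypothetical, and `openFunc` stands in for whatever driver call actually opens the database.

```go
package main

import (
	"fmt"
	"sync"
)

// openFunc is a hypothetical stand-in for the real driver call that opens a
// database and triggers library initialization on first use.
type openFunc func(path string) (string, error)

// guardedOpener makes the first Open run alone. Concurrent callers block in
// once.Do until that first open has returned, then proceed in parallel.
type guardedOpener struct {
	once sync.Once
	open openFunc
}

func (g *guardedOpener) Open(path string) (handle string, err error) {
	first := false
	g.once.Do(func() {
		first = true
		handle, err = g.open(path)
	})
	if first {
		// This caller performed the serialized first open.
		return handle, err
	}
	// Initialization has already happened; open without extra locking.
	return g.open(path)
}

func main() {
	g := &guardedOpener{open: func(path string) (string, error) {
		return "handle:" + path, nil
	}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			h, err := g.Open(fmt.Sprintf("db%d.sqlite", i))
			fmt.Println(h, err)
		}(i)
	}
	wg.Wait()
}
```

Note that `sync.Once` marks initialization as done even if the first open fails; a real implementation may want to retry the serialized path in that case.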
* Adding a verifying signatures step to the build release pipeline (#878) * Wrap entire arguments in quotes Co-authored-by: Tsachi Herman <tsachi.herman@algorand.com> Co-authored-by: pzbitskiy <pavel@algorand.com> Co-authored-by: Derek Leung <derek@algorand.com> Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> Co-authored-by: algobolson <45948765+algobolson@users.noreply.github.com> Co-authored-by: Rotem Hemo <rotem@algorand.com> Co-authored-by: Will Winder <wwinder.unh@gmail.com> Co-authored-by: Max Justicz <max@justi.cz> Co-authored-by: algoradam <37638838+algoradam@users.noreply.github.com> Co-authored-by: Evan Richard <EvanJRichard@users.noreply.github.com> Co-authored-by: Nickolai Zeldovich <nickolai@csail.mit.edu> Co-authored-by: bricerisingalgorand <60147418+bricerisingalgorand@users.noreply.github.com> Co-authored-by: Shumo Chu <stechu@users.noreply.github.com> Co-authored-by: ben <me@vervious.com>
tsachiherman added a commit that referenced this pull request Mar 24, 2020
* Bump mainnet pregen to 1.0. (#569) * add lease to asset cmds (#575) * fix Disassemble when multiple bnz have the same target label (#612) add test * Replacing apt by apt-get (#610) * Add PeerConnections to network telemetry (#607) * Add PeerConnections to network telemetry. * omit Endpoint for incoming connections. * Fix license errors, enable check_license in travis. * Remove trailing whitespace. * add ?raw=1 to local block api to return msgpack bytes with full data (#621) * Let dsign sign arbitrary bytes, not just txids (#577) * Add markdown docs for `limit-order-a`, Fix `hltc` -> `htlc` (#619) * Created `test_release.sh` to test centos|fedora|ubuntu images (#613) * Created `test_release.sh` to test centos|fedora|ubuntu images * Incorporate some review suggestions (more to come): - change `apt` to `apt-get` - remove command to start the node - add `ENTRYPOINT` command to build image and test in one command - streamline command that downloads release and cleanup - moved script to `./test/packages/' - make `apt-get update` with the env var a one-liner * Add ability to pass bucket, channel and aws creds * Ensure aws creds are in env before starting * Make colorized text more readable * Break script into `build` and `run` operations * Run `update.sh` at RUN time This is another intermediate step. The installer is now being run at runtime, but it's not allowing for testing any binaries, such as `algod`. At this point, there are a couple different options to proceed, and I think it's best if Will, Tsachi and I talk more about the options. * We're not writing the Dockerfile to disk before running it. See my explanatory comment in the script. * Added new `post_deploy` stage and our script * Adding new `scripts/travis/test_release.sh` script This simply calls `./test/packages/test_release.sh`. Also, added name to `allow_failures`. 
* Add filtering for new `post_deploy` stage * Simplified the release scripts that build images to push to docker hub (#623) * Simplified the release scripts that build images to push to docker hub In pushing the updated images to docker hub, I noticed that the Dockerfiles and the shell scripts were only differentiated by the network name (stable|testnet). The only file in the dir is now `build_stable.sh`. It accepts a sole argument, `-n` or `--name`. It will default to "stable", so the for that image it's only necessary to run `./build_stable.sh` with no args. For "testnet", simply call the script like this: `build_stable.sh -n testnet`. The Dockerfile will be automatically created and passed to the `docker build` command via `stdin`. * Removed the case block for cli arguments Now, testing for either "mainnet" or "testnet" and returning early if neither value is present (defaults to "mainnet"). Also, changed the name to `build_releases.sh` since "stable" is no longer applicable. * Add `export SHELLOPTS` to teal tests. (#627) * Add `goal ledger block` (#622) * add goal ledger rawblock cmd * Bring `shellcheck` into the build process (#626) * Bring `shellcheck` into the build process Let's use bitwise operations to determine package presence * Added `check_shell` target to Makefile * Move install of shellcheck into `scripts/configure_dev.sh` Also, add shellcheck dependency to other dockerfiles. * Use `find` command in make target instead of recursive globbing What's up with the `exec +` syntax? From the man page: ``` -exec command {} + This variant of the -exec action runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total number of invocations of the command will be much less than the number of matched files. The command line is built in much the same way that xargs builds its command lines. 
Only one instance of `{}' is allowed within the command, and (when find is being invoked from a shell) it should be quoted (for example, '{}') to protect it from interpretation by shells. The command is executed in the starting directory. If any invocation returns a non-zero value as exit status, then find returns a non-zero exit status. If find encounters an error, this can sometimes cause an immediate exit, so some pending commands may not be run at all. This variant of -exec always returns true. ``` * Only check for missing dependencies List any that are missing and the echo the script to run to install. * Fix issue on macOS to make script portable (#632) * Remove "Created new rootkey/partkey" spam message. (#629) * fix asset unit name display in goal account list (#633) * Ensure that the proper channel is passed to `test_release.sh` (#634) * Minor improvements to `test_release.sh` script (#636) - Removed a redundant `exit` statement. - Added script name to error statement. * Cleanup evalAux (#628) * remove evalAux which hasn't been used since before 1.0 * comment removal of auxdata column * Add --no-sig flag to goal clerk multisig sign (#647) * add --no-sig flag to goal clerk multisig sign * update err message * change preimage -> template * change template -> information * Scan for ledger wallets more often (#638) * add more robust ledger scanning, fix infinite recursion bug * fix comment * undo scan change * still delete wallets we fail to close * Exit early if `test_release.sh` script fails (#643) * Improve missing msig preimage error message (#648) * improve missing msig preimage error message * improve err msg * Add support for https for telemetry servers (#649) * Add support for https for telemetry servers. * typo : udo -> udp * Fixed few typos. 
* goal listpartkeys display error (#641) * Fixing arm64 environment issues (#653) 1) python3-venv libffi-dev libssl-dev libffi-dev (and libssl-dev) are needed by the cryptography package builder for python in e2e_basic_start_stop. 2) exporting GOPATHBIN needed to run algotmpl in template e2e tests. * Test pre-packaged executable on variety of linux platforms (#651) * Add platform testing using docker for generated binaries. * Fix path. * Apply reviewer's requested changes. * Reduce e2e_go_tests execution time twice (#645) There are seven major contributors to integration tests running time TestOnlineOfflineRewards (1248.64s) TestAssetConfig (364.71s) TestRewardRateRecalculation (226.78s) TestStartAndEndAuctionTenUsersOneBidEach (196.34s) TestNoDepositAssociatedWithBid (189.74s) TestDeadbeatBid (188.70s) TestStartAndCancelAuctionNoBids (183.35s) This commit considers only first three. 1. Fixing rewards interval in config for TestRewardRateRecalculation from 25 to 10 reduces time twice: TestRewardRateRecalculation (119.34s) 2. Fixing initialRound in TestOnlineOfflineRewards test from 301 to 11 reduces time 15 times: TestOnlineOfflineRewards (73.80s) 3. TestAssetConfig looks long by design - commits and waits max allowed assets 4. Address TODO in run_integration_tests.sh. Now e2e_client_runner calls 'goal network delete' to reflect this removal Refers #508 * Promote test_release.sh so that it won't conflict with release testing. (#655) * Fix concurrent access to wallet handles cache in goal (#654) * Fix concurrent access to wallet handles cache in goal * In rare cases (i.e. e2e tests run in parallel on the same network) a race cond happens when accessing goal.cache/walletHandles.json file * Introduce advisory locking on the mentioned file * Implementation is extendable by implementing *locker* interface for specific platform and providing a new *newLockedFile* constructor. 
* Address PR review notes * Do no truncate before obtaining the lock * Increase waiting interval to 10 ms * Simplify newLockedFile constructor * Allow upgrades to specify the delay before their execution. (#650) This replaces UpgradeWaitRounds with MinUpgradeWaitRounds and MaxUpgradeWaitRounds. Proposers specify an upgrade's delay given their own ApprovedUpgrades, encoding the proposed delay in the UpgradeVote. Verifiers check that the delay sits between MinUpgradeWaitRounds and MaxUpgradeWaitRounds (inclusive). This commit adds this functionality but does not change current behavior. * Set explicit 30 sec timeout for AlgorandGoal::RawSend in expect test (#658) * Should help with sporadic failures when we send and TEAL in groups * Support variable-delay protocol upgrades in ConsensusFuture. (#659) Also add some unit tests for variable-delay protocol upgrades. * Shant/catchup stop on unapproved (#660) * A fix for arm64 failures One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts when called from go test using CombinedOutput is behaving strange (slow). Replacing CombinedOutput with Run. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests, that failes sporadically on mac, on ARM as well. Adding a utility to controll test skips. adding missing file and change. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests, that failes sporadically on mac, on ARM as well. Adding a utility to controll test skips. adding missing file and change. Fixing errors and adding comments. * Fixing merge and comment. * added comment * Stop catchup on unapproved protocol round Catchup to stop before fetching the next round if the round protocol is not approved by the node * Some fixex. Review comments from Tsachi. * File accidentally added here. removing. * Reverting changes mistakenly added to this branch. * Adding comment changes. 
* Partially working test * Adding test to catchup stop on unsupported block Using s.cancel we are droppng the last block. * More tests and development to the catchup service * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an an-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an an-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an an-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an an-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments. * Combine condition blocks * Fixing an error in the log info statement. * Compile linux/amd64 binaries with static linking (#625) * Test static compilation. * remove -fPIC * Try with ubuntu 18.04, since it has newer GCC. * exclude buildmode from test builds. * Fixed missed buildmode. * Refactor. * Add logging for the telemetry server connections (#661) * Add logging for the telemetry server connections. * Revert unintended change. * Improve error message. * add bool support to algocfg (#667) e.g. 
`algocfg set -p EnableProcessBlockStats -v true` * Reduce execution time of expect tests (#665) * CombinedOutput blocks on copying the empty stderr stream from expect, which causes at least a 60 sec timeout for most of the tests * This implementation uses a temp file for stderr accumulation. In this case exec.Cmd does not run goroutines for reading child's actual stderr. * 655 sec (before) vs 205 sec (after) * Avoid upgrading boost on travis Mac builds (#669) * specify a boost version for the mac build. * try to prevent boost update on travis mac builds. * Abort algod startup if logging.config file has bad permissions (#662) * This should prevent telemetry event losses on systems with invalid permissions on ~/.algorand/logging.config file * Another possible workaround is to relax default config path mask in **cmd/goal/commands.go:ensureCacheDir** from 700 to 744. This is not implemented because of possible security risk. * Add error logging for getting a cached wallet handle (#663) Needed to debug 'Couldn't read password: inappropriate ioctl for device' error message in tests * Update license date 2019 -> 2020 (#674) * Change 2019 -> 2020 * Update readme. * Update copyright to use date range. (#676) * Tee existing tests so we can review output before piping it forward. (#677) * Make graceful exit of a node that is waiting for WaitForBlock call (#679) * Make graceful exit of a node that is waiting for WaitForBlock call. * Add comment. * Remove tput where not supported by terminal (#682) * Remove tput where not supported by terminal. * send tput errors to /dev/null * Fix bad constants. * Avoid waiting for block that won't be reached due to unsupported protocol upgrade. (#681) * Fix - Indexer now shows received transactions (#684) -- Adding receiver function to transaction that returns the receiver of a transaction -- Fix indexer to show received transactions * Undo teeing to /dev/tty as it doesn't work well in terminal-free environments.
(#689)
* Improve lockFile error handling (#687)
* Better lockFile error handling.
* Make blocking locker.
* Fix F_OFD_GETLK constant.
* bugfix.
* Try platform specific code.
* use unix package to include F_OFD_SETLKW
* remove unused imports.
* Rename files.
* Catchup service stop on unsupported and e2e test (#685)
* A fix for arm64 failures. One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts, when called from go test using CombinedOutput, behave strangely (slowly). Replacing CombinedOutput with Run.
* DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change.
* DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. Fixing errors and adding comments.
* Fixing merge and comment.
* added comment
* Stop catchup on unapproved protocol round. Catchup to stop before fetching the next round if the round protocol is not approved by the node
* Some fixes. Review comments from Tsachi.
* File accidentally added here. removing.
* Reverting changes mistakenly added to this branch.
* Adding comment changes.
* Partially working test
* Adding test to catchup stop on unsupported block. Using s.cancel we are dropping the last block.
* More tests and development to the catchup service
* Stop the catchup before fetching the round with unapproved protocol. The catchup service saves the round at which an unapproved protocol update will take place. Then, before creating a task to fetch a round, it checks whether the next round is the one where the unapproved protocol begins and, if so, stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the unapproved round from getting fetched.
The added test covers the edge cases which may or may not happen when the service runs.
* Stop the catchup before fetching the round with unapproved protocol. The catchup service saves the round at which an unapproved protocol update will take place. Then, before creating a task to fetch a round, it checks whether the next round is the one where the unapproved protocol begins and, if so, stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the unapproved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments.
* Combine condition blocks
* Fixing an error in the log info statement.
* Draft: Test for upgrading a node while keeping another node not upgradable. goal node status field for informing if the node is upgradable
* Catchup service stop on unsupported, node status message, and e2e test. In this change: Updated catchup service to stop on unsupported and not upgradable. Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing last synced information. Added e2e test for stopped catchup service on unsupported protocol.
* Separating goal changes from this PR. goal changes are in PR: https://github.com/algorand/go-algorand/pull/686
* review comment: use NotEqual instead of True
* Make ARM64 build mandatory. (#694)
* Updates to the goal node status (#686)
* Updates to the goal node status. This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field.
* Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport
* Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values.
* Review comments from Tsachi: using the timeout in select
* Updating the test to reflect the removal of: has synced since startup.
* telemetry recorded locally as info log (#666)
config.json: {"TelemetryToLog":true}
logging.config: {"Enable":false,"SendToLog":true}
* Relax StartNetwork regex (#696)
* relax StartNetwork regex.
* Another attempt.
* Two fixes to basicCatchup_test: cloned node not stopped and env var conflict (#697)
* Updates to the goal node status. This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field.
* Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport
* Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values.
* Review comments from Tsachi: using the timeout in select
* Two fixes to basicCatchup_test: cloned node not terminated and env var collision
1) TestBasicCatchup and the newly added TestStoppedCatchupOnUnsupported create a new node by cloning one of the network nodes. When fixture.Shutdown() stops the original network nodes, it leaves the cloned node running. This change adds the function shutDownClonedNode to stop the cloned nodes.
2) In TestStoppedCatchupOnUnsupported, an env variable is used to delete ConsensusCurrentVersion, so that the cloned node behaves as if its binary does not support the consensus version. However, when TestBasicCatchup runs in parallel, it also picks up the env variable, and consequently deletes ConsensusCurrentVersion from the Consensus map. When this happens, TestBasicCatchup sporadically fails. In this change, instead of having ConsensusTestUnupgradedProtocol upgrade to ConsensusCurrentVersion, or deleting ConsensusCurrentVersion so it cannot be upgraded, it sets up ConsensusTestUnupgradedProtocol to upgrade to ConsensusTestUnupgradedToProtocol.
Hence, the env variable is used to delete ConsensusTestUnupgradedToProtocol. This way the conflict with other tests is eliminated.
* Fixing golint by adding comment.
* Tsachi's review comment: unsetting the env var.
* Make scripts executable. (#702)
* More reliable fetcher unit tests. (#708)
* Avoid starting the Telemetry service when logging is disabled (#703) if remote telemetry is not enabled, do not start uri update service; add a nil check
* Shutdown kmd when test fixture is going down. (#709)
* Fix unit test. (#711)
* Execute e2e tests one at a time on arm64 (#701)
* Test changes.
* Better error reporting on goalFixture
* Add version query for kmd startup.
* Few more test cases to cover.
* try to wait.
* changes
* Update.
* Move KMD shutdown to network.
* Add some debug messages to figure out what's going on.
* Fix script bug.
* Fix proper KMD shutdown via the KMDFixture
* Run the tests one at a time only on arm64
* Updating according to review.
* Disable pprof endpoints by default (#693)
* enable go profiler for netdeploy
* add EnableProfiler to ConfigJSONOverride
* Update the makefile to skip the static linking when compiling on centos. (#713)
* Fail e2e-go tests when node panics (#699)
* Fail test on panic
* few more touchups.
* sync
* bugfix.
* Update few more usecases.
* Refactoring
* Simplify.
* undo network referencing.
* undo few func-ptr.
* undo some more stuff.
* Update method names
* Few more touchups.
* Build release job (#698)
* Initial commit
* Added Jenkinsfile
* Updated Jenkinsfile
* Works until GPG IPC
* Move build files into new release/ dir. Also, renamed files {build_,}release.sh and {build_,}setup.sh
* Path issues
* Use t2.xlarge instance type (4 vCPUs, 16GB ram)
* Restructuring
* shellchecked
* fix bug
* Added new `socket.sh` file
* Trying to build rpm
* Bump up disk size of ec2 instance
* more attempts to make rpm
* more fixes
* move /stuff -> /root/stuff
* wip
* moved to correct paths
* Have `release` have its own start and kill ec2 instance scripts
* use buildhost scripts after all
* Make sure the gpg key name matches!!!!!
-%_gpg_name Algorand RPM <rpm@algorand.com>
+%_gpg_name rpm algorand <rpm@algorand.com>
* fixes
* Add upload stage to pipeline
* Add tag stage to pipeline
* more fixes
* Move start/stop ec2 instance scripts back into release/
* Add ability to dynamically set branch
* Added controller/ subdir
* Some cleanup
* Adding tag support. Moved `Jenkinsfile` into controller/ subdir.
* Move build_env build.sh -> setup.sh. Moved socket.sh -> controller/socket.sh
* Revert buildhost changes
* some cleanup
* fix build
* test packages locally
* upload packages to s3 test bucket
* restructure
* misc
* fix build
* Add Jenkins parameters
* fix build
* Move commands into Jenkinsfile into stages/
* fix build
* Make test stage more explicit
* fix build
* Implementing reviewer suggestions
* Added debug info
* fix build
* Merge into master
* implement reviewer suggestions
* turn off test stage
* fix build
* fix build
* fix build
* Update readme
* removed unneeded archive/ dir
* Use service-wide logger instead of logging.Base() in agreement (#714)
* Switch from default logger to pre-configured logger in some components of agreement service
* Mark some of the slow e2e tests as such (#719)
* Mark some of the slow e2e tests as such.
* Move shorttest flag to be set at top level.
* Wait test less restrictive.
(#718)
* Move slow test to get executed on nightly builds (#721)
* Move some more tests to be "slow tests", and modify the short test condition so that we will run the long tests on nightly builds only.
* Fix elif -> else
* Faster upgrade tests. (#722)
* Disable failing test. (#724)
* Generate docs for algokey.
* s/goal/algokey
* Improve algons error logging (#733)
* Write body when erroring on SRV/DNS records update.
* Few more error messages.
* ledger/eval refactor (#700) refactor ledger/eval block validation; don't do crypto+lsig validation in eval; fix sync in backlog executer queue; clean up lots of logging to make tests quieter
* Fix a bug in Credential.lowestOutput caused by improper domain separation (#716)
* Fix a bug in Credential.lowestOutput caused by improper domain separation. The bug causes larger accounts to be block proposers more often than should happen based on their fraction of online stake. This patch will cause nodes to vote for a protocol upgrade that fixes the buggy behavior. After the protocol upgrade goes through, all the upgrade-related code in this commit should be removed, as it's not necessary to retain the old buggy behavior for catchup. (For convenience, code to be removed is marked with a "TODO(upgrade)" comment.)
* Typofix; fix merge issue
* Fix test
* Add a comment to make the linter happy
* Typo fixes
* Goal docs tweaks (#731)
* test all `goal ... -h` (#730)
* test all `goal ... -h`: ensures no conflicting subcommand options; adds less than 2 seconds to test time
* review tweak, rearrange to sub test script
* actually pass args
* grr, arg
* Move EnsureDigest logic into the catchup service (#726)
* Move EnsureDigest logic into the catchup service.
* update unit tests.
* Add unit testing for new catchup feature.
* updating per review.
* Add handling for concurrently updated round.
* Add comment.
* typo
* Correct the quit semantics.
* Faster stringer implementation for Address (#736)
* Faster stringer implementation.
* Optimize UnmarshalChecksumAddress as well.
* Add comment.
* Interconnect relays on a locally deployed network (#742)
* static codegen for msgpack encode/decode (#578)
Implement static code generation for msgpack encoding and decoding of blocks and transactions. The existing functions `protocol.Encode` and `protocol.Decode` invoke the generated encoders and decoders if present. Benchmarking block encode/decode suggests this is about 4x faster than go-codec (which we were using previously).
When changing existing data structures to be encoded, or adding new ones, run `make msgp`.
Some code is still using go-codec (notably agreement). If we convert all code to use this static code generation plan, we could get rid of the dynamic check and dispatch in `protocol.Encode` and `protocol.Decode`.
Having fast encoding/decoding is not only good for performance, but allows us to remove complex optimizations (like caching txid values or encoding lengths, removed in this commit), and might allow us to perform checks that we previously thought would be too expensive (like making sure that an encoding is canonical, by re-encoding).
Having explicitly generated code also makes it easier to understand performance and tweak it further. Results from pprof should be much less opaque (no reflection) and more actionable.
Explicit codegen also makes it clear when we make a change that affects encoding/decoding of network messages.
The code generation is done using a modified version of github.com/tinylib/msgp, forked as github.com/algorand/msgp.
* Use cobra for the kmd command to allow for documentation automation.
* Limit client side connection rate, part 1
* Draft of the solution
* saving current changes
* saving current changes
* saving current changes
* saving current changes
* saving current changes
* saving current changes
* saving current changes
* Addressing review comments.
* fixing test failure
* fixing test failure2
* Adding a unit test
* txsync now will go through http request connection limit.
* Addressing review comments. Changing phonebookEntries duration type from uint to time.Duration
* fixing test failure.
* splitting wait for connection time and add connection time. Addressing some review comments.
* recording provisional time before connect, updating after.
* minor fixes
* Embedding MockNetwork in mock structs which implement GossipNode to avoid the implementation of dummy functions to satisfy the interface.
* not embedding by reference.
* A few more review comment fixes.
* Fix checkdep message. (#745)
* Fix equal stake distribution in generated networks (#749)
* Use math.big.Rat rational numbers to get rid of summation error
* The root cause, though, is in JSON serialization of the float64 data type: some values are rounded and others are not. The correct fix seems to be using the same accuracy in the distribution code and in float64 marshaling.
* Update with PR feedback.
* Change a player test to use either old buggy behavior or new correct behavior depending on ConsensusCurrentVersion. (#748) This allows agreement tests to pass whether ConsensusCurrentVersion is the old V20 or the new V21
* Bugfix: Fix last relevant proposal period in agreement protocol. (#746)
When retrieving the last relevant period corresponding to a proposal-value, the proposal store inside the agreement protocol does not properly check that the particular period returned actually matches the passed-in proposal-value. Instead, the proposal store returns the last period seen for *any* proposal-value.
When the agreement state machine receives a proposal payload, the proposal store checks whether this payload matches any proposal-value known to be relevant in the current round. If it does, the state machine tells the crypto verifier to verify the new payload.
As an optimization, the proposal store in the state machine also tags the payload with the last period in which it is relevant (and whether the matching proposal-value is pinned). The crypto verifier halts concurrent verification of any payload from that period. Separately, the proposal store does not attempt to verify payloads more than once, caching past payloads it has pipelined.
For this optimization to be correct, the last relevant period must be correct; otherwise, the network will permanently stall if the following occurs:
- In period p, the network observes a best proposal value of v, but it sees neither the payload B corresponding to v nor a threshold of soft-votes for B (seeing such a threshold pins B, preventing the crypto verifier from cancelling).
- An attacker is able to see B.
- In period p+1, the network attempts to agree on a new proposal value v' corresponding to the payload B'.
- After half of the network has received B' but has _not_ finished verifying it, the attacker sends this half the payload B. This half will cancel verification of B' (since it erroneously associates B with period p+1) and will permanently ignore any future broadcasts of B' (which was cached in the proposal store).
- If the other half has already staged B', the network will stall permanently, since it will be unable to commit B'.
Fixes #710. Thanks to @xixisese for reporting this bug.
* Format numbers using number specifier (#735)
* Use %d to print numbers, which is a bit safer as it prevents potential recursion.
* Few more changes to the fuzzer.
* Two more updates.
* Implement local net template generation with netgoal (#762)
* Usage:
netgoal generate -n 1 -R 1 -w 100 -o mynettemplate.json -r . -t goalnet
goal network create -t mynettemplate.json -r mynet -n mynet
* Remove duplicate definitions from netdeploy/networkTemplate
* Improve net templates support (#766)
* Fix file descriptors leak in 'goal account'.
Now goal can import more than maxfiles keys
* Fix uint overflow in stake distribution validation. Details: values 10 and -110 were cast to uint and summed up to 100 pct with 32-bit overflow
* Allow pct fraction of stake in goal net templates
* Fix stake distribution in netgoal.generate: it always produces pcts and not values in algos as was incorrectly thought before
* Add tests for netdeploy.Validate()
* Release build pipeline step 1: Build, package, sign, deploy to staging (#763)
* Reorganize
* more restructuring
* cleanup
* removing test bits
* changing upload destination
* remove test dir
* remove cruft
* Moved Jenkinsfile -> jenkinsfile/Build
* replace {RSTAMP,FULLVERSION}
* fix bugs
* remove temp dir location
* remove buildnumber.dat
* Implement automation for release notes generator (#761) The cicd.yaml config file in this branch can be consumed by our cicd cli to create a draft for release notes for a given version.
* back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb (#764)
* back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb
* remove IsFull
* bring back txn liveness check. buffer up to all payset groups in chan
* no chan close
* Implement dummy telemetry hook to safely perform operations on it when telemetry is disabled (#768)
* The idea is to have telemetry.hook always set. For the telemetry disabled case this is a simple noop stub.
* Prevents crashes when calling hook.Close/Flush on private networks in case of errors
* Remove instances of tagging in our build process (#770) We don't want to be making tags anywhere in our automation. Our release process will take care of that.
* Configurable consensus protocol (#750)
* Create consensus.json
* some changes..
* remove deadcode.
* update constant.
* Update fixture.
* migrate fast upgrade protocols.
* move catchup test protocol.
* push staged changes.
* bugfix.
* Remove last test consensus param.
* rollback block.go
* cleanup : map[protocol.ConsensusVersion]ConsensusParams -> ConsensusProtocols
* update.
* Fix unit test.
* Release build pipeline step 2: Test (#773)
* Reorganize
* more restructuring
* begin test stuff
* restructure
* fix deb test
* fix rpm test
* fix build
* restructure
* fix bug
* remove temporary feature branch
* added new gpg.sh
* removed buildnumber.dat
* When locally installing, take the binaries from the first-gopath-bin directory. (#776)
* Remove temporary build test location (#777)
* Make sure to default to Consensus if consensus.json is missing. (#779)
* Make util.ExecAndCaptureOutput able to process large output (#771)
* When the wrapped command writes a large amount of data to stdout/stderr, the process is blocked until the stdout/stderr buffers are cleared.
* The old implementation waited until cmd returned and then read stdout/stderr.
* The new implementation reads the stdout/stderr pipes in goroutines.
* Make goal node state change commands systemd aware (#769)
* Make goal node state change commands systemd aware. I added a property to libgoal/system.go where we can set whether or not our algod process is managed by systemd.
* Write expect test for goal node with systemd scenarios. This tests that the message from our cli on goal node start, stop and restart is correct for systemd_managed data_dirs.
* Write expect test for goal node start, stop and restart. This tests that the message from our cli on goal node start, stop and restart is correct for data_dirs that are not managed by systemd.
* Add systemd_managed: true as a default in system.json. Since all linux installs currently use systemd, I added this to the base system.json file.
* Restructure release/ dir (#782)
* Restructure release/ dir for each build release pipeline stage. First step is the `build` pipeline.
* More restructuring. Removed `release/ci/`. Every dir under `release/` will now be a pipeline.
* Added "test" pipeline
* update readme
* Remove temp location and remove code cruft
* removed outdated readme
* more cleanup
* implement reviewer changes
* Allow asset creation transactions to be created while catching up. (#790)
* Tunnel outgoing connection via a rate limiting dialer (#780)
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Integrating changes from Tsachi + cleanups.
* fixing build failure.
* fixing build failure.
* Addressing Pavel's comments.
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Integrating changes from Tsachi + cleanups.
* fixing build failure.
* fixing build failure.
* Addressing Pavel's comments.
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Integrating changes from Tsachi + cleanups.
* fixing build failure.
* fixing build failure.
* Allow asset creation transactions to be created while catching up. (#790)
* Addressing Pavel's comments.
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Integrating changes from Tsachi + cleanups.
* fixing build failure.
* fixing build failure.
* Addressing Pavel's comments.
* rebasing master
Co-authored-by: Tsachi Herman <tsachi.herman@algorand.com>
Co-authored-by: Will Winder <wwinder.unh@gmail.com>
* Release build pipeline step 3: Added "prod" pipeline to `release/` (#788)
* Release build pipeline step 3: Added "prod" pipeline to `release/`
* Added snapshot.sh to be invoked manually
* Implement reviewer suggestion
* better algons error messages.
(#794)
* Create a rate limiting transport (#795)
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Rate limiting transport.
* remove comment.
* Unify dialing path.
* Removing ForceAttemptHTTP2 which isn't available on go 1.12
Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com>
* Some release abstraction (#796)
* Added "prod" pipeline to `release/`
* Added snapshot.sh to be invoked manually
* Restructuring
* test stage
* add common build params
* Remove temp github location
* Change agreement message encoder to msgp. (#786)
* Upgrade to new version of msgp.
- omitemptyarray and omitempty are now correctly distinguished in equivocationVoteAuthenticator.
- The embedded Block is correctly handled in proposal, unauthenticatedProposal, and transmittedPayload.
* Randomize anonymous (embedded) fields when testing codec.
Co-authored-by: Nickolai Zeldovich <nickolai@csail.mit.edu>
* Move fetcher client into catchup (#774)
* changes.
* adding dialer.
* Move fetcher client into catchup, step 1. (most unit tests are still broken)
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* update.
* fix few more unit tests.
* fix syncer tests.
* undo change.
* Add a comment.
Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com>
* Fix gpg keygrip code and remove old code (#797)
* bugfix : correctly compile a teal program that includes a base64 string which starts with a double slash (#787)
* update.
* Improve test.
* Add support for multiple network protocol versions (#799)
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Add a version-accept header to support multiple network protocol versions.
* update.
* Remove comments.
* Addressing reviewer concerns.
* Add a unit test for checkProtocolVersionMatch logic.
Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com>
* Include comment about something that looks like a vulnerability, but isn't. (#820)
* Skip logging and telemetry when not needed. (#737)
* Added utils for testing release packages (#819)
* Added utils for testing release packages
check_sig: Verify gpg signatures of build artifacts.
test_package: Verifies the packages were built from the correct branch with the correct hash and verifies the test version release number.
* Implement reviewer feedback
* Update docker build script to be more flexible with its naming (#822)
* Deleting out-of-date wallet folder in go-algorand. (#821)
* Some build fixes (#818)
* Some build fixes. Most importantly, move the `fullversion.dat` file to the $HOME directory and use it for the name of the upload directory on s3. It should have been doing this before, but it was copying it to the wrong location on the ec2 instance.
* Implement reviewer suggestions
* Completely remove temp dir before re-creating it
* Move `dsign` functionality to goal (#800)
* Deferred persistent crash data validation (#823)
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Perform the crash-decoding after responding to the event, so that the new vote won't be blocked.
* undo unintended changes.
Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com>
* Update Dockerfile for our official docker image (#826)
* fix incorrect comments (#825)
* Reduce the log verbosity on scenario 3 deployed network (#828)
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Reduce the amount of logs on s3 network. When running s3, our performance is negatively impacted by the high amount of logging. This change reduces the logging to warning and above.
* undo
Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com>
* Trigger test build (#831)
* Added "prod" pipeline to `release/`
* Added snapshot.sh to be invoked manually
* Restructuring
* test stage
* add common build params
* fix build
* test
* Remove build parameters
* wip
* remove test dir
* still trying to fix random build errors
* updating test phase
* extract build_env values
* add trigger for test phase
* test
* removed test location
* More release build fixes (#836)
* Added "prod" pipeline to `release/`
* Added snapshot.sh to be invoked manually
* Restructuring
* test stage
* add common build params
* fix build
* test
* Remove build parameters
* wip
* remove test dir
* still trying to fix random build errors
* updating test phase
* extract build_env values
* add trigger for test phase
* derp
* remove test location
* Split consensus from config (#832)
* Split consensus from config.
* few more changes.
* netgoal: create accounts in parallel (#827)
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Make parallel accounts. * undo change. * handle data race. * use atomics. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Updated job name to match on the Jenkins server (#837) * Brice/refactor make (#835) * Refactor makefile I refactored how we build libsodium to support multiple os and cpu architectures from the crypto dir. Also I added some make targets that work the way our ci pipeline needs them to. * Add flags for other linux architectures in crypto/vrf.go * Remove yum commands from configure_dev script I decided we don't need these here. I just left the which apt-get so that this script works the same but doesn't break on centos. * Add multi platform support to cicd yaml Now we have stages to do builds on different platforms utilizing docker and qemu cpu virtualization. * Refactor libsodium dep management Before the libsodium dep paths were hardcoded under cgo tags, now they're being passed in through env vars. Also throwing in a dockerfile for our cicd process. * Revert change to configure_dev.sh These changes actually aren't necessary since our build process doesn't use this script. * Switch back to using cgo tags for CFLAGS and LDFLAGS This way LDFLAGS aren't used all over the place unnecessarily which could cause problems in the future. * Fix names of things in Makefile Fixed the name of crypto/lib/libsodium.a to crypto/libs/$(OS_TYPE)/$(ARCH)/lib/libsodium.a so that it reflects the updated project structure. Also changed VARIATIONS=literally_anything in ci-build to VARIATIONS=$(OS_TYPE)/$(ARCH) so that it looks like it's useful. 
* Update cicd.yaml to use the new shell.docker.Ensure task This task makes sure that the docker image(s) our tasks depend on are available during stage executions. It either pulls the docker image or builds it from scratch when it's not available. * Fix references to crypto/lib/libsodium.a make target A travis script was referencing this directly so I fixed the target. Also, I removed an unnecessary reference in our rpm build script. * Remove ci-deps from docker build make targets Those were there by mistake, and having them kind of defeated the purpose of packing those deps with the images. Also I moved ci-deps to the shell.Make target in build-local since those are necessary there. * Run build and test jobs in a docker container (#840) * Brice/fix deploy linux (#767) * Make dockerignore file This file will prevent docker build contexts from loading certain files when creating docker build contexts. I just made it a copy of .gitignore since those files don't seem to be necessary for any current Dockerfile for go-algorand. * Fix unnecessary cd into parent directory of project root This was causing huge docker contexts for no apparent reason. * Change dockerignore to include some necessary files I switched tmp to tmp/dev_pkg and tmp/out to ignore large folders that seem unnecessary for any docker build today and removed ignores for the network gen files * Limit msgp tool warning message scope (#834) * Try to reduce msgp verbosity. * update * update msgp version in go.mod * update go.sum * Remove old entries from go.sum * Refactoring peer unicast implementation (#841) * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * checking in current draft. 
* complete the test * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * checking in current draft. * complete the test * some cleanup * fixes, lint, format. * Addressing Tsachi's comments * Addressing Tsachi's comments. getNonce() fixed, and a new test added for it. * Addressing few more comments. * Unifying getResponseChannel and removeResponseChaneel * addressing Pavel's comment: correcting a comment. * Actively scan for ledgers, normalize names cross platform (#842) Make ledger wallet names more canonical, check that sending a command doesn't return an error, only run active ledger for new devices. * require Encode() and Decode() to take msgp'ed types (#798) This ensures that calls to protocol.Encode() and protocol.Decode() are going to hit fast msgp-generated encoders and decoders. There are some places where we can't use msgp-generated code yet, for whatever reason, and those still invoke the reflection-based go-codec, using protocol.EncodeReflect() and protocol.DecodeReflect(). The main intent of this commit is to clearly identify places where we still invoke go-codec, and fix some trivial cases (like passing a struct to protocol.Encode by value instead of by pointer). Later on, we can go through the calls to protocol.EncodeReflect() and protocol.DecodeReflect() to see if we can get rid of the harder cases, to reduce or eliminate the use of go-codec altogether. * Change EnsureDigest to be asynchronous. (#754) This allows nodes which have received a threshold of cert-votes but not the corresponding block to continue to relay messages as normal. This prevents nodes in this state from inadvertently partitioning the network, which can cause stalls in very rare cases. 
- certThresholds now stage values in the proposal hierarchy, and essentially act like softThresholds (for the event.period) - Note: we can receive certThresholds for the previous period (but not softs, which aren't the freshest bundle). So now we can stage a value for the previous period, which is a side effect. - certThresholds fast forward periods and prevents subsequent period changes in the current round. - Do not cancel cryptographic verification of cert-bundles from old periods and continue to relay them. - Adds stageDigestAction, distinct from ensureAction, to signal the ledger that it should attempt to fetch the block given a certificate. It is not a blocking operation. - certThreshold without payloads now trigger stageDigestAction - If we receive a payload, check if cert is freshest bundle; if so, finish round. Co-authored-by: ben <me@vervious.com> * Strip any defined remote repo from branch name when building (#850) When using a wildcard (*) character to watch multiple branches when polling in Jenkins, the GIT_BRANCH environment variable will be "origin/rel/beta" instead of just "rel/beta". This breaks our tooling, but a simple fix is this util which simply strips any matched remote repo from the env var string value. * Implement DNSSEC resolving library (#830) * Implement DNSSEC resolving library * A, AAAA, SRV, CNAME lookup with sig verification * Recursive ip address lookup from CNAME with sig verification * Cached trust chain that is updated on DNSKEY cached sig expiration or zone signing key (ZSK) miss needed for end-user request's sig verification or DS-record confirmation on the chain update * Test harness includes a mock NS implementation for DNS-aware NS server * Closes #251 RFCs used: 1. DNS https://tools.ietf.org/html/rfc1035 2. DNS clarifications https://tools.ietf.org/html/rfc2181 3. DNSSEC proto change https://tools.ietf.org/html/rfc4035 4. DNSSEC RR change https://tools.ietf.org/html/rfc4034 5. 
DNSSEC clarifications https://tools.ietf.org/html/rfc6840 6. DNSSEC keys management https://tools.ietf.org/html/rfc6781 7. DNS SRV https://tools.ietf.org/html/rfc2782 * Utility to check relays' DNSSEC support * Make DNSSEC resolver interface compatible with net.Resolver * Use context * Change LookupCNAME: fail only if no A/AAAA record, do not fail if no CNAME * Change LookupSRV: sort records by priority and randomize by weight * Change LookupIPAddr: always make recursive lookup * Implement missed functions like LookupTXT * Use DNSSEC for SRV retrieval * Make DNSSEC thread safe * Add deadlock.Mutex to protect cached trust chain * Always use a new instance of dns.Client to work around a race in ExchangeContext * Address review comments * Get rid of pointers to arrays * Add time param to verify* and makeTrustedZone functions to make tests against real DNSKEY/RRSIG snapshot robust * Rewrite UDP/TCP retries * Renames * Disable failed attempts to retrieve SRV in agreement gossip tests * Implement DNSSecurityFlags config variable * New config version and migration * Implement DNSSEC-aware DialContext * Closes #253 * Implement LookupTLSA * Tests for LookupTXT, NS, MX, TLSA * Minor comments and code fixes * Code review fixes * disable the concurrent wallet generation. (#848) * Force docker to use `root` as the user when running the instance (#849) By default, docker will use the root user, but the jenkins pipeline docker plugin inexplicitly runs the instance under the permissions of the user that launched the script that contains the docker command. * Improve some error checking and logging for build process (#851) * Fix comment in agreement. (#856) * Add MoI to network (#853) * Implement message of interest * Add missing file. * Make the ping handler optional. * fix typo. * Improve unit testing. * update return variable name, * Add comment. 
* Better error case handling in database utils (#857) * Fix few error handling edge cases * Fix bug in setupAgreementWithValidator * Better fix. * Explicitly curl go.1.12.9 and archive `get_latest_go.py` (#855) The golang download page was changed and our pinned version of golang is no longer referenced on it. This was breaking our build. Instead, for now we'll explicitly download the tarball via `curl`. https://golang.org/dl/?mode=json * Trap errors and remove ec2 instance (#854) Add error handling for the release build pipeline. * Update the update script. (#670) * Faster external_build_printlog by using curl instead of aws cli (#847) * Fix concurrent SQLite initialization (#872) * SQLite init is not thread safe and mattn/go-sqlite3 does not care * When opening any db for the first time, do it synchronously so that the first call to the nested sqlite3_initialize() happens non-concurrently * Re-enable multi-threaded account generation * Closes #846 * change _tx_lock -> _txlock (#871) * Redirect stdout of build log file to build release upload directory (#873) * Install boto3 as a build dependency for docker (#875) * Enable some skipped tests on MacOS (#876) * Asset tests * Rest client test * Send-Receive test (TestAccountsCanSendMoney) - takes 16 minutes * Set root as explicit docker user for test phase (#874) * Refactor and combine the phonebook implementations (#870) Merge the three phonebook implementations into one. * Adding a verifying signatures step to the build release pipeline (#878) * fix typo in check_deps.sh message (#884) * Update list of DNSSEC-aware resolvers (#883) * Fixing error reporting to read from the stream. 
(#887) * Shoehorn `test_package.sh` into the test phase (#877) * Brice/refactor cicd stages to use persistent fields (#879) * Refactor cicd.yaml to use persistent fields Now we have one task generating the docker image version used in subsequent stage tasks * Install libc-compat through musl-dev instead of installing it directly This package comes with more packages which may or may not help. * Move build actions to one make task This will speed up the build by reducing the number of redundant make target executions. * Refactor Makefile to build using -static on alpine Also, removed the if around amd64 vs arm64 so builds are more consistent. * Remove tests from armv6 build Tests don't work on that cpu arch because --race isn't supported. * Add conditional to build arm packages with static linking * Up memory map space in centos container The default is too low for builds on amd64 * Set -static flag to ld only for arm builds on alpine This way we are limiting the static option to arm builds on our docker container. * Rename arm references for arm32v6 builds After our talk yesterday, I changed references for arm builds to be consistent with other parts of our automation. * Add some more files to .dockerignore file These files are not necessary and they make the builds take much longer * Delete go-algorand repo in builder image This always gets overwritten when it's used and it takes up a lot of space * Have build-local run all make targets at once * Remove .git folder from .dockerignore This is used by some of our automation * Strip remote repo name from branch variable name in build release pipeline (#897) * Support of older kernels for locking files (#895) * Use golang.org/x/sys/unix instead of syscall The latter package is deprecated See https://golang.org/pkg/syscall/ * Always use non-OFD locks on non-Linux OS Previously, availability of OFD locks was tested on non-Linux OS. 
To do that, the syscall cmd constant `syscall.F_OFD_GETLK` was hard-coded in `libgoal/lockedFileUnix.go`, because this syscall cmd constant was not available in the Go library for non-Linux OS. However, different architectures may have different syscall constants. Furthermore, it seems that currently, only Linux supports OFD locks. This commit removes hard-coded syscall constants and systematically uses non-OFD locks on non-Linux OS. * Default to non-OFD locks when OFD locks unavailable Older kernels (before 3.15, and in particular the kernel from WSL - Windows Subsystem for Linux) do not support OFD locks. This commit adds a test for the availability of OFD locks. The test is similar to what was done before in `lockedFileUnix.go`, (removed by commit 11bc50da77278021e60922f6a4d5aac2bf9e6d40) with two main differences: * no syscall constant is hardcoded * unavailability of OFD locks is more fine-grained: `errno` is checked to be `unix.EINVAL` rather than any error; in case of a different `errno`, panic (this should never happen) * Re-ordering imports * Return error instead of panicking in `makeLocker` * Remove the phonebook from the node (#893) * Initial draft of: remove phonebook from node. * minor fixes * fixes from Tsachi's comments. * Rename cicd.yaml to mule.yaml (#894) We renamed our cli to mule, so our cicd.yaml file is now a mule.yaml file * Add sqlite3 as a dependency (#891) * add sqlite3 as a dependency When running `make`, `sqlite3` is used but was not included as a dependency in: * `scripts/check_deps.sh` * `scripts/configure_dev.sh` * Do not upgrade sqlite3 on macOS This is not useful and causes issues with Travis. * Catchupsrv tars (#881) * can serve from directory of M_N.tar.bz2 block tars * faster block tar access. round robin replacement. undo unused config change. * switch Mutex library * Extend timeouts for simulate_test and service_test to support (#905) ci_integration testing. 
* shellchecked `build_deb.sh` (#882) * shellchecked `build_deb.sh` * Test pre-packaged executable on variety of linux platforms (#651) * Add platform testing using docker for generated binaries. * Fix path. * Apply reviewer's requested changes. * Reduce e2e_go_tests execution time twice (#645) There are seven major contributors to integration tests running time TestOnlineOfflineRewards (1248.64s) TestAssetConfig (364.71s) TestRewardRateRecalculation (226.78s) TestStartAndEndAuctionTenUsersOneBidEach (196.34s) TestNoDepositAssociatedWithBid (189.74s) TestDeadbeatBid (188.70s) TestStartAndCancelAuctionNoBids (183.35s) This commit considers only first three. 1. Fixing rewards interval in config for TestRewardRateRecalculation from 25 to 10 reduces time twice: TestRewardRateRecalculation (119.34s) 2. Fixing initialRound in TestOnlineOfflineRewards test from 301 to 11 reduces time 15 times: TestOnlineOfflineRewards (73.80s) 3. TestAssetConfig looks long by design - commits and waits max allowed …
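The branch-name fix described for #850 above (Jenkins reporting `origin/rel/beta` where the tooling expects `rel/beta`) can be illustrated with a minimal Go sketch. This is not the util from the commit, whose implementation is not shown here; the function name `stripRemote` and the explicit list of remote names are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRemote is a hypothetical helper (not taken from the repository).
// It removes a leading "<remote>/" from a branch name when the prefix
// matches one of the known remote names, and returns the name unchanged
// otherwise.
func stripRemote(branch string, remotes []string) string {
	for _, remote := range remotes {
		prefix := remote + "/"
		if strings.HasPrefix(branch, prefix) {
			return strings.TrimPrefix(branch, prefix)
		}
	}
	return branch
}

func main() {
	// Assumed remote list; a real script would likely read it from git.
	remotes := []string{"origin"}

	fmt.Println(stripRemote("origin/rel/beta", remotes))
	fmt.Println(stripRemote("rel/beta", remotes))
}
```

Matching against known remote names, rather than dropping everything before the first slash, keeps branch names such as `rel/beta` intact when no remote prefix is present.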
btoll pushed a commit to btoll/go-algorand that referenced this pull request Apr 7, 2020
btoll pushed a commit to btoll/go-algorand that referenced this pull request Apr 7, 2020
* shellchecked `build_deb.sh` * Test pre-packaged executable on variety of linux platforms (#651) * Add platform testing using docker for generated binaries. * Fix path. * Apply reviewer's requested changes. * Reduce e2e_go_tests execution time twice (#645) There are seven major contributors to integration tests running time TestOnlineOfflineRewards (1248.64s) TestAssetConfig (364.71s) TestRewardRateRecalculation (226.78s) TestStartAndEndAuctionTenUsersOneBidEach (196.34s) TestNoDepositAssociatedWithBid (189.74s) TestDeadbeatBid (188.70s) TestStartAndCancelAuctionNoBids (183.35s) This commit considers only first three. 1. Fixing rewards interval in config for TestRewardRateRecalculation from 25 to 10 reduces time twice: TestRewardRateRecalculation (119.34s) 2. Fixing initialRound in TestOnlineOfflineRewards test from 301 to 11 reduces time 15 times: TestOnlineOfflineRewards (73.80s) 3. TestAssetConfig looks long by design - commits and waits max allowed assets 4. Address TODO in run_integration_tests.sh. Now e2e_client_runner calls 'goal network delete' to reflect this removal Refers #508 * Promote test_release.sh so that it won't conflict with release testing. (#655) * Fix concurrent access to wallet handles cache in goal (#654) * Fix concurrent access to wallet handles cache in goal * In rare cases (i.e. e2e tests run in parallel on the same network) a race cond happens when accessing goal.cache/walletHandles.json file * Introduce advisory locking on the mentioned file * Implementation is extendable by implementing *locker* interface for specific platform and providing a new *newLockedFile* constructor. * Address PR review notes * Do not truncate before obtaining the lock * Increase waiting interval to 10 ms * Simplify newLockedFile constructor * Allow upgrades to specify the delay before their execution. (#650) This replaces UpgradeWaitRounds with MinUpgradeWaitRounds and MaxUpgradeWaitRounds. 
Proposers specify an upgrade's delay given their own ApprovedUpgrades, encoding the proposed delay in the UpgradeVote. Verifiers check that the delay sits between MinUpgradeWaitRounds and MaxUpgradeWaitRounds (inclusive). This commit adds this functionality but does not change current behavior. * Set explicit 30 sec timeout for AlgorandGoal::RawSend in expect test (#658) * Should help with sporadic failures when we send and TEAL in groups * Support variable-delay protocol upgrades in ConsensusFuture. (#659) Also add some unit tests for variable-delay protocol upgrades. * Shant/catchup stop on unapproved (#660) * A fix for arm64 failures One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts, when called from go test using CombinedOutput, behave strangely (slow). Replacing CombinedOutput with Run. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. Fixing errors and adding comments. * Fixing merge and comment. * added comment * Stop catchup on unapproved protocol round Catchup to stop before fetching the next round if the round protocol is not approved by the node * Some fixes. Review comments from Tsachi. * File accidentally added here. removing. * Reverting changes mistakenly added to this branch. * Adding comment changes. * Partially working test * Adding test to catchup stop on unsupported block Using s.cancel we are dropping the last block. * More tests and development to the catchup service * Stop the catchup before fetching the round with un-approved protocol. 
The catchup service will save the round when an un-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an un-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an un-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an un-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments. * Combine condition blocks * Fixing an error in the log info statement. * Compile linux/amd64 binaries with static linking (#625) * Test static compilation. * remove -fPIC * Try with ubuntu 18.04, since it has newer GCC. * exclude buildmode from test builds. * Fixed missed buildmode. * Refactor. * Add logging for the telemetry server connections (#661) * Add logging for the telemetry server connections. * Revert unintended change. * Improve error message. * add bool support to algocfg (#667) e.g. `algocfg set -p EnableProcessBlockStats -v true` * Reduce execution time of expect tests (#665) * CombinedOutput blocks on copying empty stderr stream from expect that causes at least 60 sec timeout for most of the tests * This implementation uses a temp file for stderr accumulation. In this case exec.Cmd does not run goroutines for reading child's actual stderr. 
* 655 sec (before) vs 205 sec (after) * Avoid upgrading boost on travis Mac builds (#669) * specify a boost version for the mac build. * try to prevent boost update on travis mac builds. * Abort algod startup if logging.config file has bad permissions (#662) * This should prevent telemetry event losses on systems with invalid permissions on ~/.algorand/logging.config file * Another possible workaround is to relax default config path mask in **cmd/goal/commands.go:ensureCacheDir** from 700 to 744. This is not implemented because of possible security risk. * Add error logging for getting a cached wallet handle (#663) Needed to debug 'Couldn't read password: inappropriate ioctl for device' error message in tests * Update license date 2019 -> 2020 (#674) * Change 2019 -> 2020 * Update readme. * Update copyright to use date range. (#676) * Tee existing tests so we can review output before piping it forward. (#677) * Make graceful exit of a node that is waiting for WaitForBlock call (#679) * Make graceful exit of a node that is waiting for WaitForBlock call. * Add comment. * Remove tput where not supported by terminal (#682) * Remove tput where not supported by terminal. * send tput errors to dev/null * Fix bad constants. * Avoid waiting for block that won't be reached due to unsupported protocol upgrade. (#681) * Fix - Indexer now shows received transactions (#684) -- Adding receiver function to transaction that returns the receiver of a transaction -- Fix indexer to show received transactions * Undo teeing to dev/tty as it doesn't work well in terminal free environments. (#689) * Improve lockFile error handling (#687) * Better lockFile error handling. * Make blocking locker. * Fix F_OFD_GETLK constant. * bugfix. * Try platform specific code. * use unix package to include F_OFD_SETLKW * remove unused imports. * Rename files. 
* Catchup service stop on unsupported and e2e test (#685) * A fix for arm64 failures One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts, when called from go test using CombinedOutput, behave strangely (slow). Replacing CombinedOutput with Run. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. Fixing errors and adding comments. * Fixing merge and comment. * added comment * Stop catchup on unapproved protocol round Catchup to stop before fetching the next round if the round protocol is not approved by the node * Some fixes. Review comments from Tsachi. * File accidentally added here. removing. * Reverting changes mistakenly added to this branch. * Adding comment changes. * Partially working test * Adding test to catchup stop on unsupported block Using s.cancel we are dropping the last block. * More tests and development to the catchup service * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an un-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an un-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an un-approved protocol update will take place. 
Then, before creating a task to fetch a round, will check if the next round is when an un-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments. * Combine condition blocks * Fixing an error in the log info statement. * Draft: Test for upgrading a node while keeping another node not upgradable goal node status field for informing if the node is upgradable * Catchup service stop on unsupported, node status message, and e2e test In this change: Updated catchup service to stop on unsupported and not unupgradable. Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing last synced information. Added e2e test for stopped catchup service on unsupported protocol. * Separating goal changes from this PR. Separating goal changes from this PR. goal changes are in PR: https://github.com/algorand/go-algorand/pull/686 * review comment: use NotEqual instead of True * Make ARM64 build mandatory. (#694) * Updates to the goal node status (#686) * Updates to the goal node status This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field. * Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport * Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values. * Review comments from Tsachi: using the timeout in select * Updating the test to reflect the removal of: has synced since startup. 
* telemetry recorded locally as info log (#666) config.json: {"TelemetryToLog":true} logging.config: {"Enable":false,"SendToLog":true} * Relax StartNetwork regex (#696) * relax StartNetwork regex. * Another attempt. * Two fixes to basicCatchup_test: cloned node not stopped and env var conflict (#697) * Updates to the goal node status This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field. * Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport * Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values. * Review comments from Tsachi: using the timeout in select * Two fixes to basicCatchup_test: cloned node not terminated and env var collision 1) TestBasicCatchup and newly added TestStoppedCatchupOnUnsupported create a new node by cloning one of the network nodes. When fixture.Shutdown() stops the original network nodes, leaves the cloned node running. This change adds function shutDownClonedNode to stop the cloned nodes. 2) In TestStoppedCatchupOnUnsupported, an env variable is used to delete ConsensusCurrentVersion, so that the cloned node behaves as if its binary does not support the consensus version. However, when the TestBasicCatchup runs in parallel, it also picks up the env variable, and consequently deletes ConsensusCurrentVersion from the Consensus map. When this happens, TestBasicCatchup sporadically fails. In this change, instead of having ConsensusTestUnupgradedProtocol upgrade to ConsensusCurrentVersion, or deleting ConsensusCurrentVersion so it cannot be upgraded, it sets up ConsensusTestUnupgradedProtocol to upgrade to ConsensusTestUnupgradedToProtocol. Hence, the env variable is used to delete ConsensusTestUnupgradedToProtocol. This way the conflict with other tests is eliminated. 
* Fixing golint by adding comment. * Tsachi's review comment: unsetting the env var. * Make scripts executable. (#702) * More reliable fetcher unit tests. (#708) * Avoid starting the Telemetry service when logging is disabled (#703) if remote telemetry is not enabled, do not start uri update service add a nil check * Shutdown kmd when test fixture is going down. (#709) * Fix unit test. (#711) * Execute e2e tests one at a time on arm64 (#701) * Test changes. * Better error reporting on goalFixture * Add version query for kmd startup. * Few more test cases to cover. * try to wait. * changes * Update. * Move KMD shutdown to network. * Add some debug messages to figure out what's going on. * Fix script bug. * Fix proper KMD shutdown via the KMDFixture * Run the tests one at a time only on arm64 * Updating according to review. * Disable pprof endpoints by default (#693) * enable go profiler for netdeploy * add EnableProfiler to ConfigJSONOverride * Update the makefile to skip the static linking when compiling on centos. (#713) * Fail e2e-go tests when node panics (#699) * Fail test on panic * few more touchups. * sync * bugfix. * Update few more usecases. * Refactoring * Simplify. * undo network referencing. * undo few func-ptr. * undo some more stuff. * Update method names * Few more touchups. * Build release job (#698) * Initial commit * Added Jenkinsfile * Updated Jenkinsfile * Works until GPG IPC * Move build files into new release/ dir Also, renamed files {build_,}release.sh and {build_,}setup.sh * Path issues * Use t2.xlarge instance type (4 vCPUs, 16GB ram) * Restructuring * shellchecked * fix bug * Added new `socket.sh` file * Trying to build rpm * Bump up disk size of ec2 instance * more attempts to make rpm * more fixes * move /stuff -> /root/stuff * wip * moved to correct paths * Have `release` have its own start and kill ec2 instance scripts * use buildhost scripts after all * Make sure the gpg key name matches!!!!! 
-%_gpg_name Algorand RPM <rpm@algorand.com> +%_gpg_name rpm algorand <rpm@algorand.com> * fixes * Add upload stage to pipeline * Add tag stage to pipeline * more fixes * Move start/stop ec2 instance scripts back into release/ * Add ability to dynamically set branch * Added controller/ subdir * Some cleanup * Adding tag support Moved `Jenkinsfile` into controller/ subdir. * Move build_env build.sh -> setup.sh Moved socket.sh -> controller/socket.sh * Revert buildhost changes * some cleanup * fix build * test packages locally * upload packages to s3 test bucket * restructure * misc * fix build * Add Jenkins parameters * fix build * Move commands into Jenkinsfile into stages/ * fix build * Make test stage more explicit * fix build * Implementing reviewer suggestions * Added debug info * fix build * Merge into master * implement reviewer suggestions * turn off test stage * fix build * fix build * fix build * Update readme * removed unneeded archive/ dir * Use service-wide logger instead of logging.Base() in agreement (#714) * Switch from default logger to pre-configured logger in some components of agreement service * Mark some of the slow e2e tests as such (#719) * Mark some of the slow e2e tests as such. * Move shorttest flag to be set at top level. * Wait test less restrictive. (#718) * Move slow test to get executed on nightly builds (#721) * Move some more test to be "slow tests", and modify short test condition so that we will run the long tests on nightly builds only. * Fix elif -> else * Faster upgrade tests. (#722) * Disable failing test. (#724) * Generate docs for algokey. * s/goal/algokey * Improve algons error logging (#733) * Write body when erroring on SRV/DNS records update. * Few more error messages. 
* ledger/eval refactor (#700) refactor ledger/eval block validation don't do crypto+lsig validation in eval fix sync in backlog executer queue clean up lots of logging to make tests quieter * Fix a bug in Credential.lowestOutput caused by improper domain separation (#716) * Fix a bug in Credential.lowestOutput caused by improper domain separation The bug causes larger accounts to be block proposers more often than should happen based on their fraction of online stake. This patch will cause nodes to vote for a protocol upgrade that fixes the buggy behavior. After the protocol upgrade goes through, all the upgrade-related code in this commit should be removed, as it's not necessary to retain the old buggy behavior for catchup. (For convenience code to be removed is marked with a "TODO(upgrade)" comment.) * Typofix; fix merge issue * Fix test * Add a comment to make the linter happy * Typo fixes * Goal docs tweaks (#731) * test all `goal ... -h` (#730) * test all `goal ... -h` ensures no conflicting subcommand options adds less than 2 seconds to test time * review tweak, rearrange to sub test script * actually pass args * grr, arg * Move EnsureDigest logic into the catchup service (#726) * Move EnsureDigest logic into the catchup service. * update unit tests. * Add unit testing for new catchup feature. * updating per review. * Add handing for concurrently updated round. * Add comment. * typo * Correct the quit semantics. * Faster stringer implementation for Address (#736) * Faster stringer implementation. * Optimize UnmarshalChecksumAddress as well. * Add comment. * Interconnect relays on a locally deployed network (#742) * static codegen for msgpack encode/decode (#578) Implement static code generation for msgpack encoding and decoding of blocks and transactions. The existing functions `protocol.Encode` and `protocol.Decode` invoke the generated encoders and decoders if present. 
Benchmarking block encode/decode suggests this is about 4x faster than go-codec (which we were using previously). When changing existing data structures to be encoded, or adding new ones, run `make msgp`. Some code is still using go-codec (notably agreement). If we convert all code to use this static code generation plan, we could get rid of the dynamic check and dispatch in `protocol.Encode` and `protocol.Decode`. Having fast encoding/decoding is not only good for performance, but allows us to remove complex optimizations (like caching txid values or encoding lengths, removed in this commit), and might allow us to perform checks that we previously thought would be too expensive (like making sure that an encoding is canonical, by re-encoding). Having explicitly generated code also makes it easier to understand performance and tweak it further. Results from pprof should be much less opaque (no reflection) and more actionable. Explicit codegen also makes it clear when we make a change that affects encoding/decoding of network messages. The code generation is done using a modified version of github.com/tinylib/msgp, forked as github.com/algorand/msgp. * Use cobra for the kmd command to allow for documentation automation. * Limit client side connection rate, part 1 * Draft of the solution * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * Addressing review comments. * fixing test failure * fixing test failure2 * Adding a unit test * txsync now will go through http request connection limit. * Addressing review comments. Changing phonebookEntries duration type from uint to time.Duration * fixint test failure. * splitting wait for connection time and add connection time. Addressing some review comments. * recording provisional time before connect, updating after. 
* minor fixes * Embedding MockNetwork in mock structs which implment GossipNode to avoid the implmentation of dummy functions to satisfy the interface. * not embedding by reference. * A few more review comment fixes. * Fix checkdep message. (#745) * Fix equal stake distribution in generated networks (#749) * Use math.big.Rat rational numbers to get rid of summation error * Root cause although in JSON serialization of float64 data type so that some values are rounded and others are not. Correct fix seems to be in using the same accuracy in distribution code and float64 marshaling. * Update with PR feedback. * Change a player test to use either old buggy behavior or new correct behavior depending on ConsensusCurrentVersion. (#748) This allows agreement tests to pass whether ConsensusCurrentVersion is the old V20 or the new V21 * Bugfix: Fix last relevant proposal period in agreement protocol. (#746) When retrieving the last relevant period corresponding to a proposal-value, the proposal store inside the agreement protocol does not properly check that the particular period returned actually matches the passed-in proposal-value. Instead, the proposal store returns the last period seen for *any* proposal-value. When the agreement state machine receives a proposal payload, the proposal store checks whether this payload matches any proposal-value known to be relevant in the current round. If it does, the state machine tells the crypto verifier to verify the new payload. As an optimization, the proposal store in the state machine also tags the payload with the last period in which it is relevant (and whether the matching proposal-value is pinned). The crypto verifier halts concurrent verification of any payload from that period. Separately, the proposal store does not attempt to verify payloads more than once, caching past payloads it has pipelined. 
For this optimization to be correct, the last relevant period must be correct; otherwise, the network will permanently stall if the following occurs: - In period p, the network observes a best proposal value of v, but it sees neither the payload B corresponding to v nor a threshold of soft-votes for B (seeing such a threshold pins B, preventing the crypto verifier from cancelling). - An attacker is able to see B. - In period p+1, the network attempts to agree on a new proposal value v' corresponding to the payload B'. - After half of the network has received B' but has _not_ finished verifying it, the attacker sends this half the payload B. This half will cancel verification of B' (since it erroneously associates B with period p+1) and will permanently ignore any future broadcasts of B' (which was cached in the proposal store). - If the other half has already staged B', the network will stall permanently, since it will be unable to commit B'. Fixes #710. Thanks to @xixisese for reporting this bug. * Format numbers using number specifier (#735) * Use %d to print numbers, which is abit safer as it prevent potential recursion. * Few more changes to the fuzzer. * Two more updates. * Implement local net template generation with netgoal (#762) * Usage: netgoal generate -n 1 -R 1 -w 100 -o mynettemplate.json -r . -t goalnet goal network create -t mynettemplate.json -r mynet -n mynet * Remove duplicate definitions from netdeploy/networkTemplate * Improve net templates support (#766) * Fix file descriptors leak in 'goal account'. Now goal can import more than maxfiles keys * Fix uint overflow in stake distribution validation. 
Details: values 10 and -110 were cast to uint and summed to 100 pct through 32-bit overflow * Allow pct fraction of stake in goal net templates * Fix stake distribution in netgoal.generate: it always produces pcts and not values in algos as was incorrectly thought before * Add tests for netdeploy.Validate() * Release build pipeline step 1: Build, package, sign, deploy to staging (#763) * Reorganize * more restructuring * cleanup * removing test bits * changing upload destination * remove test dir * remove cruft * Moved Jenkinsfile -> jenkinsfile/Build * replace {RSTAMP,FULLVERSION} * fix bugs * remove temp dir location * remove buildnumber.dat * Implement automation for release notes generator (#761) The cicd.yaml config file in this branch can be consumed by our cicd cli to create a draft for release notes for a given version. * back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb (#764) * back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb * remove IsFull * bring back txn liveness check. buffer up to all payset groups in chan * no chan close * Implement dummy telemetry hook to safely perform operations on it when telemetry is disabled (#768) * The idea is to have telemetry.hook always set. When telemetry is disabled, this is a simple noop stub. * Prevents crashes when calling hook.Close/Flush on private networks in case of errors * Remove instances of tagging in our build process (#770) We don't want to be making tags anywhere in our automation. Our release process will take care of that. * Configurable consensus protocol (#750) * Create consensus.json * some changes.. * remove deadcode. * update constant. * Update fixture. * migrate fast upgrade protocols. * move catchup test protocol. * push staged changes. * bugfix. * Remove last test consensus param. * rollback block.go * cleanup : map[protocol.ConsensusVersion]ConsensusParams -> ConsensusProtocols * update. * Fix unit test. 
* Release build pipeline step 2: Test (#773) * Reorganize * more restructuring * begin test stuff * restructure * fix deb test * fix rpm test * fix build * restructure * fix bug * remove temporary feature branch * added new gpg.sh * removed buildnumber.dat * When locally installing, take the binaries from the first-gopath-bin directory. (#776) * Remove temporary build test location (#777) * Make sure to default to Consensus if consensus.json is missing. (#779) * Make util.ExecAndCaptureOutput able to process large output (#771) * In case of large amount of data written to stdout/stderr from the wrapped command the process is blocked until stdout/stderr buffers cleared. * Old implementation waited until cmd return and then read stdout/stderr. * New implementation reads stdout/stderr pipes in goroutines. * Make goal node state change commands systemd aware (#769) * Make goal node state change commands systemd aware I added a property to libgoal/system.go where we can set whether or not our algod process is managed by systemd. * Write expect test for goal node with systemd scenarios This tests that the message from our cli on goal node start, stop and restarts is correct for systemd_managed data_dirs. * Write expect test for goal node start, stop and restart This tests that the message from our cli on goal node start, stop and restarts is correct for data_dirs that are not managed by systemd. * Add systemd_managed: true as a default in system.json Since all linux installs currently use systemd, I added this to the base system.json file. * Restructure release/ dir (#782) * Restructure release/ dir for each build release pipeline stage First step is the `build` pipeline. * More restructuring Removed `release/ci/`. Every dir under `release/` will now be a pipeline. 
* Added "test" pipeline * update readme * Remove temp location and remove code cruft * removed outdated readme * more cleanup * implement reviewer changes * Allow asset creation transactions to be created while catching up. (#790) * Tunnel outgoing connection via a rate limiting dialer (#780) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Addressing Pavel's comments. * rebasing master Co-authored-by: Tsachi Herman <tsachi.herman@algorand.com> Co-authored-by: Will Winder <wwinder.unh@gmail.com> * Release build pipeline step 3: Added "prod" pipeline to `release/` (#788) * Release build pipeline step 3: Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Implement reviewer suggestion * better algons error messages. 
(#794) * Create a rate limiting transport (#795) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Rate limiting transport. * remove comment. * Unify dialing path. * Removing ForceAttemptHTTP2 which isn't available on go 1.12 Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Some release abstraction (#796) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * Remove temp github location * Change agreement message encoder to msgp. (#786) * Upgrade to new version of msgp. - omitemptyarray and omitempty are correctly distinguished between in equivocationVoteAuthenticator. - The embedded Block is correctly handled in proposal, unauthenticatedProposal, and transmittedPayload. * Randomize anonymous (embedded) fields when testing codec. Co-authored-by: Nickolai Zeldovich <nickolai@csail.mit.edu> * Move fetcher client into catchup (#774) * changes. * adding dialer. * Move fetcher client into catchup, step 1. ( most unit tests are still broken ) * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. 
* Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * update. * fix few more unit tests. * fix syncer tests. * undo change. * Add a comment. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Fix gpg keygrip code and remove old code (#797) * bugfix : compile correctly teal program that includes a base64 string which starts with double slash (#787) * update. * Improve test. * Add support for multiple network protocol versions (#799) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Add a version-accept header to support multiple network protocol versions. * update. * Remove comments. * Addresing reviewer concerns. * Add a unit test for checkProtocolVersionMatch logic. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Include comment about something that looks like a vulnerability, but isn't. (#820) * Skip logging and telemetry when not needed. (#737) * Added utils for testing release packages (#819) * Added utils for testing release packages check_sig: Verify gpg signatures of build artifacts. 
test_package: Verifies the packages were built from the correct branch with the correct hash and verifies the test version release number. * Implement reviewer feedback * Update docker build script to be more flexible with its naming (#822) * Deleting out-of-date wallet folder in go-algorand. (#821) * Some build fixes (#818) * Some build fixes Most importantly, move the `fullversion.dat` file to the $HOME directory and use it for the name of the upload directory on s3. It should have been doing this before, but it was copying it to the wrong location on the ec2 instance. * Implement reviewer suggestions * Completely remove temp dir before re-creating it * Move `dsign` functionality to goal (#800) * Deferred persistent crash data validation (#823) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Perform the crash-decoding after responding to the event, so that the new vote won't be blocked. * undo unintended changes. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Update Dockerfile for our official docker image (#826) * fix incorrect comments (#825) * Reduce the log verbosity on scenario 3 deployed network (#828) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. 
* minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Reduce the amount of logs on s3 network. When running s3, our performnace is negatively impacted by high amount of logging. This change reduces the logging to warning and above. * undo Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Trigger test build (#831) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * fix build * test * Remove build parameters * wip * remove test dir * still trying to fix random build errors * updating test phase * extract build_env values * add trigger for test phase * test * removed test location * More release build fixes (#836) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * fix build * test * Remove build parameters * wip * remove test dir * still trying to fix random build errors * updating test phase * extract build_env values * add trigger for test phase * derp * remove test location * Split consensus from config (#832) * Split consensus from config. * few more changes. * netgoal: create accounts in parallel (#827) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. 
* Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Make parallel accounts. * undo change. * handle data race. * use atomics. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Updated job name to match on the Jenkins server (#837) * Brice/refactor make (#835) * Refactor makefile I refactored how we build libsodium to support multiple os and cpu architectures from the crypto dir. Also I added some make targets that work the way our ci pipeline needs them to. * Add flags for other linux architectures in crypto/vrf.go * Remove yum commands from configure_dev script I decided we don't need these here. I just left the which apt-get so that this script works the same but doesn't break on centos. * Add multi platform support to cicd yaml Now we have stages to do builds on different platforms utilizing docker and qemu cpu virtualization. * Refactor libsodium dep management Before the libsodium dep paths were hardcoded under cgo tags, now they're being passed in through env vars. Also throwing in a dockerfile for our cicd process. * Revert change to configure_dev.sh These changes actually aren't necessary since our build process doesn't use this script. * Switch back to using cgo tags for CFLAGS and LDFLAGS This way LDFLAGS aren't used all over the place unecessarily which could cause problems in the future. * Fix names of things in Makefile Fixed the name of crypto/lib/libsodium.a to crypto/libs/$(OS_TYPE)/$(ARCH)/lib/libsodium.a so that it reflects the updated project structure. Also changed VARIATIONS=literally_anything in ci-build to VARIATIONS=$(OS_TYPE)/$(ARCH) so that it looks like it's useful. 
* Update cicd.yaml to use the new shell.docker.Ensure task This task makes sure that the docker image(s) our tasks depend on are avaiable during stage executions. It either pulls the docker image or builds it from scratch when it's not available. * Fix references to crypto/lib/libsodium.a make target A travis script was referencing this directly so I fixed the target. Also, I removed an unnecessary reference in our rpm build script. * Remove ci-deps from docker build make targets Those were there by mistake, and having them kind of defeated the purpose packing those deps with the images. Also I moved ci-deps to the shell.Make target in build-local since those are necessary there. * Run build and test jobs in a docker container (#840) * Brice/fix deploy linux (#767) * Make dockerignore file This file will prevent docker build contexts from loading certain files when creating docker build contexts. I just made it a copy of .gitignore since those files don't seem to be necessary for any current Dockerfile for go-algorand. * Fix unnecessary cd into parent directory of project root This was causing huge docker contexts for no apparent reason. * Change dockerignore to include some necessary files I switched tmp to tmp/dev_pkg and tmp/out to ignore large folders that seem unnecessary for any docker build today and removed ignores for the network gen files * Limit msgp tool warning message scope (#834) * Try to reduce msgp verbosity. * update * update msgp version in go.mod * update go.sum * Remove old entries from go.sum * Refactoring peer unicast implementation (#841) * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * checking in current draft. 
* complete the test * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * checking in current draft. * complete the test * some cleanup * fixes, lint, format. * Addressing Tsachi's comments * Addressing Tsachi's comments. getNonce() fixed, and a new test added for it. * Addressing few more comments. * Unifying getResponseChannel and removeResponseChannel * addressing Pavel's comment: correcting a comment. * Actively scan for ledgers, normalize names cross platform (#842) Make ledger wallet names more canonical, check that sending a command doesn't return an error, only run active ledger for new devices. * require Encode() and Decode() to take msgp'ed types (#798) This ensures that calls to protocol.Encode() and protocol.Decode() are going to hit fast msgp-generated encoders and decoders. There are some places where we can't use msgp-generated code yet, for whatever reason, and those still invoke the reflection-based go-codec, using protocol.EncodeReflect() and protocol.DecodeReflect(). The main intent of this commit is to clearly identify places where we still invoke go-codec, and fix some trivial cases (like passing a struct to protocol.Encode by value instead of by pointer). Later on, we can go through the calls to protocol.EncodeReflect() and protocol.DecodeReflect() to see if we can get rid of the harder cases, to reduce or eliminate the use of go-codec altogether. * Change EnsureDigest to be asynchronous. (#754) This allows nodes which have received a threshold of cert-votes but not the corresponding block to continue to relay messages as normal. This prevents nodes in this state from inadvertently partitioning the network, which can cause stalls in very rare cases. 
- certThresholds now stage values in the proposal hierarchy, and essentially act like softThresholds (for the event.period) - Note: we can receive certThresholds for the previous period (but not softs, which aren't the freshest bundle). So now we can stage a value for the previous period, which is a side effect. - certThresholds fast forward periods and prevents subsequent period changes in the current round. - Do not cancel cryptographic verification of cert-bundles from old periods and continue to relay them. - Adds stageDigestAction, distinct from ensureAction, to signal the ledger that it should attempt to fetch the block given a certificate. It is not a blocking operation. - certThreshold without payloads now trigger stageDigestAction - If we receive a payload, check if cert is freshest bundle; if so, finish round. Co-authored-by: ben <me@vervious.com> * Strip any defined remote repo from branch name when building (#850) When using a wildcard (*) character to watch multiple branches when polling in Jenkins, the GIT_BRANCH environment variable will be "origin/rel/beta" instead of just "rel/beta". This breaks our tooling, but a simple fix is this util which simply strips any matched remote repo from the env var string value. * Implement DNSSEC resolving library (#830) * Implement DNSSEC resolving library * A, AAAA, SRV, CNAME lookup with sig verification * Recursive ip address lookup from CNAME with sig verification * Cached trust chain that is updated on DNSKEY cached sig expiration or zone signing key (ZSK) miss needed for end-user request's sig verification or DS-record confirmation on the chain update * Test harness includes a mock NS implementation for DNS-aware NS server * Closes #251 RFCs used: 1. DNS https://tools.ietf.org/html/rfc1035 2. DNS clarifications https://tools.ietf.org/html/rfc2181 3. DNSSEC proto change https://tools.ietf.org/html/rfc4035 4. DNSSEC RR change https://tools.ietf.org/html/rfc4034 5. 
DNSSEC clarifications https://tools.ietf.org/html/rfc6840 6. DNSSEC keys management https://tools.ietf.org/html/rfc6781 7. DNS SRV https://tools.ietf.org/html/rfc2782 * Utility to check relays' DNSSEC support * Make DNSSEC resolver interface compatible with net.Resolver * Use context * Change LookupCNAME: fail only if no A/AAAA record, do not fail if no CNAME * Change LookupSRV: sort records by priority and randomize by weight * Change LookupIPAddr: always make recursive lookup * Implement missing functions like LookupTXT * Use DNSSEC for SRV retrieval * Make DNSSEC thread safe * Add deadlock.Mutex to protect cached trust chain * Always use a new instance of dns.Client to work around a race in ExchangeContext * Address review comments * Get rid of pointers to arrays * Add time param to verify* and makeTrustedZone functions to make tests against real DNSKEY/RRSIG snapshot robust * Rewrite UDP/TCP retries * Renames * Disable failed attempts to retrieve SRV in agreement gossip tests * Implement DNSSecurityFlags config variable * New config version and migration * Implement DNSSEC-aware DialContext * Closes #253 * Implement LookupTLSA * Tests for LookupTXT, NS, MX, TLSA * Minor comments and code fixes * Code review fixes * disable the concurrent wallet generation. (#848) * Force docker to use `root` as the user when running the instance (#849) By default, docker will use the root user, but the jenkins pipeline docker plugin inexplicably runs the instance under the permissions of the user that launched the script that contains the docker command. * Improve some error checking and logging for build process (#851) * Fix comment in agreement. (#856) * Add MoI to network (#853) * Implement message of interest * Add missing file. * Make the ping handler optional. * fix typo. * Improve unit testing. * update return variable name, * Add comment. 
* Better error case handling in database utils (#857)
* Fix a few error handling edge cases
* Fix bug in setupAgreementWithValidator
* Better fix.
* Explicitly curl go.1.12.9 and archive `get_latest_go.py` (#855)
The golang download page was changed and our pinned version of golang is no longer referenced on it. This was breaking our build. Instead, for now we'll explicitly download the tarball via `curl`. https://golang.org/dl/?mode=json
* Trap errors and remove ec2 instance (#854)
Add error handling for the release build pipeline.
* Update the update script. (#670)
* Faster external_build_printlog by using curl instead of aws cli (#847)
* Fix concurrent SQLite initialization (#872)
* SQLite init is not thread safe and mattn/go-sqlite3 does not care
* When opening any db for the first time, do it synchronously so that the first (nested) call to sqlite3_initialize() is not made concurrently
* Re-enable multi-threaded account generation
* Closes #846
* change _tx_lock -> _txlock (#871)
* Redirect stdout of build log file to build release upload directory (#873)
* Install boto3 as a build dependency for docker (#875)
* Enable some skipped tests on MacOS (#876)
* Asset tests
* Rest client test
* Send-Receive test (TestAccountsCanSendMoney) - takes 16 minutes
* Set root as explicit docker user for test phase (#874)
* Refactor and combine the phonebook implementations (#870)
Merge the three phonebook implementations into one.
* Adding a verifying signatures step to the build release pipeline (#878)
* Wrap entire arguments in quotes
Co-authored-by: Tsachi Herman <tsachi.herman@algorand.com>
Co-authored-by: pzbitskiy <pavel@algorand.com>
Co-authored-by: Derek Leung <derek@algorand.com>
Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com>
Co-authored-by: algobolson <45948765+algobolson@users.noreply.github.com>
Co-authored-by: Rotem Hemo <rotem@algorand.com>
Co-authored-by: Will Winder <wwinder.unh@gmail.com>
Co-authored-by: Max Justicz <max@justi.cz>
Co-authored-by: algoradam <37638838+algoradam@users.noreply.github.com>
Co-authored-by: Evan Richard <EvanJRichard@users.noreply.github.com>
Co-authored-by: Nickolai Zeldovich <nickolai@csail.mit.edu>
Co-authored-by: bricerisingalgorand <60147418+bricerisingalgorand@users.noreply.github.com>
Co-authored-by: Shumo Chu <stechu@users.noreply.github.com>
Co-authored-by: ben <me@vervious.com>
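As an illustration of the branch-name fix described in #850 above, the following is a small Go sketch of stripping a matched remote from a value such as `GIT_BRANCH`. The actual util in the repository may be written differently (possibly not in Go); `stripRemote` and the hard-coded remote list are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRemote removes a leading "<remote>/" from a branch name when the
// prefix matches one of the given remote names. Hypothetical helper.
func stripRemote(branch string, remotes []string) string {
	for _, remote := range remotes {
		if rest := strings.TrimPrefix(branch, remote+"/"); rest != branch {
			return rest
		}
	}
	return branch
}

func main() {
	// Assumed remote list; in practice it could come from `git remote`.
	remotes := []string{"origin"}
	fmt.Println(stripRemote("origin/rel/beta", remotes)) // rel/beta
	fmt.Println(stripRemote("rel/beta", remotes))        // rel/beta
}
```

Matching against known remote names, rather than cutting at the first slash, keeps branch names like `rel/beta` intact when no remote prefix is present.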
btoll pushed a commit to btoll/go-algorand that referenced this pull request Apr 8, 2020
btoll pushed a commit to btoll/go-algorand that referenced this pull request Apr 8, 2020
* shellchecked `build_deb.sh` * Test pre-packaged executable on variety of linux platforms (#651) * Add platform testing using docker for generated binaries. * Fix path. * Apply reviewer's requested changes. * Reduce e2e_go_tests execution time twice (#645) There are seven major contributors to integration tests running time TestOnlineOfflineRewards (1248.64s) TestAssetConfig (364.71s) TestRewardRateRecalculation (226.78s) TestStartAndEndAuctionTenUsersOneBidEach (196.34s) TestNoDepositAssociatedWithBid (189.74s) TestDeadbeatBid (188.70s) TestStartAndCancelAuctionNoBids (183.35s) This commit considers only first three. 1. Fixing rewards interval in config for TestRewardRateRecalculation from 25 to 10 reduces time twice: TestRewardRateRecalculation (119.34s) 2. Fixing initialRound in TestOnlineOfflineRewards test from 301 to 11 reduces time 15 times: TestOnlineOfflineRewards (73.80s) 3. TestAssetConfig looks long by design - commits and waits max allowed assets 4. Address TODO in run_integration_tests.sh. Now e2e_client_runner calls 'goal network delete' to reflect this removal Refers #508 * Promote test_release.sh so that it won't conflict with release testing. (#655) * Fix concurrent access to wallet handles cache in goal (#654) * Fix concurrent access to wallet handles cache in goal * In rare cases (i.e. e2e tests run in parallel on the same network) a race cond happens when accessing goal.cache/walletHandles.json file * Introduce advisory locking on the mentioned file * Implementation is extendable by implementing *locker* interface for specific platform and providing a new *newLockedFile* constructor. * Address PR review notes * Do no truncate before obtaining the lock * Increase waiting interval to 10 ms * Simplify newLockedFile constructor * Allow upgrades to specify the delay before their execution. (#650) This replaces UpgradeWaitRounds with MinUpgradeWaitRounds and MaxUpgradeWaitRounds. 
Proposers specify an upgrade's delay given their own ApprovedUpgrades, encoding the proposed delay in the UpgradeVote. Verifiers check that the delay sits between MinUpgradeWaitRounds and MaxUpgradeWaitRounds (inclusive). This commit adds this functionality but does not change current behavior.
* Set explicit 30 sec timeout for AlgorandGoal::RawSend in expect test (#658)
* Should help with sporadic failures when we send and TEAL in groups
* Support variable-delay protocol upgrades in ConsensusFuture. (#659)
Also add some unit tests for variable-delay protocol upgrades.
* Shant/catchup stop on unapproved (#660)
* A fix for arm64 failures
One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts, when called from go test using CombinedOutput, behave strangely (slowly). Replacing CombinedOutput with Run.
* DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change.
* DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. Fixing errors and adding comments.
* Fixing merge and comment.
* added comment
* Stop catchup on unapproved protocol round
Catchup to stop before fetching the next round if the round protocol is not approved by the node
* Some fixes. Review comments from Tsachi.
* File accidentally added here. removing.
* Reverting changes mistakenly added to this branch.
* Adding comment changes.
* Partially working test
* Adding test to catchup stop on unsupported block
Using s.cancel we are dropping the last block.
* More tests and development to the catchup service
* Stop the catchup before fetching the round with un-approved protocol.
The catchup service saves the round at which an un-approved protocol update will take place. Then, before creating a task to fetch a round, it checks whether the next round is the one where the un-approved protocol begins and, if so, stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs.
* Stop the catchup before fetching the round with un-approved protocol.
The catchup service saves the round at which an un-approved protocol update will take place. Then, before creating a task to fetch a round, it checks whether the next round is the one where the un-approved protocol begins and, if so, stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments.
* Combine condition blocks
* Fixing an error in the log info statement.
* Compile linux/amd64 binaries with static linking (#625)
* Test static compilation.
* remove -fPIC
* Try with ubuntu 18.04, since it has newer GCC.
* exclude buildmode from test builds.
* Fixed missed buildmode.
* Refactor.
* Add logging for the telemetry server connections (#661)
* Add logging for the telemetry server connections.
* Revert unintended change.
* Improve error message.
* add bool support to algocfg (#667)
e.g. `algocfg set -p EnableProcessBlockStats -v true`
* Reduce execution time of expect tests (#665)
* CombinedOutput blocks on copying the empty stderr stream from expect, which causes a timeout of at least 60 sec for most of the tests
* This implementation uses a temp file for stderr accumulation. In this case exec.Cmd does not run goroutines for reading the child's actual stderr.
* 655 sec (before) vs 205 sec (after)
* Avoid upgrading boost on travis Mac builds (#669)
* specify a boost version for the mac build.
* try to prevent boost update on travis mac builds.
* Abort algod startup if logging.config file has bad permissions (#662)
* This should prevent telemetry event losses on systems with invalid permissions on the ~/.algorand/logging.config file
* Another possible workaround is to relax the default config path mask in **cmd/goal/commands.go:ensureCacheDir** from 700 to 744. This is not implemented because of possible security risk.
* Add error logging for getting a cached wallet handle (#663)
Needed to debug 'Couldn't read password: inappropriate ioctl for device' error message in tests
* Update license date 2019 -> 2020 (#674)
* Change 2019 -> 2020
* Update readme.
* Update copyright to use date range. (#676)
* Tee existing tests so we can review output before piping it forward. (#677)
* Make graceful exit of a node that is waiting for WaitForBlock call (#679)
* Make graceful exit of a node that is waiting for WaitForBlock call.
* Add comment.
* Remove tput where not supported by terminal (#682)
* Remove tput where not supported by terminal.
* send tput errors to dev/null
* Fix bad constants.
* Avoid waiting for block that won't be reached due to unsupported protocol upgrade. (#681)
* Fix - Indexer now shows received transactions (#684)
-- Adding receiver function to transaction that returns the receiver of a transaction
-- Fix indexer to show received transactions
* Undo teeing to dev/tty as it doesn't work well in terminal free environments. (#689)
* Improve lockFile error handling (#687)
* Better lockFile error handling.
* Make blocking locker.
* Fix F_OFD_GETLK constant.
* bugfix.
* Try platform specific code.
* use unix package to include F_OFD_SETLKW
* remove unused imports.
* Rename files.
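The inclusive bound check that the variable-delay upgrade change (#650) describes for verifiers can be sketched in Go as follows. This is only an illustration: `upgradeParams` is a hypothetical stand-in for the real consensus parameters, `validUpgradeDelay` is an invented name, and the numeric bounds are made up.

```go
package main

import "fmt"

// upgradeParams holds the two bounds named in the commit message.
// Hypothetical stand-in type, not the real consensus parameter struct.
type upgradeParams struct {
	MinUpgradeWaitRounds uint64
	MaxUpgradeWaitRounds uint64
}

// validUpgradeDelay reports whether a proposed delay lies within
// [MinUpgradeWaitRounds, MaxUpgradeWaitRounds], both ends inclusive.
func validUpgradeDelay(p upgradeParams, delay uint64) bool {
	return delay >= p.MinUpgradeWaitRounds && delay <= p.MaxUpgradeWaitRounds
}

func main() {
	// Made-up bounds, for illustration only.
	p := upgradeParams{MinUpgradeWaitRounds: 10000, MaxUpgradeWaitRounds: 150000}
	fmt.Println(validUpgradeDelay(p, 10000))  // true (lower bound is inclusive)
	fmt.Println(validUpgradeDelay(p, 150001)) // false
}
```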
* Catchup service stop on unsupported and e2e test (#685)
* A fix for arm64 failures
One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts, when called from go test using CombinedOutput, behave strangely (slowly). Replacing CombinedOutput with Run.
* DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change.
* DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. Fixing errors and adding comments.
* Fixing merge and comment.
* added comment
* Stop catchup on unapproved protocol round
Catchup to stop before fetching the next round if the round protocol is not approved by the node
* Some fixes. Review comments from Tsachi.
* File accidentally added here. removing.
* Reverting changes mistakenly added to this branch.
* Adding comment changes.
* Partially working test
* Adding test to catchup stop on unsupported block
Using s.cancel we are dropping the last block.
* More tests and development to the catchup service
* Stop the catchup before fetching the round with un-approved protocol.
The catchup service saves the round at which an un-approved protocol update will take place. Then, before creating a task to fetch a round, it checks whether the next round is the one where the un-approved protocol begins and, if so, stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs.
* Stop the catchup before fetching the round with un-approved protocol.
The catchup service saves the round at which an un-approved protocol update will take place.
Then, before creating a task to fetch a round, it checks whether the next round is the one where the un-approved protocol begins and, if so, stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments.
* Combine condition blocks
* Fixing an error in the log info statement.
* Draft: Test for upgrading a node while keeping another node not upgradable
goal node status field for informing if the node is upgradable
* Catchup service stop on unsupported, node status message, and e2e test
In this change: Updated catchup service to stop on unsupported and not unupgradable. Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing last synced information. Added e2e test for stopped catchup service on unsupported protocol.
* Separating goal changes from this PR.
Separating goal changes from this PR. goal changes are in PR: https://github.com/algorand/go-algorand/pull/686
* review comment: use NotEqual instead of True
* Make ARM64 build mandatory. (#694)
* Updates to the goal node status (#686)
* Updates to the goal node status
This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field.
* Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport
* Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values.
* Review comments from Tsachi: using the timeout in select
* Updating the test to reflect the removal of: has synced since startup.
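The stop rule described above for the catchup service (#660, #685) can be pictured with a short Go sketch. It is a simplified model, not the service's code: `roundsToFetch` is a hypothetical function, and treating `unsupportedRound == 0` as "no unapproved protocol switch known" is an assumption of this example.

```go
package main

import "fmt"

// roundsToFetch returns the rounds in [from, to] that catchup would fetch,
// stopping before the first round that runs an unapproved protocol.
// unsupportedRound == 0 means no such round is known (assumption).
func roundsToFetch(from, to, unsupportedRound uint64) []uint64 {
	var rounds []uint64
	for r := from; r <= to; r++ {
		if unsupportedRound != 0 && r >= unsupportedRound {
			break // stop the service instead of creating a fetch task
		}
		rounds = append(rounds, r)
	}
	return rounds
}

func main() {
	fmt.Println(roundsToFetch(5, 10, 8)) // [5 6 7]
	fmt.Println(roundsToFetch(5, 7, 0))  // [5 6 7]
}
```

The check happens before a fetch task is created, so the first unapproved round is never requested.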
* telemetry recorded locally as info log (#666) config.json: {"TelemetryToLog":true} logging.config: {"Enable":false,"SendToLog":true} * Relax StartNetwork regex (#696) * relax StartNetwork regex. * Another attempt. * Two fixes to basicCatchup_test: cloned node not stopped and env var conflict (#697) * Updates to the goal node status This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field. * Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport * Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values. * Review comments from Tsachi: using the timeout in select * Two fixes to basicCatchup_test: cloned node not terminated and env var collision 1) TestBasicCatchup and newly added TestStoppedCatchupOnUnsupported create a new node by cloning one of the network nodes. When fixture.Shutdown() stops the original network nodes, leaves the cloned node running. This change adds function shutDownClonedNode to stop the cloned nodes. 2) In TestStoppedCatchupOnUnsupported, an env variable is used to delete ConsensusCurrentVersion, so that the cloned node behaves as if its binary does not support the consensus version. However, when the TestBasicCatchup runs in parallel, it also picks up the env variable, and consequently deletes ConsensusCurrentVersion from the Consensus map. When this happens, TestBasicCatchup sporadically fails. In this change, instead of having ConsensusTestUnupgradedProtocol upgrade to ConsensusCurrentVersion, or deleting ConsensusCurrentVersion so it cannot be upgraded, it sets up ConsensusTestUnupgradedProtocol to upgrade to ConsensusTestUnupgradedToProtocol. Hence, the env variable is used to delete ConsensusTestUnupgradedToProtocol. This way the conflict with other tests is eliminated. 
* Fixing golint by addint comment. * Tsachi's review comment: unsetting the env var. * Make scripts executable. (#702) * More reliable fetcher unit tests. (#708) * Avoid starting the Telemetry service when logging is disabled (#703) if remote telemetry is not enabled, do not start uri update service add a nil check * Shutdown kmd when test fixture is going down. (#709) * Fix unit test. (#711) * Execute e2e tests one at a time on arm64 (#701) * Test changes. * Better error reporting on goalFixture * Add version query for kmd startup. * Few more test cases to cover. * try to wait. * changes * Update. * Move KMD shutdown to network. * Add some debug messages to figure out what's going on. * Fix script bug. * Fix proper KMD shutdown via the KMDFixture * Run the tests one at a time only on arm64 * Updating according to review. * Disable pprof endpoints by default (#693) * enable go profiler for netdeploy * add EnableProfiler to ConfigJSONOverride * Update the makefile to skip the static linking when compiling on centos. (#713) * Fail e2e-go tests when node panics (#699) * Fail test on panic * few more touchups. * sync * bugfix. * Update few more usecases. * Refactoring * Simplify. * undo network referencing. * undo few func-ptr. * undo some more stuff. * Update method names * Few more touchups. * Build release job (#698) * Initial commit * Added Jenkinsfile * Updated Jenkinsfile * Works until GPG IPC * Move build files into new release/ dir Also, renamed files {build_,}release.sh and {build_,}setup.sh * Path issues * Use t2.xlarge instance type (4 vCPUs, 16GB ram) * Restructuring * shellchecked * fix bug * Added new `socket.sh` file * Trying to build rpm * Bump up disk size of ec2 instance * more attempts to make rpm * more fixes * move /stuff -> /root/stuff * wip * moved to correct paths * Have `release` have its own start and kill ec2 instance scripts * use buildhost scripts after all * Make sure the gpg key name matches!!!!! 
-%_gpg_name Algorand RPM <rpm@algorand.com> +%_gpg_name rpm algorand <rpm@algorand.com> * fixes * Add upload stage to pipeline * Add tag stage to pipeline * more fixes * Move start/stop ec2 instance scripts back into release/ * Add ability to dynamically set branch * Added controller/ subdir * Some cleanup * Adding tag support Moved `Jenkinsfile` into controller/ subdir. * Move build_env build.sh -> setup.sh Moved socket.sh -> controller/socket.sh * Revert buildhost changes * some cleanup * fix build * test packages locally * upload packages to s3 test bucket * restructure * misc * fix build * Add Jenkins parameters * fix build * Move commands into Jenkinsfile into stages/ * fix build * Make test stage more explicit * fix build * Implementing reviewer suggestions * Added debug info * fix build * Merge into master * implement reviewer suggestions * turn off test stage * fix build * fix build * fix build * Update readme * removed unneeded archive/ dir * Use service-wide logger instead of logging.Base() in agreement (#714) * Switch from default logger to pre-configured logger in some components of agreement service * Mark some of the slow e2e tests as such (#719) * Mark some of the slow e2e tests as such. * Move shorttest flag to be set at top level. * Wait test less restrictive. (#718) * Move slow test to get executed on nightly builds (#721) * Move some more test to be "slow tests", and modify short test condition so that we will run the long tests on nightly builds only. * Fix elif -> else * Faster upgrade tests. (#722) * Disable failing test. (#724) * Generate docs for algokey. * s/goal/algokey * Improve algons error logging (#733) * Write body when erroring on SRV/DNS records update. * Few more error messages. 
* ledger/eval refactor (#700) refactor ledger/eval block validation don't do crypto+lsig validation in eval fix sync in backlog executer queue clean up lots of logging to make tests quieter * Fix a bug in Credential.lowestOutput caused by improper domain separation (#716) * Fix a bug in Credential.lowestOutput caused by improper domain separation The bug causes larger accounts to be block proposers more often than should happen based on their fraction of online stake. This patch will cause nodes to vote for a protocol upgrade that fixes the buggy behavior. After the protocol upgrade goes through, all the upgrade-related code in this commit should be removed, as it's not necessary to retain the old buggy behavior for catchup. (For convenience code to be removed is marked with a "TODO(upgrade)" comment.) * Typofix; fix merge issue * Fix test * Add a comment to make the linter happy * Typo fixes * Goal docs tweaks (#731) * test all `goal ... -h` (#730) * test all `goal ... -h` ensures no conflicting subcommand options adds less than 2 seconds to test time * review tweak, rearrange to sub test script * actually pass args * grr, arg * Move EnsureDigest logic into the catchup service (#726) * Move EnsureDigest logic into the catchup service. * update unit tests. * Add unit testing for new catchup feature. * updating per review. * Add handing for concurrently updated round. * Add comment. * typo * Correct the quit semantics. * Faster stringer implementation for Address (#736) * Faster stringer implementation. * Optimize UnmarshalChecksumAddress as well. * Add comment. * Interconnect relays on a locally deployed network (#742) * static codegen for msgpack encode/decode (#578) Implement static code generation for msgpack encoding and decoding of blocks and transactions. The existing functions `protocol.Encode` and `protocol.Decode` invoke the generated encoders and decoders if present. 
Benchmarking block encode/decode suggests this is about 4x faster than go-codec (which we were using previously). When changing existing data structures to be encoded, or adding new ones, run `make msgp`. Some code is still using go-codec (notably agreement). If we convert all code to use this static code generation plan, we could get rid of the dynamic check and dispatch in `protocol.Encode` and `protocol.Decode`. Having fast encoding/decoding is not only good for performance, but allows us to remove complex optimizations (like caching txid values or encoding lengths, removed in this commit), and might allow us to perform checks that we previously thought would be too expensive (like making sure that an encoding is canonical, by re-encoding). Having explicitly generated code also makes it easier to understand performance and tweak it further. Results from pprof should be much less opaque (no reflection) and more actionable. Explicit codegen also makes it clear when we make a change that affects encoding/decoding of network messages. The code generation is done using a modified version of github.com/tinylib/msgp, forked as github.com/algorand/msgp. * Use cobra for the kmd command to allow for documentation automation. * Limit client side connection rate, part 1 * Draft of the solution * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * Addressing review comments. * fixing test failure * fixing test failure2 * Adding a unit test * txsync now will go through http request connection limit. * Addressing review comments. Changing phonebookEntries duration type from uint to time.Duration * fixint test failure. * splitting wait for connection time and add connection time. Addressing some review comments. * recording provisional time before connect, updating after. 
* minor fixes
* Embedding MockNetwork in mock structs which implement GossipNode to avoid the implementation of dummy functions to satisfy the interface.
* not embedding by reference.
* A few more review comment fixes.
* Fix checkdep message. (#745)
* Fix equal stake distribution in generated networks (#749)
* Use math/big.Rat rational numbers to get rid of summation error
* The root cause, though, is in JSON serialization of the float64 data type: some values are rounded and others are not. The correct fix seems to be using the same accuracy in the distribution code and in float64 marshaling.
* Update with PR feedback.
* Change a player test to use either old buggy behavior or new correct behavior depending on ConsensusCurrentVersion. (#748)
This allows agreement tests to pass whether ConsensusCurrentVersion is the old V20 or the new V21
* Bugfix: Fix last relevant proposal period in agreement protocol. (#746)
When retrieving the last relevant period corresponding to a proposal-value, the proposal store inside the agreement protocol does not properly check that the particular period returned actually matches the passed-in proposal-value. Instead, the proposal store returns the last period seen for *any* proposal-value.
When the agreement state machine receives a proposal payload, the proposal store checks whether this payload matches any proposal-value known to be relevant in the current round. If it does, the state machine tells the crypto verifier to verify the new payload. As an optimization, the proposal store in the state machine also tags the payload with the last period in which it is relevant (and whether the matching proposal-value is pinned). The crypto verifier halts concurrent verification of any payload from that period. Separately, the proposal store does not attempt to verify payloads more than once, caching past payloads it has pipelined.
For this optimization to be correct, the last relevant period must be correct; otherwise, the network will permanently stall if the following occurs: - In period p, the network observes a best proposal value of v, but it sees neither the payload B corresponding to v nor a threshold of soft-votes for B (seeing such a threshold pins B, preventing the crypto verifier from cancelling). - An attacker is able to see B. - In period p+1, the network attempts to agree on a new proposal value v' corresponding to the payload B'. - After half of the network has received B' but has _not_ finished verifying it, the attacker sends this half the payload B. This half will cancel verification of B' (since it erroneously associates B with period p+1) and will permanently ignore any future broadcasts of B' (which was cached in the proposal store). - If the other half has already staged B', the network will stall permanently, since it will be unable to commit B'. Fixes #710. Thanks to @xixisese for reporting this bug. * Format numbers using number specifier (#735) * Use %d to print numbers, which is abit safer as it prevent potential recursion. * Few more changes to the fuzzer. * Two more updates. * Implement local net template generation with netgoal (#762) * Usage: netgoal generate -n 1 -R 1 -w 100 -o mynettemplate.json -r . -t goalnet goal network create -t mynettemplate.json -r mynet -n mynet * Remove duplicate definitions from netdeploy/networkTemplate * Improve net templates support (#766) * Fix file descriptors leak in 'goal account'. Now goal can import more than maxfiles keys * Fix uint overflow in stake distribution validation. 
Details: values 10 and -110 were cast to uint and summed to 100 pct through 32-bit overflow
* Allow pct fraction of stake in goal net templates
* Fix stake distribution in netgoal.generate: it always produces pcts and not values in algos as was incorrectly thought before
* Add tests for netdeploy.Validate()
* Release build pipeline step 1: Build, package, sign, deploy to staging (#763)
* Reorganize
* more restructuring
* cleanup
* removing test bits
* changing upload destination
* remove test dir
* remove cruft
* Moved Jenkinsfile -> jenkinsfile/Build
* replace {RSTAMP,FULLVERSION}
* fix bugs
* remove temp dir location
* remove buildnumber.dat
* Implement automation for release notes generator (#761)
The cicd.yaml config file in this branch can be consumed by our cicd cli to create a draft for release notes for a given version.
* back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb (#764)
* back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb
* remove IsFull
* bring back txn liveness check. buffer up to all payset groups in chan
* no chan close
* Implement dummy telemetry hook to safely perform operations on it when telemetry is disabled (#768)
* The idea is to have telemetry.hook always set. For the telemetry-disabled case this is a simple noop stub.
* Prevents crashes when calling hook.Close/Flush on private networks in case of errors
* Remove instances of tagging in our build process (#770)
We don't want to be making tags anywhere in our automation. Our release process will take care of that.
* Configurable consensus protocol (#750)
* Create consensus.json
* some changes..
* remove deadcode.
* update constant.
* Update fixture.
* migrate fast upgrade protocols.
* move catchup test protocol.
* push staged changes.
* bugfix.
* Remove last test consensus param.
* rollback block.go
* cleanup : map[protocol.ConsensusVersion]ConsensusParams -> ConsensusProtocols
* update.
* Fix unit test.
* Release build pipeline step 2: Test (#773) * Reorganize * more restructuring * begin test stuff * restructure * fix deb test * fix rpm test * fix build * restructure * fix bug * remove temporary feature branch * added new gpg.sh * removed buildnumber.dat * When locally installing, take the binaries from the first-gopath-bin directory. (#776) * Remove temporary build test location (#777) * Make sure to default to Consensus if consensus.json is missing. (#779) * Make util.ExecAndCaptureOutput able to process large output (#771) * In case of large amount of data written to stdout/stderr from the wrapped command the process is blocked until stdout/stderr buffers cleared. * Old implementation waited until cmd return and then read stdout/stderr. * New implementation reads stdout/stderr pipes in goroutines. * Make goal node state change commands systemd aware (#769) * Make goal node state change commands systemd aware I added a property to libgoal/system.go where we can set whether or not our algod process is managed by systemd. * Write expect test for goal node with systemd scenarios This tests that the message from our cli on goal node start, stop and restarts is correct for systemd_managed data_dirs. * Write expect test for goal node start, stop and restart This tests that the message from our cli on goal node start, stop and restarts is correct for data_dirs that are not managed by systemd. * Add systemd_managed: true as a default in system.json Since all linux installs currently use systemd, I added this to the base system.json file. * Restructure release/ dir (#782) * Restructure release/ dir for each build release pipeline stage First step is the `build` pipeline. * More restructuring Removed `release/ci/`. Every dir under `release/` will now be a pipeline. 
* Added "test" pipeline
* update readme
* Remove temp location and remove code cruft
* removed outdated readme
* more cleanup
* implement reviewer changes
* Allow asset creation transactions to be created while catching up. (#790)
* Tunnel outgoing connection via a rate limiting dialer (#780)
* changes.
* adding dialer.
* DRAFT: using channel to offload the mutex.
* Taking care of the lock triggering deadlock detection.
* cleaning unnecessary changes.
* cleaning unnecessary changes.
* minor fixes
* GetNetTransport modifying returning copy of the http.Transport
* workaround to avoid the race detection trigger.
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport.
* Adding RateLimitedTransport to wrap around the http.Transport
* fixing lint
* Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc)
* changes.
* Integrating changes from Tsachi + cleanups.
* fixing build failure.
* fixing build failure.
* Addressing Pavel's comments.
* rebasing master
Co-authored-by: Tsachi Herman <tsachi.herman@algorand.com>
Co-authored-by: Will Winder <wwinder.unh@gmail.com>
* Release build pipeline step 3: Added "prod" pipeline to `release/` (#788)
* Release build pipeline step 3: Added "prod" pipeline to `release/`
* Added snapshot.sh to be invoked manually
* Implement reviewer suggestion
* better algons error messages.
(#794) * Create a rate limiting transport (#795) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Rate limiting transport. * remove comment. * Unify dialing path. * Removing ForceAttemptHTTP2 which isn't available on go 1.12 Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Some release abstraction (#796) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * Remove temp github location * Change agreement message encoder to msgp. (#786) * Upgrade to new version of msgp. - omitemptyarray and omitempty are correctly distinguished between in equivocationVoteAuthenticator. - The embedded Block is correctly handled in proposal, unauthenticatedProposal, and transmittedPayload. * Randomize anonymous (embedded) fields when testing codec. Co-authored-by: Nickolai Zeldovich <nickolai@csail.mit.edu> * Move fetcher client into catchup (#774) * changes. * adding dialer. * Move fetcher client into catchup, step 1. ( most unit tests are still broken ) * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. 
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * update. * fix few more unit tests. * fix syncer tests. * undo change. * Add a comment. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Fix gpg keygrip code and remove old code (#797) * bugfix : compile correctly teal program that includes a base64 string which starts with double slash (#787) * update. * Improve test. * Add support for multiple network protocol versions (#799) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Add a version-accept header to support multiple network protocol versions. * update. * Remove comments. * Addressing reviewer concerns. * Add a unit test for checkProtocolVersionMatch logic. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Include comment about something that looks like a vulnerability, but isn't. (#820) * Skip logging and telemetry when not needed. (#737) * Added utils for testing release packages (#819) * Added utils for testing release packages check_sig: Verify gpg signatures of build artifacts. 
test_package: Verifies the packages were built from the correct branch with the correct hash and verifies the test version release number. * Implement reviewer feedback * Update docker build script to be more flexible with its naming (#822) * Deleting out-of-date wallet folder in go-algorand. (#821) * Some build fixes (#818) * Some build fixes Most importantly, move the `fullversion.dat` file to the $HOME directory and use it for the name of the upload directory on s3. It should have been doing this before, but it was copying it to the wrong location on the ec2 instance. * Implement reviewer suggestions * Completely remove temp dir before re-creating it * Move `dsign` functionality to goal (#800) * Deferred persistent crash data validation (#823) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Perform the crash-decoding after responding to the event, so that the new vote won't be blocked. * undo unintended changes. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Update Dockerfile for our official docker image (#826) * fix incorrect comments (#825) * Reduce the log verbosity on scenario 3 deployed network (#828) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. 
* minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Reduce the amount of logs on s3 network. When running s3, our performance is negatively impacted by the high volume of logging. This change reduces the logging to warning and above. * undo Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Trigger test build (#831) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * fix build * test * Remove build parameters * wip * remove test dir * still trying to fix random build errors * updating test phase * extract build_env values * add trigger for test phase * test * removed test location * More release build fixes (#836) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * fix build * test * Remove build parameters * wip * remove test dir * still trying to fix random build errors * updating test phase * extract build_env values * add trigger for test phase * derp * remove test location * Split consensus from config (#832) * Split consensus from config. * few more changes. * netgoal: create accounts in parallel (#827) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. 
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Make parallel accounts. * undo change. * handle data race. * use atomics. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Updated job name to match on the Jenkins server (#837) * Brice/refactor make (#835) * Refactor makefile I refactored how we build libsodium to support multiple os and cpu architectures from the crypto dir. Also I added some make targets that work the way our ci pipeline needs them to. * Add flags for other linux architectures in crypto/vrf.go * Remove yum commands from configure_dev script I decided we don't need these here. I just left the which apt-get so that this script works the same but doesn't break on centos. * Add multi platform support to cicd yaml Now we have stages to do builds on different platforms utilizing docker and qemu cpu virtualization. * Refactor libsodium dep management Before the libsodium dep paths were hardcoded under cgo tags, now they're being passed in through env vars. Also throwing in a dockerfile for our cicd process. * Revert change to configure_dev.sh These changes actually aren't necessary since our build process doesn't use this script. * Switch back to using cgo tags for CFLAGS and LDFLAGS This way LDFLAGS aren't used all over the place unnecessarily which could cause problems in the future. * Fix names of things in Makefile Fixed the name of crypto/lib/libsodium.a to crypto/libs/$(OS_TYPE)/$(ARCH)/lib/libsodium.a so that it reflects the updated project structure. Also changed VARIATIONS=literally_anything in ci-build to VARIATIONS=$(OS_TYPE)/$(ARCH) so that it looks like it's useful. 
* Update cicd.yaml to use the new shell.docker.Ensure task This task makes sure that the docker image(s) our tasks depend on are available during stage executions. It either pulls the docker image or builds it from scratch when it's not available. * Fix references to crypto/lib/libsodium.a make target A travis script was referencing this directly so I fixed the target. Also, I removed an unnecessary reference in our rpm build script. * Remove ci-deps from docker build make targets Those were there by mistake, and having them kind of defeated the purpose of packing those deps with the images. Also I moved ci-deps to the shell.Make target in build-local since those are necessary there. * Run build and test jobs in a docker container (#840) * Brice/fix deploy linux (#767) * Make dockerignore file This file will prevent docker build contexts from loading certain files when creating docker build contexts. I just made it a copy of .gitignore since those files don't seem to be necessary for any current Dockerfile for go-algorand. * Fix unnecessary cd into parent directory of project root This was causing huge docker contexts for no apparent reason. * Change dockerignore to include some necessary files I switched tmp to tmp/dev_pkg and tmp/out to ignore large folders that seem unnecessary for any docker build today and removed ignores for the network gen files * Limit msgp tool warning message scope (#834) * Try to reduce msgp verbosity. * update * update msgp version in go.mod * update go.sum * Remove old entries from go.sum * Refactoring peer unicast implementation (#841) * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * checking in current draft. 
* complete the test * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * checking in current draft. * complete the test * some cleanup * fixes, lint, format. * Addressing Tsachi's comments * Addressing Tsachi's comments. getNonce() fixed, and a new test added for it. * Addressing few more comments. * Unifying getResponseChannel and removeResponseChaneel * addressing Pavel's comment: correcting a comment. * Actively scan for ledgers, normalize names cross platform (#842) Make ledger wallet names more canonical, check that sending a command doesn't return an error, only run active ledger for new devices. * require Encode() and Decode() to take msgp'ed types (#798) This ensures that calls to protocol.Encode() and protocol.Decode() are going to hit fast msgp-generated encoders and decoders. There are some places where we can't use msgp-generated code yet, for whatever reason, and those still invoke the reflection-based go-codec, using protocol.EncodeReflect() and protocol.DecodeReflect(). The main intent of this commit is to clearly identify places where we still invoke go-codec, and fix some trivial cases (like passing a struct to protocol.Encode by value instead of by pointer). Later on, we can go through the calls to protocol.EncodeReflect() and protocol.DecodeReflect() to see if we can get rid of the harder cases, to reduce or eliminate the use of go-codec altogether. * Change EnsureDigest to be asynchronous. (#754) This allows nodes which have received a threshold of cert-votes but not the corresponding block to continue to relay messages as normal. This prevents nodes in this state from inadvertently partitioning the network, which can cause stalls in very rare cases. 
- certThresholds now stage values in the proposal hierarchy, and essentially act like softThresholds (for the event.period) - Note: we can receive certThresholds for the previous period (but not softs, which aren't the freshest bundle). So now we can stage a value for the previous period, which is a side effect. - certThresholds fast forward periods and prevents subsequent period changes in the current round. - Do not cancel cryptographic verification of cert-bundles from old periods and continue to relay them. - Adds stageDigestAction, distinct from ensureAction, to signal the ledger that it should attempt to fetch the block given a certificate. It is not a blocking operation. - certThreshold without payloads now trigger stageDigestAction - If we receive a payload, check if cert is freshest bundle; if so, finish round. Co-authored-by: ben <me@vervious.com> * Strip any defined remote repo from branch name when building (#850) When using a wildcard (*) character to watch multiple branches when polling in Jenkins, the GIT_BRANCH environment variable will be "origin/rel/beta" instead of just "rel/beta". This breaks our tooling, but a simple fix is this util which simply strips any matched remote repo from the env var string value. * Implement DNSSEC resolving library (#830) * Implement DNSSEC resolving library * A, AAAA, SRV, CNAME lookup with sig verification * Recursive ip address lookup from CNAME with sig verification * Cached trust chain that is updated on DNSKEY cached sig expiration or zone signing key (ZSK) miss needed for end-user request's sig verification or DS-record confirmation on the chain update * Test harness includes a mock NS implementation for DNS-aware NS server * Closes #251 RFCs used: 1. DNS https://tools.ietf.org/html/rfc1035 2. DNS clarifications https://tools.ietf.org/html/rfc2181 3. DNSSEC proto change https://tools.ietf.org/html/rfc4035 4. DNSSEC RR change https://tools.ietf.org/html/rfc4034 5. 
DNSSEC clarifications https://tools.ietf.org/html/rfc6840 6. DNSSEC keys management https://tools.ietf.org/html/rfc6781 7. DNS SRV https://tools.ietf.org/html/rfc2782 * Utility to check relays' DNSSEC support * Make DNSSEC resolver interface compatible with net.Resolver * Use context * Change LookupCNAME: fail only if no A/AAA record, do not fail if no CNAME * Change LookupSRV: sort records by priority and randomize by weight * Change LookupIPAddr: always make recursive lookup * Implement missed functions like LookupTXT * Use DNSSEC for SRV retrieval * Make DNSSEC thread safe * Add deadlock.Mutex to protect cached trust chain * Always use a new instance of dns.Client to work around a race in ExchangeContext * Address review comments * Get rid of pointers to arrays * Add time param to verify* and makeTrustedZone functions to make tests against real DNSKEY/RRSIG snapshot robust * Rewrite UDP/TCP retries * Renames * Disable failed attempts to retrieve SRV in agreement gossip tests * Implement DNSSecurityFlags config variable * New config version and migration * Implement DNSSEC-aware DialContext * Closes #253 * Implement LookupTLSA * Tests for LookupTXT, NS, MX, TLSA * Minor comments and code fixes * Code review fixes * disable the concurrent wallet generation. (#848) * Force docker to use `root` as the user when running the instance (#849) By default, docker will use the root user, but the jenkins pipeline docker plugin inexplicitly runs the instance under the permissions of the user that launched the script that contains the docker command. * Improve some error checking and logging for build process (#851) * Fix comment in agreement. (#856) * Add MoI to network (#853) * Implement message of interest * Add missing file. * Make the ping handler optional. * fix typo. * Improve unit testing. * update return variable name, * Add comment. 
* Better error case handling in database utils (#857) * Fix a few error handling edge cases * Fix bug in setupAgreementWithValidator * Better fix. * Explicitly curl go.1.12.9 and archive `get_latest_go.py` (#855) The golang download page was changed and our pinned version of golang is no longer referenced on it. This was breaking our build. Instead, for now we'll explicitly download the tarball via `curl`. https://golang.org/dl/?mode=json * Trap errors and remove ec2 instance (#854) Add error handling for the release build pipeline. * Update the update script. (#670) * Faster external_build_printlog by using curl instead of aws cli (#847) * Fix concurrent SQLite initialization (#872) * SQLite init is not thread safe and mattn/go-sqlite3 does not care * When opening any db for the first time, do it synchronously so that the nested sqlite3_initialize() call is first made non-concurrently * Re-enable multi-threaded account generation * Closes #846 * change _tx_lock -> _txlock (#871) * Redirect stdout of build log file to build release upload directory (#873) * Install boto3 as a build dependency for docker (#875) * Enable some skipped test on MacOS (#876) * Asset tests * Rest client test * Send-Receive test (TestAccountsCanSendMoney) - takes 16 minutes * Set root as explicit docker user for test phase (#874) * Refactor and combine the phonebook implementations (#870) Merge the three phonebook implementations into one. 
* Adding a verifying signatures step to the build release pipeline (#878) * Wrap entire arguments in quotes Co-authored-by: Tsachi Herman <tsachi.herman@algorand.com> Co-authored-by: pzbitskiy <pavel@algorand.com> Co-authored-by: Derek Leung <derek@algorand.com> Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> Co-authored-by: algobolson <45948765+algobolson@users.noreply.github.com> Co-authored-by: Rotem Hemo <rotem@algorand.com> Co-authored-by: Will Winder <wwinder.unh@gmail.com> Co-authored-by: Max Justicz <max@justi.cz> Co-authored-by: algoradam <37638838+algoradam@users.noreply.github.com> Co-authored-by: Evan Richard <EvanJRichard@users.noreply.github.com> Co-authored-by: Nickolai Zeldovich <nickolai@csail.mit.edu> Co-authored-by: bricerisingalgorand <60147418+bricerisingalgorand@users.noreply.github.com> Co-authored-by: Shumo Chu <stechu@users.noreply.github.com> Co-authored-by: ben <me@vervious.com>
Closed
PhearZero
pushed a commit
to PhearNet/crypto
that referenced
this pull request
Jan 17, 2025
* Bump mainnet pregen to 1.0. (#569) * add lease to asset cmds (#575) * fix Disassemble when multiple bnz have the same target label (#612) add test * Replacing apt by apt-get (#610) * Add PeerConnections to network telemetry (#607) * Add PeerConnections to network telemetry. * omit Endpoint for incoming connections. * Fix license errors, enable check_license in travis. * Remove trailing whitespace. * add ?raw=1 to local block api to return msgpack bytes with full data (#621) * Let dsign sign arbitrary bytes, not just txids (#577) * Add markdown docs for `limit-order-a`, Fix `hltc` -> `htlc` (#619) * Created `test_release.sh` to test centos|fedora|ubuntu images (#613) * Created `test_release.sh` to test centos|fedora|ubuntu images * Incorporate some review suggestions (more to come): - change `apt` to `apt-get` - remove command to start the node - add `ENTRYPOINT` command to build image and test in one command - streamline command that downloads release and cleanup - moved script to `./test/packages/' - make `apt-get update` with the env var a one-liner * Add ability to pass bucket, channel and aws creds * Ensure aws creds are in env before starting * Make colorized text more readable * Break script into `build` and `run` operations * Run `update.sh` at RUN time This is another intermediate step. The installer is now being run at runtime, but it's not allowing for testing any binaries, such as `algod`. At this point, there are a couple different options to proceed, and I think it's best if Will, Tsachi and I talk more about the options. * We're not writing the Dockerfile to disk before running it. See my explanatory comment in the script. * Added new `post_deploy` stage and our script * Adding new `scripts/travis/test_release.sh` script This simply calls `./test/packages/test_release.sh`. Also, added name to `allow_failures`. 
* Add filtering for new `post_deploy` stage * Simplified the release scripts that build images to push to docker hub (#623) * Simplified the release scripts that build images to push to docker hub In pushing the updated images to docker hub, I noticed that the Dockerfiles and the shell scripts were only differentiated by the network name (stable|testnet). The only file in the dir is now `build_stable.sh`. It accepts a sole argument, `-n` or `--name`. It will default to "stable", so the for that image it's only necessary to run `./build_stable.sh` with no args. For "testnet", simply call the script like this: `build_stable.sh -n testnet`. The Dockerfile will be automatically created and passed to the `docker build` command via `stdin`. * Removed the case block for cli arguments Now, testing for either "mainnet" or "testnet" and returning early if neither value is present (defaults to "mainnet"). Also, changed the name to `build_releases.sh` since "stable" is no longer applicable. * Add `export SHELLOPTS` to teal tests. (#627) * Add `goal ledger block` (#622) * add goal ledger rawblock cmd * Bring `shellcheck` into the build process (#626) * Bring `shellcheck` into the build process Let's use bitwise operations to determine package presence * Added `check_shell` target to Makefile * Move install of shellcheck into `scripts/configure_dev.sh` Also, add shellcheck dependency to other dockerfiles. * Use `find` command in make target instead of recursive globbing What's up with the `exec +` syntax? From the man page: ``` -exec command {} + This variant of the -exec action runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total number of invocations of the command will be much less than the number of matched files. The command line is built in much the same way that xargs builds its command lines. 
Only one instance of `{}' is allowed within the command, and (when find is being invoked from a shell) it should be quoted (for example, '{}') to protect it from interpretation by shells. The command is executed in the starting directory. If any invocation returns a non-zero value as exit status, then find returns a non-zero exit status. If find encounters an error, this can sometimes cause an immediate exit, so some pending commands may not be run at all. This variant of -exec always returns true. ``` * Only check for missing dependencies List any that are missing and the echo the script to run to install. * Fix issue on macOS to make script portable (#632) * Remove "Created new rootkey/partkey" spam message. (#629) * fix asset unit name display in goal account list (#633) * Ensure that the proper channel is passed to `test_release.sh` (#634) * Minor improvements to `test_release.sh` script (#636) - Removed a redundant `exit` statement. - Added script name to error statement. * Cleanup evalAux (#628) * remove evalAux which hasn't been used since before 1.0 * comment removal of auxdata column * Add --no-sig flag to goal clerk multisig sign (#647) * add --no-sig flag to goal clerk multisig sign * update err message * change preimage -> template * change template -> information * Scan for ledger wallets more often (#638) * add more robust ledger scanning, fix infinite recursion bug * fix comment * undo scan change * still delete wallets we fail to close * Exit early if `test_release.sh` script fails (#643) * Improve missing msig preimage error message (#648) * improve missing msig preimage error message * improve err msg * Add support for https for telemetry servers (#649) * Add support for https for telemetry servers. * typo : udo -> udp * Fixed few typos. 
* goal listpartkeys display error (#641) * Fixing arm64 environment issues (#653) 1) python3-venv libffi-dev libssl-dev libffi-dev (and libssl-dev) are needed by the cryptography package builder for python in e2e_basic_start_stop. 2) exporting GOPATHBIN needed to run algotmpl in template e2e tests. * Test pre-packaged executable on variety of linux platforms (#651) * Add platform testing using docker for generated binaries. * Fix path. * Apply reviewer's requested changes. * Reduce e2e_go_tests execution time twice (#645) There are seven major contributors to integration tests running time TestOnlineOfflineRewards (1248.64s) TestAssetConfig (364.71s) TestRewardRateRecalculation (226.78s) TestStartAndEndAuctionTenUsersOneBidEach (196.34s) TestNoDepositAssociatedWithBid (189.74s) TestDeadbeatBid (188.70s) TestStartAndCancelAuctionNoBids (183.35s) This commit considers only first three. 1. Fixing rewards interval in config for TestRewardRateRecalculation from 25 to 10 reduces time twice: TestRewardRateRecalculation (119.34s) 2. Fixing initialRound in TestOnlineOfflineRewards test from 301 to 11 reduces time 15 times: TestOnlineOfflineRewards (73.80s) 3. TestAssetConfig looks long by design - commits and waits max allowed assets 4. Address TODO in run_integration_tests.sh. Now e2e_client_runner calls 'goal network delete' to reflect this removal Refers #508 * Promote test_release.sh so that it won't conflict with release testing. (#655) * Fix concurrent access to wallet handles cache in goal (#654) * Fix concurrent access to wallet handles cache in goal * In rare cases (i.e. e2e tests run in parallel on the same network) a race cond happens when accessing goal.cache/walletHandles.json file * Introduce advisory locking on the mentioned file * Implementation is extendable by implementing *locker* interface for specific platform and providing a new *newLockedFile* constructor. 
* Address PR review notes * Do not truncate before obtaining the lock * Increase waiting interval to 10 ms * Simplify newLockedFile constructor * Allow upgrades to specify the delay before their execution. (#650) This replaces UpgradeWaitRounds with MinUpgradeWaitRounds and MaxUpgradeWaitRounds. Proposers specify an upgrade's delay given their own ApprovedUpgrades, encoding the proposed delay in the UpgradeVote. Verifiers check that the delay sits between MinUpgradeWaitRounds and MaxUpgradeWaitRounds (inclusive). This commit adds this functionality but does not change current behavior. * Set explicit 30 sec timeout for AlgorandGoal::RawSend in expect test (#658) * Should help with sporadic failures when we send and TEAL in groups * Support variable-delay protocol upgrades in ConsensusFuture. (#659) Also add some unit tests for variable-delay protocol upgrades. * Shant/catchup stop on unapproved (#660) * A fix for arm64 failures One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts, when called from go test using CombinedOutput, behave strangely (slow). Replacing CombinedOutput with Run. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. Fixing errors and adding comments. * Fixing merge and comment. * added comment * Stop catchup on unapproved protocol round Catchup to stop before fetching the next round if the round protocol is not approved by the node * Some fixes. Review comments from Tsachi. * File accidentally added here. removing. * Reverting changes mistakenly added to this branch. * Adding comment changes. 
* Partially working test * Adding test to catchup stop on unsupported block Using s.cancel we are dropping the last block. * More tests and development to the catchup service * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an un-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an un-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an un-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an un-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments. * Combine condition blocks * Fixing an error in the log info statement. * Compile linux/amd64 binaries with static linking (#625) * Test static compilation. * remove -fPIC * Try with ubuntu 18.04, since it has newer GCC. * exclude buildmode from test builds. * Fixed missed buildmode. * Refactor. * Add logging for the telemetry server connections (#661) * Add logging for the telemetry server connections. * Revert unintended change. * Improve error message. * add bool support to algocfg (#667) e.g. 
`algocfg set -p EnableProcessBlockStats -v true` * Reduce execution time of expect tests (#665) * CombinedOutput blocks on copying empty stderr stream from expect that causes at least 60 sec timeout for most of the tests * This implementation uses a temp file for stderr accumulation. In this case exec.Cmd does not run goroutines for reading child's actual stderr. * 655 sec (before) vs 205 sec (after) * Avoid upgrading boost on travis Mac builds (#669) * specify a boost version for the mac build. * try to prevent boost update on travis mac builds. * Abort algod startup if logging.config file has bad permissions (#662) * This should prevent telemetry event losses on systems with invalid permissions on ~/.algorand/logging.config file * Another possible workaround is to relax default config path mask in **cmd/goal/commands.go:ensureCacheDir** from 700 to 744. This is not implemented because of possible security risk. * Add error logging for getting a cached wallet handle (#663) Needed to debug 'Couldn't read password: inappropriate ioctl for device' error message in tests * Update license date 2019 -> 2020 (#674) * Change 2019 -> 2020 * Update readme. * Update copyright to use date range. (#676) * Tee existing tests so we can review output before piping it forward. (#677) * Make graceful exit of a node that is waiting for WaitForBlock call (#679) * Make graceful exit of a node that is waiting for WaitForBlock call. * Add comment. * Remove tput where not supported by terminal (#682) * Remove tput where not supported by terminal. * send tput errors to dev/null * Fix bad constants. * Avoid waiting for block that won't be reached due to unsupported protocol upgrade. (#681) * Fix - Indexer now shows received transactions (#684) -- Adding receiver function to transaction that returns the receiver of a transaction -- Fix indexer to show received transactions * Undo teeing to dev/tty as it doesn't work well in terminal free environments. 
(#689) * Improve lockFile error handling (#687) * Better lockFile error handling. * Make blocking locker. * Fix F_OFD_GETLK constant. * bugfix. * Try platform specific code. * use unix package to include F_OFD_SETLKW * remove unused imports. * Rename files. * Catchup service stop on unsupported and e2e test (#685) * A fix for arm64 failures One observation from the failures is that the test timeouts could be the cause of the failure. Expect scripts, when called from go test using CombinedOutput, behave strangely (slowly). Replacing CombinedOutput with Run. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. * DRAFT: this PR is a draft to experiment with test failures on ARM system. Disabling tests that fail sporadically on mac on ARM as well. Adding a utility to control test skips. adding missing file and change. Fixing errors and adding comments. * Fixing merge and comment. * added comment * Stop catchup on unapproved protocol round Catchup to stop before fetching the next round if the round protocol is not approved by the node * Some fixes. Review comments from Tsachi. * File accidentally added here. removing. * Reverting changes mistakenly added to this branch. * Adding comment changes. * Partially working test * Adding test to catchup stop on unsupported block Using s.cancel we are dropping the last block. * More tests and development to the catchup service * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an un-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an un-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. 
The added test covers the edge cases which may or may not happen when the service runs. * Stop the catchup before fetching the round with un-approved protocol. The catchup service will save the round when an un-approved protocol update will take place. Then, before creating a task to fetch a round, will check if the next round is when an un-approved protocol round begins, and stops the catchup service. The ledger should have the round with NextProtocolSwitchOn to stop the un-approved round from getting fetched. The added test covers the edge cases which may or may not happen when the service runs. Addressing Tsachi's review comments. * Combine condition blocks * Fixing an error in the log info statement. * Draft: Test for upgrading a node while keeping another node not upgradable goal node status field for informing if the node is upgradable * Catchup service stop on unsupported, node status message, and e2e test In this change: Updated catchup service to stop on unsupported and not unupgradable. Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing last synced information. Added e2e test for stopped catchup service on unsupported protocol. * Separating goal changes from this PR. Separating goal changes from this PR. goal changes are in PR: https://github.com/algorand/go-algorand/pull/686 * review comment: use NotEqual instead of True * Make ARM64 build mandatory. (#694) * Updates to the goal node status (#686) * Updates to the goal node status This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field. * Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport * Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values. 
* Review comments from Tsachi: using the timeout in select * Updating the test to reflect the removal of: has synced since startup. * telemetry recorded locally as info log (#666) config.json: {"TelemetryToLog":true} logging.config: {"Enable":false,"SendToLog":true} * Relax StartNetwork regex (#696) * relax StartNetwork regex. * Another attempt. * Two fixes to basicCatchup_test: cloned node not stopped and env var conflict (#697) * Updates to the goal node status This change is splitting the goal section from PR: https://github.com/algorand/go-algorand/pull/685 Updated goal node status to inform when the catchup service is stopped. Updated goal node status by removing "Synced Since Startup" field. * Adding parameter StoppedAtUnsupportedRound to v1.NodeStatus and node.StatusReport * Adding check to libgoal Client StoppedAtUnsupportedRound in v1.NodeStatus true and false values. * Review comments from Tsachi: using the timeout in select * Two fixes to basicCatchup_test: cloned node not terminated and env var collision 1) TestBasicCatchup and newly added TestStoppedCatchupOnUnsupported create a new node by cloning one of the network nodes. When fixture.Shutdown() stops the original network nodes, it leaves the cloned node running. This change adds function shutDownClonedNode to stop the cloned nodes. 2) In TestStoppedCatchupOnUnsupported, an env variable is used to delete ConsensusCurrentVersion, so that the cloned node behaves as if its binary does not support the consensus version. However, when the TestBasicCatchup runs in parallel, it also picks up the env variable, and consequently deletes ConsensusCurrentVersion from the Consensus map. When this happens, TestBasicCatchup sporadically fails. In this change, instead of having ConsensusTestUnupgradedProtocol upgrade to ConsensusCurrentVersion, or deleting ConsensusCurrentVersion so it cannot be upgraded, it sets up ConsensusTestUnupgradedProtocol to upgrade to ConsensusTestUnupgradedToProtocol. 
Hence, the env variable is used to delete ConsensusTestUnupgradedToProtocol. This way the conflict with other tests is eliminated. * Fixing golint by adding a comment. * Tsachi's review comment: unsetting the env var. * Make scripts executable. (#702) * More reliable fetcher unit tests. (#708) * Avoid starting the Telemetry service when logging is disabled (#703) if remote telemetry is not enabled, do not start uri update service add a nil check * Shutdown kmd when test fixture is going down. (#709) * Fix unit test. (#711) * Execute e2e tests one at a time on arm64 (#701) * Test changes. * Better error reporting on goalFixture * Add version query for kmd startup. * Few more test cases to cover. * try to wait. * changes * Update. * Move KMD shutdown to network. * Add some debug messages to figure out what's going on. * Fix script bug. * Fix proper KMD shutdown via the KMDFixture * Run the tests one at a time only on arm64 * Updating according to review. * Disable pprof endpoints by default (#693) * enable go profiler for netdeploy * add EnableProfiler to ConfigJSONOverride * Update the makefile to skip the static linking when compiling on centos. (#713) * Fail e2e-go tests when node panics (#699) * Fail test on panic * few more touchups. * sync * bugfix. * Update few more usecases. * Refactoring * Simplify. * undo network referencing. * undo few func-ptr. * undo some more stuff. * Update method names * Few more touchups. 
* Build release job (#698) * Initial commit * Added Jenkinsfile * Updated Jenkinsfile * Works until GPG IPC * Move build files into new release/ dir Also, renamed files {build_,}release.sh and {build_,}setup.sh * Path issues * Use t2.xlarge instance type (4 vCPUs, 16GB ram) * Restructuring * shellchecked * fix bug * Added new `socket.sh` file * Trying to build rpm * Bump up disk size of ec2 instance * more attempts to make rpm * more fixes * move /stuff -> /root/stuff * wip * moved to correct paths * Have `release` have its own start and kill ec2 instance scripts * use buildhost scripts after all * Make sure the gpg key name matches!!!!! -%_gpg_name Algorand RPM <rpm@algorand.com> +%_gpg_name rpm algorand <rpm@algorand.com> * fixes * Add upload stage to pipeline * Add tag stage to pipeline * more fixes * Move start/stop ec2 instance scripts back into release/ * Add ability to dynamically set branch * Added controller/ subdir * Some cleanup * Adding tag support Moved `Jenkinsfile` into controller/ subdir. * Move build_env build.sh -> setup.sh Moved socket.sh -> controller/socket.sh * Revert buildhost changes * some cleanup * fix build * test packages locally * upload packages to s3 test bucket * restructure * misc * fix build * Add Jenkins parameters * fix build * Move commands into Jenkinsfile into stages/ * fix build * Make test stage more explicit * fix build * Implementing reviewer suggestions * Added debug info * fix build * Merge into master * implement reviewer suggestions * turn off test stage * fix build * fix build * fix build * Update readme * removed unneeded archive/ dir * Use service-wide logger instead of logging.Base() in agreement (#714) * Switch from default logger to pre-configured logger in some components of agreement service * Mark some of the slow e2e tests as such (#719) * Mark some of the slow e2e tests as such. * Move shorttest flag to be set at top level. * Wait test less restrictive. 
(#718) * Move slow test to get executed on nightly builds (#721) * Move some more test to be "slow tests", and modify short test condition so that we will run the long tests on nightly builds only. * Fix elif -> else * Faster upgrade tests. (#722) * Disable failing test. (#724) * Generate docs for algokey. * s/goal/algokey * Improve algons error logging (#733) * Write body when erroring on SRV/DNS records update. * Few more error messages. * ledger/eval refactor (#700) refactor ledger/eval block validation don't do crypto+lsig validation in eval fix sync in backlog executer queue clean up lots of logging to make tests quieter * Fix a bug in Credential.lowestOutput caused by improper domain separation (#716) * Fix a bug in Credential.lowestOutput caused by improper domain separation The bug causes larger accounts to be block proposers more often than should happen based on their fraction of online stake. This patch will cause nodes to vote for a protocol upgrade that fixes the buggy behavior. After the protocol upgrade goes through, all the upgrade-related code in this commit should be removed, as it's not necessary to retain the old buggy behavior for catchup. (For convenience code to be removed is marked with a "TODO(upgrade)" comment.) * Typofix; fix merge issue * Fix test * Add a comment to make the linter happy * Typo fixes * Goal docs tweaks (#731) * test all `goal ... -h` (#730) * test all `goal ... -h` ensures no conflicting subcommand options adds less than 2 seconds to test time * review tweak, rearrange to sub test script * actually pass args * grr, arg * Move EnsureDigest logic into the catchup service (#726) * Move EnsureDigest logic into the catchup service. * update unit tests. * Add unit testing for new catchup feature. * updating per review. * Add handling for concurrently updated round. * Add comment. * typo * Correct the quit semantics. * Faster stringer implementation for Address (#736) * Faster stringer implementation. 
* Optimize UnmarshalChecksumAddress as well. * Add comment. * Interconnect relays on a locally deployed network (#742) * static codegen for msgpack encode/decode (#578) Implement static code generation for msgpack encoding and decoding of blocks and transactions. The existing functions `protocol.Encode` and `protocol.Decode` invoke the generated encoders and decoders if present. Benchmarking block encode/decode suggests this is about 4x faster than go-codec (which we were using previously). When changing existing data structures to be encoded, or adding new ones, run `make msgp`. Some code is still using go-codec (notably agreement). If we convert all code to use this static code generation plan, we could get rid of the dynamic check and dispatch in `protocol.Encode` and `protocol.Decode`. Having fast encoding/decoding is not only good for performance, but allows us to remove complex optimizations (like caching txid values or encoding lengths, removed in this commit), and might allow us to perform checks that we previously thought would be too expensive (like making sure that an encoding is canonical, by re-encoding). Having explicitly generated code also makes it easier to understand performance and tweak it further. Results from pprof should be much less opaque (no reflection) and more actionable. Explicit codegen also makes it clear when we make a change that affects encoding/decoding of network messages. The code generation is done using a modified version of github.com/tinylib/msgp, forked as github.com/algorand/msgp. * Use cobra for the kmd command to allow for documentation automation. * Limit client side connection rate, part 1 * Draft of the solution * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * saving current changes * Addressing review comments. 
* fixing test failure * fixing test failure2 * Adding a unit test * txsync now will go through http request connection limit. * Addressing review comments. Changing phonebookEntries duration type from uint to time.Duration * fixing test failure. * splitting wait for connection time and add connection time. Addressing some review comments. * recording provisional time before connect, updating after. * minor fixes * Embedding MockNetwork in mock structs which implement GossipNode to avoid the implementation of dummy functions to satisfy the interface. * not embedding by reference. * A few more review comment fixes. * Fix checkdep message. (#745) * Fix equal stake distribution in generated networks (#749) * Use math.big.Rat rational numbers to get rid of summation error * The root cause, though, is in JSON serialization of the float64 data type: some values are rounded and others are not. The correct fix seems to be using the same accuracy in the distribution code and float64 marshaling. * Update with PR feedback. * Change a player test to use either old buggy behavior or new correct behavior depending on ConsensusCurrentVersion. (#748) This allows agreement tests to pass whether ConsensusCurrentVersion is the old V20 or the new V21 * Bugfix: Fix last relevant proposal period in agreement protocol. (#746) When retrieving the last relevant period corresponding to a proposal-value, the proposal store inside the agreement protocol does not properly check that the particular period returned actually matches the passed-in proposal-value. Instead, the proposal store returns the last period seen for *any* proposal-value. When the agreement state machine receives a proposal payload, the proposal store checks whether this payload matches any proposal-value known to be relevant in the current round. If it does, the state machine tells the crypto verifier to verify the new payload. 
As an optimization, the proposal store in the state machine also tags the payload with the last period in which it is relevant (and whether the matching proposal-value is pinned). The crypto verifier halts concurrent verification of any payload from that period. Separately, the proposal store does not attempt to verify payloads more than once, caching past payloads it has pipelined. For this optimization to be correct, the last relevant period must be correct; otherwise, the network will permanently stall if the following occurs: - In period p, the network observes a best proposal value of v, but it sees neither the payload B corresponding to v nor a threshold of soft-votes for B (seeing such a threshold pins B, preventing the crypto verifier from cancelling). - An attacker is able to see B. - In period p+1, the network attempts to agree on a new proposal value v' corresponding to the payload B'. - After half of the network has received B' but has _not_ finished verifying it, the attacker sends this half the payload B. This half will cancel verification of B' (since it erroneously associates B with period p+1) and will permanently ignore any future broadcasts of B' (which was cached in the proposal store). - If the other half has already staged B', the network will stall permanently, since it will be unable to commit B'. Fixes #710. Thanks to @xixisese for reporting this bug. * Format numbers using number specifier (#735) * Use %d to print numbers, which is a bit safer as it prevents potential recursion. * Few more changes to the fuzzer. * Two more updates. * Implement local net template generation with netgoal (#762) * Usage: netgoal generate -n 1 -R 1 -w 100 -o mynettemplate.json -r . -t goalnet goal network create -t mynettemplate.json -r mynet -n mynet * Remove duplicate definitions from netdeploy/networkTemplate * Improve net templates support (#766) * Fix file descriptors leak in 'goal account'. 
Now goal can import more than maxfiles keys * Fix uint overflow in stake distribution validation. Details: values 10 and -110 were cast to uint and summed to 100 pct with 32-bit overflow * Allow pct fraction of stake in goal net templates * Fix stake distribution in netgoal.generate: it always produces pcts and not values in algos as was incorrectly thought before * Add tests for netdeploy.Validate() * Release build pipeline step 1: Build, package, sign, deploy to staging (#763) * Reorganize * more restructuring * cleanup * removing test bits * changing upload destination * remove test dir * remove cruft * Moved Jenkinsfile -> jenkinsfile/Build * replace {RSTAMP,FULLVERSION} * fix bugs * remove temp dir location * remove buildnumber.dat * Implement automation for release notes generator (#761) The cicd.yaml config file in this branch can be consumed by our cicd cli to create a draft for release notes for a given version. * back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb (#764) * back out locking added in c78ada09f230a3c66cd934860700f93ff31a93eb * remove IsFull * bring back txn liveness check. buffer up to all payset groups in chan * no chan close * Implement dummy telemetry hook to safely perform operations on it when telemetry is disabled (#768) * The idea is to have telemetry.hook always set. For the telemetry-disabled case this is a simple noop stub. * Prevents crashes when calling hook.Close/Flush on private networks in case of errors * Remove instances of tagging in our build process (#770) We don't want to be making tags anywhere in our automation. Our release process will take care of that. * Configurable consensus protocol (#750) * Create consensus.json * some changes.. * remove deadcode. * update constant. * Update fixture. * migrate fast upgrade protocols. * move catchup test protocol. * push staged changes. * bugfix. * Remove last test consensus param. 
* rollback block.go * cleanup : map[protocol.ConsensusVersion]ConsensusParams -> ConsensusProtocols * update. * Fix unit test. * Release build pipeline step 2: Test (#773) * Reorganize * more restructuring * begin test stuff * restructure * fix deb test * fix rpm test * fix build * restructure * fix bug * remove temporary feature branch * added new gpg.sh * removed buildnumber.dat * When locally installing, take the binaries from the first-gopath-bin directory. (#776) * Remove temporary build test location (#777) * Make sure to default to Consensus if consensus.json is missing. (#779) * Make util.ExecAndCaptureOutput able to process large output (#771) * In case of a large amount of data written to stdout/stderr from the wrapped command, the process is blocked until the stdout/stderr buffers are cleared. * Old implementation waited until cmd return and then read stdout/stderr. * New implementation reads stdout/stderr pipes in goroutines. * Make goal node state change commands systemd aware (#769) * Make goal node state change commands systemd aware I added a property to libgoal/system.go where we can set whether or not our algod process is managed by systemd. * Write expect test for goal node with systemd scenarios This tests that the message from our cli on goal node start, stop and restarts is correct for systemd_managed data_dirs. * Write expect test for goal node start, stop and restart This tests that the message from our cli on goal node start, stop and restarts is correct for data_dirs that are not managed by systemd. * Add systemd_managed: true as a default in system.json Since all linux installs currently use systemd, I added this to the base system.json file. * Restructure release/ dir (#782) * Restructure release/ dir for each build release pipeline stage First step is the `build` pipeline. * More restructuring Removed `release/ci/`. Every dir under `release/` will now be a pipeline. 
* Added "test" pipeline * update readme * Remove temp location and remove code cruft * removed outdated readme * more cleanup * implement reviewer changes * Allow asset creation transactions to be created while catching up. (#790) * Tunnel outgoing connection via a rate limiting dialer (#780) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Addressing Pavel's comments. * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Addressing Pavel's comments. * changes. * adding dialer. * DRAFT: using channel to offload the mutex. 
* Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Allow asset creation transactions to be created while catching up. (#790) * Addressing Pavel's comments. * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Integrating changes from Tsachi + cleanups. * fixing build failure. * fixing build failure. * Addressing Pavel's comments. * rebasing master Co-authored-by: Tsachi Herman <tsachi.herman@algorand.com> Co-authored-by: Will Winder <wwinder.unh@gmail.com> * Release build pipeline step 3: Added "prod" pipeline to `release/` (#788) * Release build pipeline step 3: Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Implement reviewer suggestion * better algons error messages. 
(#794) * Create a rate limiting transport (#795) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Rate limiting transport. * remove comment. * Unify dialing path. * Removing ForceAttemptHTTP2 which isn't available on go 1.12 Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Some release abstraction (#796) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * Remove temp github location * Change agreement message encoder to msgp. (#786) * Upgrade to new version of msgp. - omitemptyarray and omitempty are correctly distinguished between in equivocationVoteAuthenticator. - The embedded Block is correctly handled in proposal, unauthenticatedProposal, and transmittedPayload. * Randomize anonymous (embedded) fields when testing codec. Co-authored-by: Nickolai Zeldovich <nickolai@csail.mit.edu> * Move fetcher client into catchup (#774) * changes. * adding dialer. * Move fetcher client into catchup, step 1. ( most unit tests are still broken ) * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. 
* Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * update. * fix few more unit tests. * fix syncer tests. * undo change. * Add a comment. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Fix gpg keygrip code and remove old code (#797) * bugfix : compile correctly teal program that includes a base64 string which starts with double slash (#787) * update. * Improve test. * Add support for multiple network protocol versions (#799) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Add a version-accept header to support multiple network protocol versions. * update. * Remove comments. * Addressing reviewer concerns. * Add a unit test for checkProtocolVersionMatch logic. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Include comment about something that looks like a vulnerability, but isn't. (#820) * Skip logging and telemetry when not needed. (#737) * Added utils for testing release packages (#819) * Added utils for testing release packages check_sig: Verify gpg signatures of build artifacts. 
test_package: Verifies the packages were built from the correct branch with the correct hash and verifies the test version release number. * Implement reviewer feedback * Update docker build script to be more flexible with its naming (#822) * Deleting out-of-date wallet folder in go-algorand. (#821) * Some build fixes (#818) * Some build fixes Most importantly, move the `fullversion.dat` file to the $HOME directory and use it for the name of the upload directory on s3. It should have been doing this before, but it was copying it to the wrong location on the ec2 instance. * Implement reviewer suggestions * Completely remove temp dir before re-creating it * Move `dsign` functionality to goal (#800) * Deferred persistent crash data validation (#823) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to override the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Perform the crash-decoding after responding to the event, so that the new vote won't be blocked. * undo unintended changes. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Update Dockerfile for our official docker image (#826) * fix incorrect comments (#825) * Reduce the log verbosity on scenario 3 deployed network (#828) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. 
* minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. * Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Reduce the amount of logs on s3 network. When running s3, our performnace is negatively impacted by high amount of logging. This change reduces the logging to warning and above. * undo Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Trigger test build (#831) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * fix build * test * Remove build parameters * wip * remove test dir * still trying to fix random build errors * updating test phase * extract build_env values * add trigger for test phase * test * removed test location * More release build fixes (#836) * Added "prod" pipeline to `release/` * Added snapshot.sh to be invoked manually * Restructuring * test stage * add common build params * fix build * test * Remove build parameters * wip * remove test dir * still trying to fix random build errors * updating test phase * extract build_env values * add trigger for test phase * derp * remove test location * Split consensus from config (#832) * Split consensus from config. * few more changes. * netgoal: create accounts in parallel (#827) * changes. * adding dialer. * DRAFT: using channel to offload the mutex. * Taking care of the lock triggering deadlock detection. * cleaning unnecessary changes. * cleaning unnecessary changes. * minor fixes * GetNetTransport modifying returning copy of the http.Transport * workaround to avoid the race detection trigger. 
* Testing a different approach to ovrride the Dial/DialContext by embedding the default transport into another object instead of changing the default transport. * Adding RateLimitedTransport to wrap around the http.Transport * fixing lint * Separating Dialer from Transport, initializing the Dialer and Transport params (timeout, etc) * changes. * Make parallel accounts. * undo change. * handle data race. * use atomics. Co-authored-by: algonautshant <55754073+algonautshant@users.noreply.github.com> * Updated job name to match on the Jenkins server (#837) * Brice/refactor make (#835) * Refactor makefile I refactored how we build libsodium to support multiple os and cpu architectures from the crypto dir. Also I added some make targets that work the way our ci pipeline needs them to. * Add flags for other linux architectures in crypto/vrf.go * Remove yum commands from configure_dev script I decided we don't need these here. I just left the which apt-get so that this script works the same but doesn't break on centos. * Add multi platform support to cicd yaml Now we have stages to do builds on different platforms utilizing docker and qemu cpu virtualization. * Refactor libsodium dep management Before the libsodium dep paths were hardcoded under cgo tags, now they're being passed in through env vars. Also throwing in a dockerfile for our cicd process. * Revert change to configure_dev.sh These changes actually aren't necessary since our build process doesn't use this script. * Switch back to using cgo tags for CFLAGS and LDFLAGS This way LDFLAGS aren't used all over the place unecessarily which could cause problems in the future. * Fix names of things in Makefile Fixed the name of crypto/lib/libsodium.a to crypto/libs/$(OS_TYPE)/$(ARCH)/lib/libsodium.a so that it reflects the updated project structure. Also changed VARIATIONS=literally_anything in ci-build to VARIATIONS=$(OS_TYPE)/$(ARCH) so that it looks like it's useful. 
* Update cicd.yaml to use the new shell.docker.Ensure task This task makes sure that the docker image(s) our tasks depend on are avaiable during stage executions. It either pulls the docker image or builds it from scratch when it's not available. * Fix references to crypto/lib/libsodium.a make target A travis script was referencing this directly so I fixed the target. Also, I removed an unnecessary reference in our rpm build script. * Remove ci-deps from docker build make targets Those were there by mistake, and having them kind of defeated the purpose packing those deps with the images. Also I moved ci-deps to the shell.Make target in build-local since those are necessary there. * Run build and test jobs in a docker container (#840) * Brice/fix deploy linux (#767) * Make dockerignore file This file will prevent docker build contexts from loading certain files when creating docker build contexts. I just made it a copy of .gitignore since those files don't seem to be necessary for any current Dockerfile for go-algorand. * Fix unnecessary cd into parent directory of project root This was causing huge docker contexts for no apparent reason. * Change dockerignore to include some necessary files I switched tmp to tmp/dev_pkg and tmp/out to ignore large folders that seem unnecessary for any docker build today and removed ignores for the network gen files * Limit msgp tool warning message scope (#834) * Try to reduce msgp verbosity. * update * update msgp version in go.mod * update go.sum * Remove old entries from go.sum * Refactoring peer unicast implementation (#841) * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * checking in current draft. 
* complete the test * Adding topics type and test for marshall/unmarshall of the topics. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * removing reader, separating marshall from hash. * Adding comments and error handling. * Fixing lint and fmt. * Adding hash function to topics * checking in current draft. * complete the test * some cleanup * fixes, lint, format. * Addressing Tsachi's comments * Addressing Tsachi's comments. getNonce() fixed, and a new test added for it. * Addressing few more comments. * Unifying getResponseChannel and removeResponseChaneel * addressing Pavel's comment: correcting a comment. * Actively scan for ledgers, normalize names cross platform (#842) Make ledger wallet names more canonical, check that sending a command doesn't return an error, only run active ledger for new devices. * require Encode() and Decode() to take msgp'ed types (#798) This ensures that calls to protocol.Encode() and protocol.Decode() are going to hit fast msgp-generated encoders and decoders. There are some places where we can't use msgp-generated code yet, for whatever reason, and those still invoke the reflection-based go-codec, using protocol.EncodeRefect() and protocol.DecodeReflect(). The main intent of this commit is to clearly identify places where we still invoke go-codec, and fix some trivial cases (like passing a struct to protocol.Encode by value instead of by pointer). Later on, we can go through the calls to protocol.EncodeReflect() and protocol.DecodeReflect() to see if we can get rid of the harder cases, to reduce or eliminate the use of go-codec altogether. * Change EnsureDigest to be asynchronous. (#754) This allows nodes which have received a threshold of cert-votes but not the corresponding block to continue to relay messages as normal. This prevents nodes in this state from inadvertently partitioning the network, which can cause stalls in very rare cases. 
- certThresholds now stage values in the proposal hierarchy, and essentially act like softThresholds (for the event.period) - Note: we can receive certThresholds for the previous period (but not softs, which aren't the freshest bundle). So now we can stage a value for the previous period, which is a side effect. - certThresholds fast forward periods and prevents subsequent period changes in the current round. - Do not cancel cryptographic verification of cert-bundles from old periods and continue to relay them. - Adds stageDigestAction, distinct from ensureAction, to signal the ledger that it should attempt to fetch the block given a certificate. It is not a blocking operation. - certThreshold without payloads now trigger stageDigestAction - If we receive a payload, check if cert is freshest bundle; if so, finish round. Co-authored-by: ben <me@vervious.com> * Strip any defined remote repo from branch name when building (#850) When using a wildcard (*) character to watch multiple branches when polling in Jenkins, the GIT_BRANCH environment variable will be "origin/rel/beta" instead of just "rel/beta". This breaks our tooling, but a simple fix is this util which simply strips any matched remote repo from the env var string value. * Implement DNSSEC resolving library (#830) * Implement DNSSEC resolving library * A, AAAA, SRV, CNAME lookup with sig verification * Recursive ip address lookup from CNAME with sig verification * Cached trust chain that is updated on DNSKEY cached sig expiration or zone signing key (ZSK) miss needed for end-user request's sig verification or DS-record confirmation on the chain update * Test harness includes a mock NS implementation for DNS-aware NS server * Closes #251 RFCs used: 1. DNS https://tools.ietf.org/html/rfc1035 2. DNS clarifications https://tools.ietf.org/html/rfc2181 3. DNSSEC proto change https://tools.ietf.org/html/rfc4035 4. DNSSEC RR change https://tools.ietf.org/html/rfc4034 5. 
DNSSEC clarifications https://tools.ietf.org/html/rfc6840 6. DNSSEC keys management https://tools.ietf.org/html/rfc6781 7. DNS SRV https://tools.ietf.org/html/rfc2782 * Utility to check relays' DNSSEC support * Make DNSSEC resolver interface compatible with net.Resolver * Use context * Change LookupCNAME: fail only if no A/AAA record, do not fail if no CNAME * Change LookupSRV: sort records by priority and randomize by weight * Change LookupIPAddr: always make recursive lookup * Implement missed functions like LookupTXT * Use DNSSEC for SRV retrieval * Make DNSSEC thread safe * Add deadlock.Mutex to protect cached trust chain * Always use a new instance of dns.Client to work around a race in ExchangeContext * Address review comments * Get rid of pointers to arrays * Add time param to verify* and makeTrustedZone functions to make tests against real DNSKEY/RRSIG snapshot robust * Rewrite UDP/TCP retries * Renames * Disable failed attempts to retrieve SRV in agreement gossip tests * Implement DNSSecurityFlags config variable * New config version and migration * Implement DNSSEC-aware DialContext * Closes #253 * Implement LookupTLSA * Tests for LookupTXT, NS, MX, TLSA * Minor comments and code fixes * Code review fixes * disable the concurrent wallet generation. (#848) * Force docker to use `root` as the user when running the instance (#849) By default, docker will use the root user, but the jenkins pipeline docker plugin inexplicitly runs the instance under the permissions of the user that launched the script that contains the docker command. * Improve some error checking and logging for build process (#851) * Fix comment in agreement. (#856) * Add MoI to network (#853) * Implement message of interest * Add missing file. * Make the ping handler optional. * fix typo. * Improve unit testing. * update return variable name, * Add comment. 
* Better error case handling in database utils (#857) * Fix few error handling edge cases * Fix bug in setupAgreementWithValidator * Better fix. * Explicitly curl go.1.12.9 and archive `get_latest_go.py` (#855) The golang download page was changed and our pinned version of golang is no longer referenced on it. This was breaking our build. Instead, for now we'll explicitly download the tarball via `curl`. https://golang.org/dl/?mode=json * Trap errors and remove ec2 instance (#854) Add error handling for the release build pipeline. * Update the update script. (#670) * Faster external_build_printlog by using curl instead of aws cli (#847) * Fix concurrent SQLite initialization (#872) * SQLite init is not thread safe and mattn/go-sqlite3 does not care * When open any db first time do it synchronously in order to make a nested sqlite3_initialize() the first call non-concurrently * Re-enable mutli-threaded account generation * Closes #846 * change _tx_lock -> _txlock (#871) * Redirect stdout of build log file to build release upload directory (#873) * Install boto3 as a build dependency for docker (#875) * Enable some skipped test on MacOS (#876) * Asset tests * Rest client test * Send-Receive test (TestAccountsCanSendMoney) - takes 16 minutes * Set root as explicit docker user for test phase (#874) * Refactor are combine the phonebook implementations (#870) Merge the three phonebooks implementations into one. * Adding a verifying signatures step to the build release pipeline (#878) * fix typo in check_deps.sh message (#884) * Update list of DNSSEC-aware resolvers (#883) * Fixing error reporting to read from the stream. 
(#887) * Shoehorn `test_package.sh` into the test phase (#877) * Brice/refactor cicd stages to use persistent fields (#879) * Refactor cicd.yaml to use persistent fields Now we have on task generating the docker image version used in subsequent stage tasks * Install libc-compat through musl-dev instead of installing it directly This package comes with more packages which may or may not help. * Move build actions to one make task This will speed up the build by reducing the amount of redundent make target executions. * Refactor Makefile to build using -static on alpine Also, removed the if around amd64 vs arm64 so builds are more consistent. * Remove tests from armv6 build Tests don't work on that cpu arch because --race isn't supported. * Add conditional to build arm packages with static linking * Up memory map space in centos container The default is too low for builds on amd64 * Set -static flag to ld only for arm builds on alpine This way we are limiting the static option to arm builds on our docker container. * Rename arm references for arm32v6 builds After our talk yesterday, I changed references for arm builds to be consistent with other parts of our automation. * Add some more files to .dockerignore file These files are not necessary and they make the builds take much longer * Delete go-algorand repo in builder image This always gets overwritten when it's used and it takes up a lot of space * Have build-local run all make targets at once * Remove .git folder from .dockerignore This is used by some of our automation * Strip remote repo name from branch variable name in build release pipeline (#897) * Support of older kernels for locking files (#895) * Use golang.org/x/sys/unix instead of syscall The latter package is deprecated See https://golang.org/pkg/syscall/ * Always use non-OFD locks on non-Linux OS Previously, availability of OFD locks was tested on non-Linux OS. 
To do that, the syscall cmd constant `syscall.F_OFD_GETLK` was hard-coded in `libgoal/lockedFileUnix.go`, because this syscall cmd constant was not available in the Go library for non-Linux OS. However, different architectures may have different syscall constants. Furthermore, it seems that currently, only Linux supports OFD locks. This commit removes hard-coded syscall constants and systematically uses non-OFD locks on non-Linus OS. * Default to non-OFD locks when OFD locks unavailable Older kernels (before 3.15, and in particular the kernel from WSL - Windows Subsystem for Linux) do not support OFD locks. This commits adds a test for the availability of OFD locks. The test is similar to what was done before in `lockedFileUnix.go`, (removed by commit 11bc50da77278021e60922f6a4d5aac2bf9e6d40) with two main differences: * no syscall constant is hardcoded * unavailability of OFD locks is more fine-grained: `errno` is checked to be `unix.EINVAL` rather than any error in case of a different `errno`, panic (this should never happen) * Re-ordering imports * Return error instead of panicking in `makeLocker` * Remove the phonebook from the node (#893) * Initial draft of: remove phonebook from node. * minor fixes * fixes from Tsachi's comments. * Rename cicd.yaml to mule.yaml (#894) We renamed our cli to mule, so our cicd.yaml file is now a mule.yaml file * Add sqlite3 as a dependency (#891) * add sqlite3 as a dependency When running `make`, `sqlite3` is used but was not included as a dependency in: * `scripts/check_deps.sh` * `scripts/configure_dev.sh` * Do not upgrade sqlite3 on macOS This is not useful and causes issues with Travis. * Catchupsrv tars (#881) * can serve from directory of M_N.tar.bz2 block tars * faster block tar access. round robin replacement. undo unused config change. * switch Mutex library * Extend timeouts for simulate_test and service_test to support (#905) ci_integration testing. 
* shellchecked `build_deb.sh` (#882) * shellchecked `build_deb.sh` * Test pre-packaged executable on variety of linux platforms (#651) * Add platform testing using docker for generated binaries. * Fix path. * Apply reviewer's requested changes. * Reduce e2e_go_tests execution time twice (#645) There are seven major contributors to integration tests running time TestOnlineOfflineRewards (1248.64s) TestAssetConfig (364.71s) TestRewardRateRecalculation (226.78s) TestStartAndEndAuctionTenUsersOneBidEach (196.34s) TestNoDepositAssociatedWithBid (189.74s) TestDeadbeatBid (188.70s) TestStartAndCancelAuctionNoBids (183.35s) This commit considers only first three. 1. Fixing rewards interval in config for TestRewardRateRecalculation from 25 to 10 reduces time twice: TestRewardRateRecalculation (119.34s) 2. Fixing initialRound in TestOnlineOfflineRewards test from 301 to 11 reduces time 15 times: TestOnlineOfflineRewards (73.80s) 3. TestAssetConfig looks long by design - commits and waits max allowed …
Summary
Add a comment about default telemetry configuration.