Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Enable tsan on linux64 build #4563

Merged

Conversation

gabriel-bjg
Copy link

@gabriel-bjg gabriel-bjg commented Nov 2, 2021

This pull request enables thread sanitiser for linux64 build (including unit tests & functional tests) and fixes/suppresses the following:

  • Switch ubsan to run on linux32 as tsan can't run there (linker does not support thread sanitizer)
  • Fix data race in CCoinJoinClientManager (triggered by concurrent access on fMixing)
  • Suppress data races triggered in zmq namespace (same suppression exists in bitcoin)
  • Suppress deadlock false positive the sme way bitcoin does it (i.e. deadlock:CChainState::ConnectTip). False positive description:
One thread:
generateBlocks|mining.cpp:124
    CreateNewBlock
        LOCK2(cs_main, cs_mempool) | called from miner.cpp:128
            TestBlockValidity | called from miner.cpp:235
                    ConnectBlock | called from validation.cpp:4200
                        ProcessSpecialTxsInBlock | called from validation.cpp:2123
                            […]
                                NotifyMasternodeListChanged
                                    UpdateCachesAndClean
                                        LOCK(cs_main, cs_governance) | called from governance/governance.cpp:352
Lock order: cs_main, cs_mempool, cs_governance

The other thread:
AddGovernanceObject
    LOCK2(cs_main, cs_governance) | called from governance/governance.cpp:293
        IsValidLocally | called from governance/governance.cpp:298
            […]
                GetTransaction
                    mempool.get(hash) | called from validation.cpp:893
                        LOCK(cs_mempool) | called from txmempool.cpp:1219
Lock order: cs_main, cs_governance, cs_mempool

This kind of lock order inversion looks like a bug in tsan (similar known issue: google/sanitizers#814) and can be easily reproduced with 3 locks by changing the order of the last 2 locks: thread1 locks A, B, C and thread2 locks A, C, B (assuming a reversed unlocking order). Godbolt example here: https://godbolt.org/z/81Y6frYKv (same example with std::scoped_lock https://godbolt.org/z/175x6Yv9h)

I will keep the PR in draft until everything is green and passing in gitlab ci.

@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizers-in-gitlab-ci branch from 5d891a6 to d391967 Compare November 2, 2021 20:31
@PastaPastaPasta
Copy link
Member

I question whether we should retain one run of functional tests that doesn't include any sanitizer so that it is nice and fast..

@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizers-in-gitlab-ci branch from d391967 to 0d6c51b Compare November 3, 2021 17:04
@gabriel-bjg
Copy link
Author

I question whether we should retain one run of functional tests that doesn't include any sanitizer so that it is nice and fast..

I could do that, but we need a target without sanitizers at all and a new job for testing as tsan and ubsan can't run together (see: google/sanitizers#1106). We could use the linux64_cxx17 build target (with thread sanitizer removed from there) and create a new testing job derived from it, so we would have 3 testing jobs: linux32-test (ubsan), linux64-test (tsan), linux64-test-no-sanitizers (we can discuss to improve their namings).

@PastaPastaPasta
Copy link
Member

I question whether we should retain one run of functional tests that doesn't include any sanitizer so that it is nice and fast..

I could do that, but we need a target without sanitizers at all and a new job for testing as tsan and ubsan can't run together (see: google/sanitizers#1106). We could use the linux64_cxx17 build target (with thread sanitizer removed from there) and create a new testing job derived from it, so we would have 3 testing jobs: linux32-test (ubsan), linux64-test (tsan), linux64-test-no-sanitizers (we can discuss to improve their namings).

Yes, I think it makes sense to have the following
linux64 - pure linux64, no sanitizers, unit and functional tests
linux64-thread - linux64, thread sanitizer, unit and functional tests
linux32-undef - linux32, undefined sanitizer, unit and functional tests

@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizers-in-gitlab-ci branch 7 times, most recently from 7462c2f to 12f2c27 Compare November 4, 2021 19:30
@github-actions
Copy link

This pull request has conflicts, please rebase.

@UdjinM6
Copy link

UdjinM6 commented Dec 12, 2021

pls see 3748656 (NOTE: I kept pure linux32 build/tests untouched just in case - they should not slow things down that much and we can always drop them later if needed)

@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizers-in-gitlab-ci branch from 4a1b159 to 55d2233 Compare December 16, 2021 09:21
Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny issue, LGTM otherwise

.gitlab-ci.yml Outdated Show resolved Hide resolved
@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizers-in-gitlab-ci branch 3 times, most recently from 9ba7ec9 to 65a77d1 Compare December 22, 2021 17:00
@gabriel-bjg gabriel-bjg marked this pull request as ready for review December 22, 2021 18:57
UdjinM6
UdjinM6 previously approved these changes Dec 23, 2021
Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one issue

@@ -178,7 +179,7 @@ class CCoinJoinClientManager
// TODO: or map<denom, CCoinJoinClientSession> ??
std::deque<CCoinJoinClientSession> deqSessions GUARDED_BY(cs_deqsessions);

bool fMixing{false};
std::atomic<bool> fMixing{false};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should update

if (IsMixing()) {
return false;
}
return fMixing = true;
to be actually atomic (likely use compare_exchange)?

@github-actions
Copy link

This pull request has conflicts, please rebase.

@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizers-in-gitlab-ci branch 2 times, most recently from a74691f to c5c0eaf Compare January 5, 2022 09:27
Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK for squash merge

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature_llmq_signing.py failed twice in ci so far..

@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizers-in-gitlab-ci branch from c5c0eaf to 5209f25 Compare January 6, 2022 17:28
@gabriel-bjg gabriel-bjg changed the title Enable tsan on linux32 build Enable tsan on linux64 build Jan 6, 2022
@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizers-in-gitlab-ci branch from 5209f25 to e8cc8e9 Compare January 6, 2022 19:13
@gabriel-bjg
Copy link
Author

feature_llmq_signing.py failed twice in ci so far..

I tried to reproduce locally, but after running the test that's failing for 2 hours in a loop I gave up. I increased the timeout of wait_for_sigs on the failing testcase (https://github.com/dashpay/dash/pull/4563/files#diff-78ac439373346c70f2c3236e1f34e274df611f2f2367bef84464bd0e59226940R106) and it does not fail anymore in the CI.

PastaPastaPasta
PastaPastaPasta previously approved these changes Jan 7, 2022
Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK for squash merge

restarted ci test run, hopefully it passes

test/functional/feature_llmq_signing.py Outdated Show resolved Hide resolved
Make fMixing atomic as it has concurrent access
Add tsan suppression for zmq namespace
Suppress deadlock false positive in ConnectTip
Switch ubsan target to linux32
Add new test job for linux64_cxx17 target without any sanitizers
Increase rpc time out for block reward reallocation test
Fix heap use after free in CConnman::GetExtraOutboundCount()
Different builds for linux32 and linux64 tsan and ubsan
Increase timeout for llmq_signing functional test
Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@UdjinM6 UdjinM6 added this to the 18 milestone Jan 10, 2022
Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK for squash merge

@PastaPastaPasta PastaPastaPasta changed the title Enable tsan on linux64 build ci: Enable tsan on linux64 build Jan 10, 2022
@PastaPastaPasta PastaPastaPasta merged commit f73fce7 into dashpay:develop Jan 10, 2022
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Nov 7, 2023
Make fMixing atomic as it has concurrent access
Add tsan suppression for zmq namespace
Suppress deadlock false positive in ConnectTip
Switch ubsan target to linux32
Add new test job for linux64_cxx17 target without any sanitizers
Increase rpc time out for block reward reallocation test
Fix heap use after free in CConnman::GetExtraOutboundCount()
Different builds for linux32 and linux64 tsan and ubsan
Increase timeout for llmq_signing functional test
gades pushed a commit to piratecash/pirate that referenced this pull request Dec 9, 2023
Make fMixing atomic as it has concurrent access
Add tsan suppression for zmq namespace
Suppress deadlock false positive in ConnectTip
Switch ubsan target to linux32
Add new test job for linux64_cxx17 target without any sanitizers
Increase rpc time out for block reward reallocation test
Fix heap use after free in CConnman::GetExtraOutboundCount()
Different builds for linux32 and linux64 tsan and ubsan
Increase timeout for llmq_signing functional test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants