
TSAN/ASAN timeout and QPS rate adjustments #5321

Merged
merged 4 commits into from
Feb 23, 2016

vjpai
Member

vjpai commented Feb 19, 2016

The goal here is to reduce the likelihood of test flakes caused by timeout errors.

  1. Adjust slowdown factors
    tsan documentation says 2-20x, so set it at 20x
    asan documentation says 1.2-2.7x, so set it at 3x
    msan documentation says 2-4x, so set it at 4x
    This is now much less optimistic than before
  2. Reactivate tsan tests for qps_test
  3. Set CPU load for qps_openloop_test
  4. Divide the qps_openloop_test Poisson rate by the slowdown factor of
    the build configuration
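A minimal Python sketch of items 1 and 4, assuming a simple per-configuration lookup table. `SLOWDOWN`, `scaled_timeout`, `scaled_poisson_rate`, and `poisson_send_times` are hypothetical names for illustration, not gRPC's actual test-runner code. Timeouts are multiplied by the factor so a slow sanitizer build gets more time, while the open-loop Poisson rate is divided by it so the offered load stays proportionate to what that build can serve.

```python
import random

# Hypothetical slowdown factors, taken from the upper ends of the sanitizer
# documentation ranges quoted above; unlisted configurations default to 1.0.
SLOWDOWN = {"asan": 3.0, "msan": 4.0, "tsan": 20.0}


def scaled_timeout(base_timeout_s: float, config: str) -> float:
    """Stretch a per-test timeout by the configuration's slowdown factor."""
    return base_timeout_s * SLOWDOWN.get(config, 1.0)


def scaled_poisson_rate(base_qps: float, config: str) -> float:
    """Divide the open-loop offered load so a slower build is not overdriven."""
    return base_qps / SLOWDOWN.get(config, 1.0)


def poisson_send_times(rate_qps: float, duration_s: float, seed: int = 0) -> list[float]:
    """Open-loop schedule: exponential inter-arrival gaps yield a Poisson process."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_qps)  # mean gap is 1 / rate_qps seconds
        if t >= duration_s:
            return times
        times.append(t)
```

For example, with a hypothetical 60 s base timeout and 1000 QPS base rate, `scaled_timeout(60, "tsan")` gives 1200.0 and `scaled_poisson_rate(1000, "tsan")` gives 50.0. Because the sender is open-loop, requests are issued on this schedule regardless of whether earlier ones have completed, which is why an unscaled rate can swamp a 20x-slower build.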

@grpc-kokoro

Can one of the admins verify this patch?


@ctiller
Member

ctiller commented Feb 19, 2016

Thanks for looking at this. The worry here is test latency. Indeed, the first run of this reports 6.5 hours for the TSAN run. I don't believe that number, however: I think something deeper is going on, because individual tests don't seem that much slower. I just ran it again to get some unloaded latency numbers, since I'm not sure anyone else is likely to run tests at 4am.

Let's figure out:

  1. What this really does to build latency
  2. Whether that is addressable

So far I've been playing the game of precariously balancing them via the timeout multiplier, simply for expediency (and because it mostly worked).

@ctiller
Member

ctiller commented Feb 19, 2016

So it seems that the reported total time is sum(individual_test_time), which is of course bogus (we run tests in parallel).
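To illustrate why summing per-test times overstates wall-clock time, here is a hypothetical Python sketch (not the test runner's actual reporting code) comparing the plain sum with the finish time of a simple longest-first scheduler over a fixed number of parallel workers. The durations and worker count below are made up for illustration.

```python
import heapq


def sequential_total(durations: list[float]) -> float:
    """What a sum(individual_test_time) report would show."""
    return sum(durations)


def parallel_wall_clock(durations: list[float], workers: int) -> float:
    """Approximate wall-clock time: each test (longest first) goes to the
    worker that frees up earliest; the answer is the last finish time."""
    if workers < 1:
        raise ValueError("need at least one worker")
    finish = [0.0] * workers  # min-heap of per-worker finish times
    for d in sorted(durations, reverse=True):
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + d)
    return max(finish)
```

With hypothetical durations `[30, 20, 20, 10, 10, 10]` seconds, `sequential_total` reports 100 while `parallel_wall_clock(..., 3)` returns 40.0, so a summed figure can be several times larger than the real build latency.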

@ctiller
Member

ctiller commented Feb 23, 2016

LGTM.

How are you feeling on this one?

@vjpai
Member Author

vjpai commented Feb 23, 2016

I think I feel OK with it. Using 5 rather than 20 seems faster, and it still isn't flaking.

vjpai added a commit that referenced this pull request Feb 23, 2016
TSAN/ASAN timeout and QPS rate adjustments
vjpai merged commit 953e41a into grpc:master Feb 23, 2016
vjpai deleted the openloop branch February 23, 2016 21:11
lock bot locked as resolved and limited conversation to collaborators Jan 28, 2019
4 participants