-
Notifications
You must be signed in to change notification settings - Fork 10.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solving crashes in async tests #920
Conversation
a proper join. Also had been misusing EXPECT_EQ, as well as actually having an invalid expectation on the ok field. Now it should be sane.
buf->FillOps(ops, &nops); | ||
GPR_ASSERT(GRPC_CALL_OK == | ||
grpc_call_start_batch(call->call(), ops, nops, buf)); | ||
if (call->call()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what conditions can we get here?
This seems very dangerous: we'll have someone waiting on a completion that will never arrive, transforming a crash into a stuck thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are getting here in the case reported in issue #914. The thread is shutting down, so I don't think that there's a stuckness issue. Perhaps there is a better solution to the problem. I didn't provide a code-snippet there because the original questioner's code might be mixed with internal code, but I can sanitize it if you'd like to see it reported that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear - I don't believe there's a stuckness issue in the case that you're trying to fix.
I do believe that this fix opens the way for a stuckness issue in the future, as we now silently skip doing something but make an implicit promise that an event will be delivered.
Is there a point higher in the call graph that this check could be made?
Is the code validly broken and we should just issue a better error message?
Happy to sit down and look at this once we're both at the office if you'd like.
Sounds good. I too worry that the mere fact that we've entered this code indicates that something horribly wrong has happened elsewhere. I missed my usual bus but am on a bus now and should be in around 10. |
LGTM. Will merge once the tests turn green. |
Avoid thread safety issues on destructor with a proper join.
Also had been misusing EXPECT_EQ, as well as actually having an invalid
expectation on the ok field. Now it should be sane. This no longer causes a
segfault or abort trap for me.