-
Notifications
You must be signed in to change notification settings - Fork 10.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added slice equality when static fastpath. #18910
Conversation
name old time/op new time/op delta |
name old time/op new time/op delta |
name old time/op new time/op delta name old allocs/op new allocs/op delta name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta name old speed new speed delta |
ffbf102
to
05ba7f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good. Just a few comments. Please let me know if you have any questions.
Reviewed 7 of 13 files at r1, 6 of 6 files at r2.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @arjunroy, @hcaseyal, and @soheilhy)
src/core/ext/transport/chttp2/transport/parsing.cc, line 521 at r2 (raw file):
} } else if (grpc_slice_eq_static(GRPC_MDKEY(md), GRPC_MDSTR_GRPC_STATUS) && !grpc_mdelem_eq(md, GRPC_MDELEM_GRPC_STATUS_0)) {
Use grpc_mdelem_static_value_eq()
here? Would this eliminate the need for the if
block above?
src/core/lib/slice/slice_internal.h, line 32 at r2 (raw file):
#include "src/core/lib/transport/static_metadata.h" int grpc_slice_differs_slowpath(const grpc_slice& a,
The semantics of this function are not obvious from the perspective of the slice API. Why is there a slowpath function? When should it be used instead of the normal grpc_slice_eq()
function? And why does it have the opposite boolean semantic (i.e., it returns true if they differ instead of false that they are not equal)?
At minimum, I think this API needs some cleanup and documentation.
src/core/lib/transport/metadata.h, line 139 at r2 (raw file):
* metadata batch callouts in initial/trailing filters. In this case, fastpath * grpc_mdelem_eq and remove unnecessary checks. */ int grpc_slice_differs_slowpath(const grpc_slice& a,
This is declared in slice_internal.h, so we should include that instead of re-declaring it here.
src/core/lib/transport/metadata.h
Outdated
return grpc_slice_eq_static(GRPC_MDVALUE(a), GRPC_MDVALUE(b_static)); | ||
const grpc_slice& a_slice = GRPC_MDVALUE(a); | ||
const grpc_slice& b_slice = GRPC_MDVALUE(b_static); | ||
return a_slice.refcount == b_slice.refcount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just call grpc_slice_eq_static here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will also tie into a comment that Mark made, regarding why I'm doing a forward declaration as opposed to a an include of slice_internal.h.
Basically, there is a dependency cycle. slice_internal.h includes static_metadata.h which includes metadata.h. So this means we cannot include slice_internal.h here. So we have to forward declare.
Since grpc_slice_eq_static is inlined in its definition in slice_internal.h, just forward declaring means that we'll get a linker error with just a forward declaration.
However, the slow path is not declared inline, so my terrible hack here is to just duplicate a bit of implementation and then call the slow path method which won't throw a linker error.
TLDR: I don't call it because it would fail during the link process, and this is a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some other way to solve it so we don't have to use the hack? Could we move grpc_slice_eq_static to a different file, then have metadata.h include that file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an option, but it's not clear where it should go. I am open to suggestions, since I am also not a huge fan of what I have done here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @markdroth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make a new file (or files) called slice_utils.h (and maybe a slice_utils.cc)? And it could live in the same directory as slice_internal.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable - added slice_utils.h and removed the forward declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @hcaseyal, @markdroth, and @soheilhy)
src/core/ext/transport/chttp2/transport/parsing.cc, line 521 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Use
grpc_mdelem_static_value_eq()
here? Would this eliminate the need for theif
block above?
Nice catch. Using grpc_mdelem_static_value_eq is appropriate here. It is also true that this makes the above if block not needed anymore.
Previously, if the input md was static, it would set s->seen_error to true if it was either GRPC_MDELEM_GRPC_STATUS_1, or GRPC_MDELEM_GRPC_STATUS_2. If not static, it would set s->seen_error to true if it was a GRPC_STATUS and it was not equal to GRPC_MDELEM_GRPC_STATUS_0.
So in any world where:
- It is a GRPC_STATUS
- That isn't GRPC_MDELEM_GRPC_STATUS_0
We s->set seen_error to true. If we do have a static md and use just the second code branch, assuming GRPC_MDELEM_GRPC_STATUS_0 != [GRPC_MDELEM_GRPC_STATUS_1, GRPC_MDELEM_GRPC_STATUS_2], we will have the same results.
A similar argument applies to the same code in on_initial_header, so I'll change it there too. Will add a comment showing the intent of the code in both places and kill the unnecessary if block.
src/core/lib/slice/slice_internal.h, line 32 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The semantics of this function are not obvious from the perspective of the slice API. Why is there a slowpath function? When should it be used instead of the normal
grpc_slice_eq()
function? And why does it have the opposite boolean semantic (i.e., it returns true if they differ instead of false that they are not equal)?At minimum, I think this API needs some cleanup and documentation.
I'll add the documentation. Also will describe the reasoning here.
Specifically: the fastest and inline-able case is when the slices are literally equal, for the interned and static cases. The slowpath is every other case.
We grpc_slice_eq_static(interned) when we want to compare two slices, and we know the latter one is static(interned). We use grpc_slice_differs_slowpath when we want to compare two slices and we know the latter one is not inlined.
Note grpc_slice_differs_slowpath is a helper method primarily, perhaps we could rename it to: grpc_refcounted_slice_differs.
Now, why are we using differs() for this helper method? For slightly better code generation.
Suppose we have two slices, and if they're identical, we do some followup operation.
Suppose we have as our slowpath method either grpc_slice_differs_slowpath, or grpc_slice_equals_slowpath. Both are represented here.
If we use grpc_slice_differs_slowpath, we do a tailcall to memcp.
If we use grpc_slice_equals_slowpath, we use bcmp, then perform 6 instructions afterwards. Semantically all we're accomplishing is flipping the output. (ie. return memcmp(); or return !memcmp(); ).
Now, take a look at call_with_equals and call_with_differs. In each case, if we were to take the slowpath, and we were to call cont() afterwards, it's 8 instructions. But, in executing the slowpath, the implementation using grpc_slice_differs_slowpath() would not have to perform the 6 instructions that grpc_slice_equals_slowpath uselessly does. So in the case where we're using the output of grpc_slice_eq_static/interned() for control flow, it's faster to use grpc_slice_differs_slowpath instead of grpc_slice_equals_slowpath.
Anyways, will add some docs.
src/core/lib/transport/metadata.h, line 139 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This is declared in slice_internal.h, so we should include that instead of re-declaring it here.
Unable - circular dep. See the comment I said to Hope earlier: "Basically, there is a dependency cycle. slice_internal.h includes static_metadata.h which includes metadata.h. So this means we cannot include slice_internal.h here. So we have to forward declare.
Since grpc_slice_eq_static is inlined in its definition in slice_internal.h, just forward declaring means that we'll get a linker error with just a forward declaration.
However, the slow path is not declared inline, so my terrible hack here is to just duplicate a bit of implementation and then call the slow path method which won't throw a linker error.
TLDR: I don't call it because it would fail during the link process, and this is a hack."
Since she already responded - let's continue this discussion there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 23 files reviewed, 7 unresolved discussions (waiting on @hcaseyal, @markdroth, and @soheilhy)
src/core/lib/transport/metadata.h, line 139 at r2 (raw file):
Previously, arjunroy (Arjun Roy) wrote…
Unable - circular dep. See the comment I said to Hope earlier: "Basically, there is a dependency cycle. slice_internal.h includes static_metadata.h which includes metadata.h. So this means we cannot include slice_internal.h here. So we have to forward declare.
Since grpc_slice_eq_static is inlined in its definition in slice_internal.h, just forward declaring means that we'll get a linker error with just a forward declaration.
However, the slow path is not declared inline, so my terrible hack here is to just duplicate a bit of implementation and then call the slow path method which won't throw a linker error.
TLDR: I don't call it because it would fail during the link process, and this is a hack."
Since she already responded - let's continue this discussion there.
Resolved per Hope's suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Remaining comments are minor, so I'll go ahead and approve.
Before merging, please have @yashykt look at the chttp2 changes.
Reviewed 1 of 4 files at r3, 19 of 19 files at r4.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @arjunroy, @hcaseyal, and @soheilhy)
src/core/lib/slice/slice_internal.h, line 32 at r2 (raw file):
Previously, arjunroy (Arjun Roy) wrote…
I'll add the documentation. Also will describe the reasoning here.
Specifically: the fastest and inline-able case is when the slices are literally equal, for the interned and static cases. The slowpath is every other case.
We grpc_slice_eq_static(interned) when we want to compare two slices, and we know the latter one is static(interned). We use grpc_slice_differs_slowpath when we want to compare two slices and we know the latter one is not inlined.
Note grpc_slice_differs_slowpath is a helper method primarily, perhaps we could rename it to: grpc_refcounted_slice_differs.
Now, why are we using differs() for this helper method? For slightly better code generation.
Suppose we have two slices, and if they're identical, we do some followup operation.
Suppose we have as our slowpath method either grpc_slice_differs_slowpath, or grpc_slice_equals_slowpath. Both are represented here.If we use grpc_slice_differs_slowpath, we do a tailcall to memcp.
If we use grpc_slice_equals_slowpath, we use bcmp, then perform 6 instructions afterwards. Semantically all we're accomplishing is flipping the output. (ie. return memcmp(); or return !memcmp(); ).Now, take a look at call_with_equals and call_with_differs. In each case, if we were to take the slowpath, and we were to call cont() afterwards, it's 8 instructions. But, in executing the slowpath, the implementation using grpc_slice_differs_slowpath() would not have to perform the 6 instructions that grpc_slice_equals_slowpath uselessly does. So in the case where we're using the output of grpc_slice_eq_static/interned() for control flow, it's faster to use grpc_slice_differs_slowpath instead of grpc_slice_equals_slowpath.
Anyways, will add some docs.
Wow, that's incredibly counter-intuitive. :(
But I appreciate your documenting it. The performance win probably justifies the counter-intuitivity.
src/core/lib/slice/slice_utils.h, line 37 at r4 (raw file):
// operations to invert the result pointlessly. Concretely, we save 6 ops on // x86-64/clang with differs(). int grpc_slice_differs_refcounted(const grpc_slice& a,
These functions should all return bool
.
src/core/lib/slice/slice_utils.h, line 44 at r4 (raw file):
// b_static(interned); b_static(interned) must be a static(interned) slice, // while a can be any slice, static or otherwise, inlined or refcounted. inline int grpc_slice_eq_static(const grpc_slice& a,
These two functions look exactly the same. Maybe just have one copy called something like grpc_slice_refcounted_eq()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chttp2 changes look ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @hcaseyal, @markdroth, and @soheilhy)
src/core/lib/slice/slice_utils.h, line 37 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
These functions should all return
bool
.
The inline methods should; will change it. For the slowpath method, though, in a similar manner as before, codegen is slightly better to leave it as an int (4 instructions saved; c.f. 6 instructions before was why we had differs rather than equals).
src/core/lib/slice/slice_utils.h, line 44 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
These two functions look exactly the same. Maybe just have one copy called something like
grpc_slice_refcounted_eq()
?
I'll change it to grpc_slice_static_interned_eq() since we do not cover the all refcounted slices case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just one nit.
Reviewable status: 15 of 23 files reviewed, 7 unresolved discussions (waiting on @arjunroy, @hcaseyal, and @markdroth)
src/core/lib/slice/slice.cc, line 425 at r5 (raw file):
a_ptr = &a.data.inlined.bytes[0]; } // Lengths differ => they're different.
nit: I'd remove the obvious comments here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I have accepts from everyone, I have coalesced the commits into one and will submit once all the tests pass. Thanks!
Reviewable status: 15 of 23 files reviewed, 6 unresolved discussions (waiting on @hcaseyal and @markdroth)
src/core/lib/slice/slice.cc, line 425 at r5 (raw file):
Previously, soheilhy (Soheil Hassas Yeganeh) wrote…
nit: I'd remove the obvious comments here and below.
Done.
The interop tests are broken for everything now, so I'm going to go ahead and push since all other tests pass. |
This change is