
Interfaces of Python GA API #6327

Merged
merged 1 commit into grpc:master on Jun 1, 2016

Conversation

nathanielmanistaatgoogle
Member

Lots of minutiae to discuss here, but of course the most important question is whether or not this has all the growth surfaces that are needed and whether or not future changes are going to be as disruptive as past feature growth was to the alpha and beta APIs.

(Security (mostly) not part of this code review.)

Changes from Beta:

  • Channel construction now takes a single string rather than separate host and port.
  • RPC timeout now optional.
  • Server-side timeout removed - if applications don't trust the RPC caller and want to time out according to their own schedule, they'll have to set their own alarm clock.
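
A minimal sketch of what the first two changes look like at a call site, assuming the channel-construction and MultiCallable shapes drafted in this review (serializer/deserializer arguments omitted; the method name is just the one the tests use):

```python
import grpc

# Beta took separate host and port; the GA draft takes a single target string.
channel = grpc.insecure_channel('localhost:50051')

# Obtain a callable for one method of the service.
multi_callable = channel.unary_unary('/test/UnaryUnary')

request = b'\x00'  # placeholder serialized request bytes

# The RPC timeout is now optional: omit it to apply no client-side deadline...
response = multi_callable(request)

# ...or pass one when a deadline is wanted.
response = multi_callable(request, timeout=10)
```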

Questions:

  • What about Future? The problems with concurrent.futures.Future that motivated grpc.framework.foundation.future.Future haven't been solved. "Maximizing compatibility" with concurrent.futures.Future would mean compounding its design flaw and violating the public API of concurrent.futures - please no? Is it a problem putting our Future interface where it is currently drafted in this API? Should Future appear as a code element of grpc, or should only code elements that we expect to be used in application code be code elements of grpc?
  • The MultiCallable interfaces are how we support asynchronous invocation on the same code element as synchronous invocation, but with the current semantics they're only needed for response-unary RPCs. Should we keep all four anyway with future growth in mind? If we want to support async/await, they'd be the right place, right?
  • Should the Channel have the RPC invocation convenience methods depicted here or is it just enough to have the MultiCallable methods? How much do we expect users to make direct use of Channels? How hard (and/or weird) are we making their lives if we make them write my_response = my_channel.unary_unary(my_method_name)(my_request) instead of my_response = my_channel.blocking_unary_unary(my_method_name, my_request)? (See the sketch after this list.)
  • How should this API's documentation refer to connectivity and status code integers? Do we just say "go look at connectivity_state.h and status.h to see what the integer returned to you means"?
  • Is it too clever making the raised RpcError also a Call object? Should the RpcError have a Call object instead?
  • Should the required Channel construction parameter be named "target" or "address" or... something else? What exact wording would we like describing it?
  • What about "channel args"? How do we document them? At least one of them is only appropriate for use in tests - are we worried about it being passed via the non-test API?
  • Servicing RPCs requires a thread pool (right?). Should the server's thread pool be required of the application, optional, or entirely obscured? Should the application be allowed to pass thread pool size?
  • How do we want to support generic service of arbitrary RPCs? (This was present in beta; the question isn't coming out of nowhere.) By accepting a function that looks like a stream-stream handler, but that takes the method name as well and operates on bytes in and bytes out?
  • Where in the API do we want to allow applications to pass native strings as opposed to byte strings? (If we want to support my_response = my_stub.my_method(my_request) at the read-eval-print loop, we're going to have to support native strings in at least one place.)
  • Can we do away with the metadata_transformer present in alpha and beta and make security entirely done with the plug-in API? (I suspect yes?)
  • The host is a stub option in the beta API; should it remain a stub option or become a channel option or per-call option?
  • Where in the documentation do we want to say "metadata" and where do we want to say "headers"? What three terms do we want to say where the beta API said "invocation metadata", "initial metadata", and "terminal metadata"?
  • In the beta API there is a CallOptions class that can be passed on RPC invocation and that encapsulates disable_compression, subcall_of, and per-call credentials. Do we want to do away with this aggregate object and just have those values as keyword arguments on each RPC invocation (as is drafted here)? If so, is there any need to worry about every future feature being added as yet another keyword argument and eventually having an unmanageable number?
  • Are "normal termination of the RPC" and "abnormal termination of the RPC" the right terms to use?

@soltanmm and @jtattermusch please also look? Thanks much.

@soltanmm
Contributor

status code and connectivity magic numbers

I'm strongly opposed to flattening our magic numbers. We have two separate documented and enforced-by-convention discriminants that are conceptually distinct. They should remain in separate enumerations (although they perhaps should acquire their values from the Cython layer, as those values will always be in sync with the core).
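
A sketch of the two-separate-enumerations shape being argued for here, with values acquired from the Cython layer; the import path and member spellings are illustrative, not the draft's actual code:

```python
import enum

from grpc._cython import cygrpc as _cygrpc  # assumed import path


@enum.unique
class ChannelConnectivity(enum.Enum):
    """Connectivity states, kept distinct from status codes."""
    IDLE = (_cygrpc.ConnectivityState.idle, 'idle')
    READY = (_cygrpc.ConnectivityState.ready, 'ready')
    TRANSIENT_FAILURE = (
        _cygrpc.ConnectivityState.transient_failure, 'transient failure')


@enum.unique
class StatusCode(enum.Enum):
    """Status codes, a separate enumeration with its own values."""
    OK = (_cygrpc.StatusCode.ok, 'ok')
    CANCELLED = (_cygrpc.StatusCode.cancelled, 'cancelled')
    DEADLINE_EXCEEDED = (_cygrpc.StatusCode.deadline_exceeded, 'deadline exceeded')
```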

flattening the package structure

I feel like this swings the pendulum a bit too far in the opposite direction in terms of package hierarchy. Yes, before we had to go 5+ package/modules deep to find interface definitions that the user would regularly interact with, and yes that was a problem. However, I think there's a happy balance between an entirely flat hierarchy and what we had before.

The Python community appears to tend towards organizing packages first by concept or idea and then by implementation detail. I wouldn't be against having something like a module for user-facing invocation-side things, a module for user-facing service-side things, and a module for common things, for example. Without bikeshedding too much, I'd like something akin to:

top-level
  └── concepts
        └── specialized concepts
              └── implementation details

where concepts would include the separation of stubs from services from transports and handshakes (all the most basic elements required to get something up and running), specialized concepts would include codes and security plugins...

tl;dr: If someone were to take our package hierarchy and draw any arbitrary path from root to (public) leaf, I'd want that path to describe a useful tutorial for the project.

@ctiller
Member

ctiller commented Apr 28, 2016

Some answers/opinions (I haven't yet read code):

  • 'target' is sort-of standardized across stacks for the channel construction string: it's not necessarily a single address
  • at least C++ channel args are wrapped a little bit: common things get a first class interface, uncommon things get to be set via a side channel interface by searching for the string name. That allows us to bubble up documentation to a place that C++ authors will find it.
  • I'd like to see names for status codes exposed in Python.
  • Suspect RpcError should have-a Call
  • There's room for some debate in servicing RPCs, but I suspect some flavor of thread pool is appropriate here.
  • Generic APIs: they're definitely needed in C++ land, and they're a useful stand-in for someone getting a first version of a non-proto based stack up - that said, I could be convinced to pass on it for the Python API this round (and add it back when it's important for someone)
  • Suspect yes on metadata_transformer
  • I want to make host a stub option for C++; setting a default for a channel makes sense (and is expressible to core), allowing per-call override isn't crazy but also probably isn't required if there's a stub option
  • Suspect we want to refer to metadata and not headers in all non-wire format documentation

@nathanielmanistaatgoogle
Member Author

Lots more here to discuss and review now.

Highlights:

  • Blocking unary-unary invocation is single-threaded and single-batch, which should capture all the glaring, obvious performance improvements for the use case; further gains would have to be measurement-based.
  • Asynchronous invocation uses a single shared-channel-wide completion queue so N asynchronously-invoked RPCs spawn one thread rather than N (or N times a constant) threads.
  • A started server runs one thread. Servicing RPCs consumes one thread per active RPC, taken from a thread pool supplied by the application. (See the sketch after this list.)
  • I think the tests cover a lot.
    • Details is neither used nor verified.
    • Custom status codes are neither used nor verified.
    • Metadata is used but not yet verified.
    • I could stand to write some stream-length-zero tests in addition to the stream-length-many tests
    • But other than that...
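
A sketch of the service-side shape this threading model implies, assuming grpc.server() takes the application-supplied thread pool and assuming the handler- and port-adding method names (the handler object itself is a placeholder):

```python
from concurrent import futures

import grpc

# The application supplies the thread pool from which per-active-RPC threads
# are taken; the started server itself runs just one loop thread.
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
server.add_generic_rpc_handlers((my_generic_rpc_handler,))  # placeholder GenericRpcHandler
server.add_insecure_port('[::]:50051')
server.start()
```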

Questions:

  • Do we (gRPC Python) have to make a call to Core for every call that the application invokes on us? In the current draft if the application invokes a request-unary RPC and passes a request object that fails to serialize we fake having called (and we conform to status code semantics for the case) without ever touching Core. Is this legitimate?
  • The current draft of the server is one-completion-queue-per-server-plus-one-completion-queue-per-active-RPC. I had wanted to go full-on one-completion-queue-per-server, but after the discussion in issue 6597 I now believe it's impossible to write a multithreaded server with a single completion queue (I have an analysis written up elsewhere if it's not obvious why). Am I wrong? If I'm not wrong, am I the only one who cares? If I'm the only one who cares I'll just get over myself.
  • Is there too much code here (@jtattermusch rightly brought this question up with me about the beta implementation a while back)? The __init__.py file can be as long as it needs to be because it's mostly documentation, but are seven hundred lines for _channel.py and six hundred lines for _server.py in line with the weight of the features being added? Are there obvious opportunities for reduction that I've overlooked?
  • GenericHandler and ParticularHandler are terrible names, but for the moment they get the point across, and we don't have anything more proper listed in either glossary. RpcHandler and MethodHandler, maybe?
  • In the server code I saw some memory leaks clear up when I moved the polling on the per-RPC completion queue inside the per-RPC lock. Why was that? Shouldn't any thread be able to poll on any completion queue? (I'm also open to being wrong about what I saw.)
  • Four of my test methods still leak memory, according to what the Core prints at the end of the test run - the ones that use asynchronous invocation (and thus on the client side a completion queue on a different thread) and complete with non-OK status. What could be going on there?
  • My apologies that at the moment the implementation comments are sparse and the specification comments are absent, but what's unexpected in the code? Where have I gotten "creative" in ways that are dangerous or defective?

@ctiller
Member

ctiller commented May 16, 2016

Completely legit not to invoke core on serialization failure. You may want some monitoring hooks eventually.

It's not obvious to me that one can't write a multithreaded server with one completion queue. I've certainly done so. Suspect a flawed analysis... let's chat to find the flaw. Likely documentation needs fixing.

Need specifics for memory leaks.

Hope to dig into the code later today.


@nathanielmanistaatgoogle
Member Author

Whoops, I left out the "without the server maintaining bookkeeping on what RPCs are active and what tags are due". The Python beta server is another example of a single-completion-queue multithreaded server so my having forgotten the qualification is even more glaring. Here's what I wrote the other day for issue 6597 (but wound up not posting because in the course of trying to prove "I think the workaround is to carry forward the bookkeeping present in the beta server" I went in the plus-completion-queue-per-active-RPC direction instead):

Let's ask a precise question: must a core-wrapping server, implemented as one thread looping over a completion queue with application code running in separate threads, maintain its own records of what RPCs are in progress and what events are "due" from the completion queue?

I would like the answer to be "no", but the following scenario suggests to me that it might be "yes".

| Control Thread | Server Loop Thread | Application Thread |
| --- | --- | --- |
| start server: request call and spawn server loop thread | (spawned here) | (not running) |
|  | call event: spawn application thread to handle call, request receive-close-on-server event for call, and request next call | (spawned here) |
| shut down server |  |  |
| cancel all calls |  |  |
|  | no-call call event |  |
|  | server shutdown event: break from loop and exit thread |  |
|  | (not running) | attempt to read message: request receive-message event and then wait on condition |

This leaves the application thread hung waiting for notification on its condition that will never come. Changing the semantics of the server's loop to "call completion queue shutdown when handling the server shutdown tag" just defers the problem:

| Control Thread | Server Loop Thread | Application Thread |
| --- | --- | --- |
| start server: request call and spawn server loop thread | (spawned here) | (not running) |
|  | call event: spawn application thread to handle call, request receive-close-on-server event for call, and request next call | (spawned here) |
| shut down server |  |  |
| cancel all calls |  |  |
|  | no-call call event |  |
|  | server shutdown event: call completion queue shutdown but keep looping to drain completion queue |  |
|  |  | attempt to read message: request receive-message event and then crash with !cc->shutdown_called |

So if the right time to call completion queue shutdown isn't when processing the server shutdown event, when is it? I don't think I can tell without full bookkeeping.

So what seems to me right now to be the case is that I can't rely solely on events coming out of the completion queue to determine whether or not it is safe for the application to take actions like request a read or request sending status because the event alerting the application to cancellation of the RPC is sitting in "completion queue limbo" having been made available to the server loop thread but not yet obtained by the server loop thread.

So I think the workaround is to carry forward the bookkeeping present in the beta server: each RPC keeps track of what events are due for it; the server maintains a record of all active RPCs and, in the course of handling an event related to an RPC, "asks" the RPC whether the event concludes it, removing the RPC from its records if it does; the server also maintains a record of which of the new-RPC and server-shutdown events are due from the queue; and the server exits its loop only after it has been shut down, all RPCs have completed, and it has secured its last new-RPC event and its shutdown event.
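
In rough Python, the bookkeeping described above would look something like the loop below; every name is a stand-in for whatever the Cython layer actually exposes, and spawn_rpc and handle are hypothetical:

```python
# Sentinel tags standing in for whatever the Cython layer actually uses.
_NEW_CALL_TAG = object()
_SHUTDOWN_TAG = object()


def serve(server, completion_queue, spawn_rpc):
    """One loop thread per server.

    spawn_rpc(event) hands a new call to an application thread and returns a
    per-RPC bookkeeping object whose handle(event) method returns True when
    the event concludes the RPC.
    """
    rpc_states = {}  # active RPCs, keyed by call
    # Server-level tags not yet received (the shutdown tag only arrives after
    # shutdown is requested elsewhere).
    due = {_NEW_CALL_TAG, _SHUTDOWN_TAG}
    shut_down = False
    while True:
        event = completion_queue.poll()
        if event.tag is _SHUTDOWN_TAG:
            due.discard(_SHUTDOWN_TAG)
            shut_down = True
        elif event.tag is _NEW_CALL_TAG:
            due.discard(_NEW_CALL_TAG)
            if event.call is not None and not shut_down:
                rpc_states[event.call] = spawn_rpc(event)
                server.request_call(completion_queue, _NEW_CALL_TAG)
                due.add(_NEW_CALL_TAG)
        else:
            # A per-RPC event: ask the RPC whether this event concludes it.
            rpc_state = rpc_states[event.call]
            if rpc_state.handle(event):
                del rpc_states[event.call]
        if shut_down and not rpc_states and not due:
            # Only now is it safe to shut down and abandon the completion queue.
            completion_queue.shutdown()
            return
```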

@ctiller: what am I missing, and where have I erred?

@nathanielmanistaatgoogle
Member Author

As for the memory leaks:

(1) Taking the current draft and removing the @unittest.skip('memory leak!') decorator, the test "succeeds" (so far as Python is concerned) but includes

OK
E0516 09:23:23.504743511  777844 tcp_client_posix.c:191]     failed to connect to 'ipv4:127.0.0.1:36297': timeout occurred
E0516 09:23:23.504785540  777844 tcp_client_posix.c:191]     failed to connect to 'ipv6:[::1]:36297': timeout occurred
D0516 09:23:24.505524749  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:25.506244438  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:26.506978213  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:27.507732080  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:28.508496078  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:29.509252437  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:30.509989604  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:31.511965050  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:32.512671909  777844 iomgr.c:99]                 Waiting for 4 iomgr objects to be destroyed
D0516 09:23:33.513403748  777844 iomgr.c:117]                Failed to free 4 iomgr objects before shutdown deadline: memory leaks are likely
D0516 09:23:33.513431448  777844 iomgr.c:80]                 LEAKED OBJECT: tcp-client:ipv6:[::1]:42765 fd=42 0x7fa13c00e000
D0516 09:23:33.513445883  777844 iomgr.c:80]                 LEAKED OBJECT: tcp-client:ipv6:[::1]:41981 fd=58 0x7fa130005cb0
D0516 09:23:33.513451683  777844 iomgr.c:80]                 LEAKED OBJECT: tcp-client:ipv6:[::1]:40179 fd=74 0x7fa13c0143d0
D0516 09:23:33.513458426  777844 iomgr.c:80]                 LEAKED OBJECT: tcp-client:ipv6:[::1]:46817 fd=90 0x7fa1400215b0
D0516 09:23:33.513543067  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513558144  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513591742  777844 metadata.c:235]             WARNING: 2 metadata elements were leaked
D0516 09:23:33.513600102  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513609920  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513623083  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513631414  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513641220  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513652460  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513661631  777844 metadata.c:235]             WARNING: 1 metadata elements were leaked
D0516 09:23:33.513673933  777844 metadata.c:235]             WARNING: 2 metadata elements were leaked
D0516 09:23:33.513680678  777844 metadata.c:248]             WARNING: 2 metadata strings were leaked
D0516 09:23:33.513685200  777844 metadata.c:252]             LEAKED: FailedStreamRequestFutureUnaryResponse
D0516 09:23:33.513689173  777844 metadata.c:252]             LEAKED: /test/UnaryUnary
D0516 09:23:33.513694222  777844 metadata.c:248]             WARNING: 2 metadata strings were leaked
D0516 09:23:33.513698283  777844 metadata.c:252]             LEAKED: Deadline Exceeded
D0516 09:23:33.513702299  777844 metadata.c:252]             LEAKED: localhost:40179
D0516 09:23:33.513706328  777844 metadata.c:248]             WARNING: 1 metadata strings were leaked
D0516 09:23:33.513710253  777844 metadata.c:252]             LEAKED: FailedUnaryRequestFutureUnaryResponse
D0516 09:23:33.513714532  777844 metadata.c:248]             WARNING: 1 metadata strings were leaked
D0516 09:23:33.513718325  777844 metadata.c:252]             LEAKED: test
D0516 09:23:33.513723226  777844 metadata.c:248]             WARNING: 1 metadata strings were leaked
D0516 09:23:33.513727175  777844 metadata.c:252]             LEAKED: Exception calling application: 
D0516 09:23:33.513731015  777844 metadata.c:248]             WARNING: 1 metadata strings were leaked
D0516 09:23:33.513734940  777844 metadata.c:252]             LEAKED: grpc-c/0.14.0-dev (linux)
D0516 09:23:33.513739118  777844 metadata.c:248]             WARNING: 1 metadata strings were leaked
D0516 09:23:33.513743125  777844 metadata.c:252]             LEAKED: localhost:42765
D0516 09:23:33.513747123  777844 metadata.c:248]             WARNING: 2 metadata strings were leaked
D0516 09:23:33.513751300  777844 metadata.c:252]             LEAKED: 4S
D0516 09:23:33.513755083  777844 metadata.c:252]             LEAKED: localhost:41981
D0516 09:23:33.513758996  777844 metadata.c:248]             WARNING: 1 metadata strings were leaked
D0516 09:23:33.513763186  777844 metadata.c:252]             LEAKED: /test/StreamUnary
D0516 09:23:33.513767521  777844 metadata.c:248]             WARNING: 2 metadata strings were leaked
D0516 09:23:33.513771403  777844 metadata.c:252]             LEAKED: localhost:46817
D0516 09:23:33.513775257  777844 metadata.c:252]             LEAKED: ExpiredStreamRequestFutureUnaryResponse
D0516 09:23:33.513779274  777844 metadata.c:248]             WARNING: 1 metadata strings were leaked
D0516 09:23:33.513783276  777844 metadata.c:252]             LEAKED: ExpiredUnaryRequestFutureUnaryResponse

in its output. This clearly has to do with the initial metadata from the client, but given that I've checked and rechecked and superchecked that the associated tags are getting pulled off their completion queues on each side of the connection, I'm not sure what the cause is.

(2) Hmm; I'm having trouble reproducing the other one right now. Should have made a more detailed note...



class UnaryUnaryMultiCallable(six.with_metaclass(abc.ABCMeta)):
    """Affords invoking a unary-unary RPC."""
Contributor

I personally find it really strange that this is exposed at the top-level. This is not something I think the user should be learning about when skimming the interfaces before they become aware of Server. This sort of reasoning applies to all of the *Callables. Maybe even Future.

Things that the user does not have the direct power of creating and managing should not be among the things the user has to learn about first if skimming the code, IMO.

Member Author

I agree with the sentiment. I think I'm leaning the other way, but I'm open to being persuaded to move these code elements to another module.

It comes down to the question of whether we expect this module to be friendlier to read or friendlier to use. If we want it friendlier to read, that means leaving the code elements that users are going to directly use in the file and migrating other code elements out to a less prominent place in some other module or modules. But since those other code elements are referenced from the code elements directly used by the users, users still see them (in Returns: and Raises: documentation, at least).

I think I like having everything in this module because it contributes to ease of use, even if that means the text of the module is less friendly for users of the code to read. When you make an asynchronous RPC call, what's the type of the future that you get back? grpc.Future. Not grpc.future.Future or grpc.interfaces.Future. I like buying that benefit at the cost of the module's text being long to read.

Contributor

Well, to start, I'd say grpc.utility.Future, where grpc.utility would include every interface we don't expect the user to create themselves sans maybe the error types (else we'd just have a heavily populated grpc or interfaces namespace all over again with future, server, blahblah and the user not knowing which things said user cares about, i.e. the root objects said user must self-manage).

If you lean in the current direction, consider comment section headers: now that we're traveling into the long-and-flat-file-structure land that we've historically been stylistically abhorrent of, where there's no easy-to-recognize syntactic difference between code elements of different kinds (especially in this case, e.g. errors vs. contexts vs. callables vs. servers/clients differ only by the natural-language semantics of their identifiers/comments and maybe length for errors), that's what those comments are for.

Member Author

I'm going to roll this around while other comments come to and go from the review.

Member Author

Okay, wrapping this up: I think we've heard quite loudly from our users that the grpc Python package is where they expect every developer-user-facing code element to be found. I can think of two ways to go about satisfying that requirement:

  1. put everything in grpc/__init__.py, as is currently drafted, or
  2. spread code elements out reasonably into files like _future.py, _multi_callables.py, _context.py, and then alias them in grpc/__init__.py, being careful that the names by which they are officially known are the aliases in the grpc namespace and their having been defined in submodules is just an implementation artifact.
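
(A sketch of the second option, with the submodule names below purely illustrative:)

```python
# grpc/__init__.py
from grpc._channel import Channel
from grpc._context import ServicerContext
from grpc._future import Future
from grpc._multi_callables import UnaryUnaryMultiCallable

# The official names are grpc.Channel, grpc.Future, and so on; that these are
# defined in private submodules is just an implementation artifact.
__all__ = ('Channel', 'Future', 'ServicerContext', 'UnaryUnaryMultiCallable')
```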

I favor the first choice, but I could live with the second, because both deliver the desired developer-user experience of

>>> import grpc
>>> help(grpc)

>>> help(grpc.Future)

>>> help(grpc.Channel)

>>> help(grpc.insecure_channel)

. What's not part of the developer-user experience that we need to deliver is "the text of grpc/__init__.py has to be not too long". Developer-users should be learning gRPC Python from the documentation or maybe from the read-eval-print loop as above.

The length of grpc/__init__.py is only a problem for us who maintain it and for power developer-users who want to change it. If it were an implementation module, I'd feel like it was on the long side, but since it's nearly all non-executable specification text I think it's just fine.

Please insist on breaking it up if you feel strongly; otherwise, have I persuaded you at all?

Member Author

I've seriously considered it, and... I think I like section comments even less than factoring the code elements out into private implementation modules and then aliasing them at top-level. :-P

The grpc package will wind up with about twenty-five code elements of which users will directly use about a third. More than I'd like in an implementation module, but I'm okay with it for a top-level API.

Contributor

Well, comment section headers, unlike re-exports, don't wreck auto-completion tools' already hampered abilities to find doc-strings in a dynamic language, so there's that marginal comparison (and all its simpler forms like grepping and whatever). Meh.

Contributor

An aside to an aside: turns out that our auto-completion is smart enough to propagate the requisite info. Welp.

Contributor

Seriously, though - what's wrong with comment section headers? Which of those reasons apply here? Why do you believe the cons outweigh the pros in this scenario?

Member Author

Strong personal distaste rooted in the belief that if a body of code is large enough to warrant breaking up it's better to break it up according to the language's mechanism of division (modules, in this case) rather than according to something not at all structural (like section comments).

But I've gotten over it and added section comments. It's far more important that this move forward.

@nathanielmanistaatgoogle nathanielmanistaatgoogle changed the title gRPC Python GA API (Draft) Key gRPC Python GA API improvements May 21, 2016
@nathanielmanistaatgoogle
Member Author

I've dropped the DO NOT MERGE label; I think this is mergeable.

I think I've answered all questions asked up to this point; please come at me with more? Lots more? I'd much rather have approval than consent.

Happy to break it up if you'd like as well.

@nathanielmanistaatgoogle
Member Author

Oh, and the memory leak question did find an answer: accidental object retention, counter to the design of the (Future) class, was what was doing it. Took less than ten minutes to find once I sat down with a clear head to look at it.

@nathanielmanistaatgoogle
Member Author

@ctiller: now would be a good time for that link you promised me about "target" - right now I just have "the target to which to connect" in my specification; that could definitely be fleshed out.

@nathanielmanistaatgoogle
Member Author

Quick question about the ServicerContext interface: as currently drafted it has

  • invocation_metadata() which returns the metadata sent from the client,
  • initial_metadata(metadata) which sends the service-side application's metadata to the client, and
  • terminal_metadata(metadata) which sets the service-side application's metadata to be sent to the client when the application's function returns

. What's bothering me is that I don't think invocation_metadata is a term used anywhere else in gRPC. Should these method names be changed to

  • initial_metadata (or client_initial_metadata),
  • send_initial_metadata, and
  • set_trailing_metadata

?
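
For a concrete picture, a service-side function under the currently drafted names (the handler shape and metadata values are invented for illustration, and the metadata format is assumed to be a sequence of key-value pairs):

```python
def handle_unary_unary(request, servicer_context):
    # invocation_metadata(): what the client sent when it invoked the RPC.
    client_metadata = servicer_context.invocation_metadata()
    # initial_metadata(...): eagerly send the service side's metadata now.
    servicer_context.initial_metadata((('my-initial-key', 'my-initial-value'),))
    # terminal_metadata(...): set the metadata sent when this function returns.
    servicer_context.terminal_metadata((('my-trailing-key', 'my-trailing-value'),))
    return request  # echo the request back as the response
```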

READY = (_cygrpc.ConnectivityState.ready, 'ready')
TRANSIENT_FAILURE = (
    _cygrpc.ConnectivityState.transient_failure, 'transient failure')
FATAL_FAILURE = (_cygrpc.ConnectivityState.fatal_failure, 'fatal failure')
Member

This name has been changed to 'SHUTDOWN' in documentation but not in the core code: maybe it's time. Happy to try and coordinate a migration pre-GA if we feel it's worth it. WDYT @a11r @ejona86

Member

It's been SHUTDOWN since we've been implementing it, but when the document renamed FATAL_FAILURE to SHUTDOWN one spot was missed and that spot was in a table that made it easiest to copy...

I'd be for the rename in principle, but I don't know how much effort is involved.

Contributor

+1 to rename.

Member Author

+1 to rename.

@jtattermusch
Contributor

Needs resolving.

@jtattermusch
Contributor

On Wed, Apr 27, 2016 at 9:31 PM, Nathaniel Manista <notifications@github.com> wrote:

Lots of minutia to discuss here, but of course the most important question
for this is whether or not it has all the growth surfaces that are needed
and whether or not future changes are going to be as disruptive as past
feature growth was to the alpha and beta APIs.

(Security (mostly) not part of this code review.)

Changes from Beta:

  • Channel construction now takes a single string rather than separate
    host and port.
  • RPC timeout now optional.
  • Server-side timeout removed - if applications don't trust the RPC
    caller and want to time out according to their own schedule, they'll have
    to set their own alarm clock.

Questions:

  • What about Future
    https://github.com/grpc/grpc/blob/master/src/python/grpcio/grpc/framework/foundation/future.py?
    The problems with concurrent.futures.Future that motivated
    grpc.framework.foundation.future.Future haven't been solved.
    "Maximizing compatibility" with concurrent.futures.Future would mean
    compounding its design flaw and violating the public API of
    concurrent.futures - please no? Is it a problem putting our Future
    interface where it is currently drafted in this API? Should Future
    appear as a code element of grpc, or should only code elements that we
    expect to be used in application code be code elements of grpc?
  • The MultiCallable interfaces are how we support asynchronous
    invocation on the same code element as synchronous invocation, but with the
    current semantics they're only needed for response-unary RPCs. Should we
    keep all four anyway with future growth in mind? If we want to support
    async/await Expose an API integrated with Python's AsyncIO? #6046, they'd be the
    right place, right?
  • Should the Channel have the RPC invocation convenience methods
    depicted here or is it just enough to have the MultiCallable methods? How
    much do we expect users to make direct use of Channels? How hard
    (and/or weird) are we making their lives if we make them write my_response
    = my_channel.unary_unary(my_method_name)(my_request) instead of my_response
    = my_channel.blocking_unary_unary(my_method_name, my_request)?
  • How should this API's documentation refer to connectivity and status
    code integers? Do we just say "go look at connectivity_state.h and status.h
    to see what the integer returned to you means"?

It seems that returning integers to represent enum values and relying on the user to do the conversion is really user-hostile. We really should not do that.

  • Is it too clever making the raised RpcError also a Call object?
    Should the RpcError have a Call object instead?

Yes, it seems counterintuitive. Being able to access a "call" field on an RpcError instance seems reasonable.

  • Should the required Channel construction parameter be named "target"
    or "address" or... something else? What exact wording would we like
    describing it?

I think it's called "target" in the C core API, so let's stick with that.

  • What about "channel args"? How do we document them? At least one of
    them is only appropriate for use in tests - are we worried about it being
    passed via the non-test API?

C# allows passing in the raw channel arguments to allow fine-tuning some expert settings that we don't want to expose through the basic API. For commonly used channel args (and there should be as few as possible, ideally none), I plan to provide factory methods later. I think the philosophy is that one should be able to set arbitrary channel arguments if needed, but the API for doing that doesn't really have to be super-well polished.

  • Servicing RPCs requires a thread pool (right?). Should the server's
    thread pool be required of the application, optional, or entirely obscured?
    Should the application be allowed to pass thread pool size?

For C# I construct a default thread pool and I expose some API for setting its parameters, but I clearly state this API is experimental and for expert use cases only. For performance testing, these knobs can be useful. We shouldn't publish an official API for these settings before doing some extensive experiments with what the default values should be and after making sure that's actually the right set of knobs to expose (it might be completely different things that matter the most for performance).

  • How do we want to support generic service of arbitrary RPCs? (This
    was present in beta; the question isn't coming out of nowhere.) By
    accepting a function that looks like a stream-stream handler, but that
    takes the method name as well and operates on bytes in and bytes out?

The simplest approach seems to be exposing a stream-stream handler with bytes in and bytes out. I think we should eventually have a good generic service API, but let's focus on the other APIs more because most GA users won't need to use generic methods.

  • Where in the API do we want to allow applications to pass native
    strings as opposed to byte strings? (If we want to support my_response
    = my_stub.my_method(my_request) at the read-eval-print loop, we're
    going to have to support native strings in at least one place.)
  • Can we do away with the metadata_transformer present in alpha and
    beta and make security entirely done with the plug-in API? (I suspect yes?)

Yes, security should be done entirely using the metadata credentials
plug-in. Setting the metadata on the call manually to perform auth is an
antipattern that we really should get rid of.

  • The host is a stub option in the beta API; should it remain a stub
    option or become a channel option or per-call option?

Per stub is the choice that makes the most sense. Optionally, we could
expose a per-call option, but it's definitely not a requirement.

  • Where in the documentation do we want to say "metadata" and where do
    we want to say "headers"? What three terms do we want to say where the beta
    API said "invocation metadata", "initial metadata", and "terminal metadata"?

I think officially the names are initialMetadata and trailingMetadata; if the distinction between responseInitialMetadata and requestInitialMetadata is not clear from the API without saying "response" or "request" explicitly, the long names become pretty annoying.

C# is actually using the terms requestHeaders, responseHeaders, and trailers (I am not sure if that's 100% compliant with our metadata policy, but I think those are reasonable names).

  • In the beta API there is a CallOptions class that can be passed on
    RPC invocation and that encapsulates disable_compression, subcall_of, and
    per-call credentials. Do we want to do away with this aggregate object and
    just have those values as keyword arguments on each RPC invocation (as is
    drafted here)? If so, is there any need to worry about every future feature
    being added as yet another keyword argument and eventually having an
    unmanageable number?

I'm sure that sooner or later we'll need to add some new options that we didn't quite anticipate, so having an object that can encapsulate different call options seems like a very good idea. Btw, one of the callOptions that is getting added is "failFast".

Here's what C# is doing: there's a CallOptions struct that encapsulates all the existing call options (and provides a constructor with optional parameters). That way, adding a new advanced option is guaranteed not to break the public API and is a purely additive change.

For the most commonly used options (like deadline), there's also an overload that allows you to pass the option directly as an optional parameter when invoking the call - that's to prevent users from always needing to type new CallOptions(deadline: blahblah)...

  • Are "normal termination of the RPC" and "abnormal termination of the
    RPC" the right terms to use?


@jtattermusch
Contributor

send_initial_metadata and set_trailing_metadata SGTM. initial_metadata for request initial metadata would work; you can also consider request_metadata.



@soltanmm
Contributor

Gonna strongly nix request_metadata. There's natural language ambiguity for request being an attributive noun vs. a verb in that identifier.

@nathanielmanistaatgoogle
Member Author

All outstanding discussion threads addressed and I've broken up the code so that this pull request is just the GA API interfaces. Please everyone review and come forward with any last concerns - it would be nice if this could go in today before the long weekend.

@soltanmm, @jtattermusch: okay, back to invocation_metadata in the ServicerContext object. I was on the request_metadata bandwagon until I saw your nix.

@nathanielmanistaatgoogle
Member Author

Also since trailing metadata, details, and status code are fellow-travelling objects, I've changed the ServicerContext method names code and details to set_code and set_details to match the change of trailing_metadata to set_trailing_metadata.
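
For illustration, abnormal termination on the service side then reads like this (the handler shape and the values are invented; my_status_code stands in for whichever status code value the API exposes):

```python
def handle_unary_unary(request, servicer_context):
    # Trailing metadata, code, and details travel together at RPC termination.
    servicer_context.set_trailing_metadata((('my-trailing-key', 'my-value'),))
    servicer_context.set_code(my_status_code)  # placeholder status code value
    servicer_context.set_details('No such resource.')
    return None
```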

@nathanielmanistaatgoogle nathanielmanistaatgoogle changed the title Key gRPC Python GA API improvements Interfaces of Python GA API May 27, 2016
@nathanielmanistaatgoogle
Member Author

One last question for @ctiller, @jtattermusch, and maybe @ejona86: the current draft shows GenericRpcHandler taking a method name as a string and using that as the only means by which to decide whether to handle the method or to defer to some other GenericRpcHandler further down the list - is that sufficient or do we foresee a use case in which GenericRpcHandlers will want to decide whether or not to service an RPC based on peer, metadata, or some other available-when-call-reaches-server values?

@soltanmm
Contributor

re GenericRpcHandler: aside from marginally simplified signatures (for which any future addition would be a breaking change), do we gain anything by not exposing all information conveyed in the client's RPC prologue...?

@nathanielmanistaatgoogle
Member Author

@soltanmm: not exposing all information available at call time would save us one interface (call it a ServicerCallDetails?). So it's not that heavy a burden. What I'm looking for from @jtattermusch, @ctiller, and maybe @ejona86 is something along the lines of "of course a generic handler would never need anything more than method name to decide whether or not to service an RPC; most forecast use-cases of generic handlers are of objects that handle every RPC and won't even need the method name to decide" or "of course a generic handler should be able to see not only method name but invocation metadata, peer, timeout, and probably a few other to-be-added-in-the-future values when deciding whether or not to handle an RPC". Or some answer in between; no particular answer to the question will be hard to implement.

I just don't want to proceed without information and implement something different from the other languages that then has us either missing support for use cases that other gRPC languages support, or missing simplicity for the use cases that other gRPC languages support because we've got a larger API able to serve complex use cases that are unlikely to ever come to pass.

@ejona86
Member

ejona86 commented May 31, 2016

the current draft shows GenericRpcHandler taking a method name as a string and using that as the only means by which to decide whether to handle the method or to defer to some other GenericRpcHandler further down the list

@nathanielmanistaatgoogle, in general, routing should be based solely on the method (which is the path in HTTP) and, if supported, "virtual hosting" based on the :authority (equivalent of HTTP 1's Host header). I would strongly discourage routing on anything but those two pieces of data.

I don't know if/when/how various languages support different message content types for the same service (e.g., a single service supporting both JSON and protobuf encodings), but it's possible that it may be implemented as separate handlers that do some conversion into a common format and then call the "real" handler. There are multiple alternatives, though.

In Java, we just pass the method name and authority when looking up a handler. For different message content types, in Java we would most likely improve our marshaller interface and not touch method lookup/handlers at all.

@ctiller
Member

ctiller commented May 31, 2016

+1 to @ejona86's reply.

@nathanielmanistaatgoogle
Member Author

  1. How important is it that the :authority be pulled out of the invocation metadata (that is how it emerges from the core, right?) and passed to the GenericRpcHandler? Can we pass the GenericRpcHandler a HandlerCallDetails object that has method and invocation_metadata attributes? Or are we absolutely certain that we'll only ever want to pass those two values, such that we just pass them individually rather than wrapping them in an expandable type?
  2. Since ↑ is the last ongoing discussion in this review, is someone prepared to LGTM once it is resolved? Who?

@ctiller
Member

ctiller commented May 31, 2016

Instances where we would want to expand this are very rare, but exist. One is on the horizon: is this request idempotent or not? One is being talked about: is this request allowed to mutate server state or not?

Both will have implications for how the request should be routed and handled if we adopt them.

Would advise having some expandable type available in the interface, although splitting out method and maybe authority as first class is also possibly worthwhile.


@ctiller ctiller assigned soltanmm and unassigned ctiller May 31, 2016
This method is only safe to call before the server is started.

Args:
  generic_rpc_handlers: Some number of GenericRpcHandlers that will be used
Contributor

I'm going to assume this means some kind of iterable.

Member Author

Nice catch; fixed.

@nathanielmanistaatgoogle
Member Author

HandlerCallDetails interface added with room for future expansion.

Back over to y'all.

@soltanmm
Contributor

lgtm

@nathanielmanistaatgoogle
Member Author

@jtattermusch: the macOS failure is beyond the scope of this pull request; please merge or toss back to me with action items.

@jtattermusch jtattermusch merged commit 9d13a6d into grpc:master Jun 1, 2016
@jtattermusch
Contributor

The macOS failures seemed unrelated, so I just merged.

Thanks, Nathaniel.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 27, 2019
@lock lock bot unassigned soltanmm Jan 27, 2019