Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Placeholder issue for feedback on @defer & @stream proposal #2848

Closed
robrichard opened this issue Nov 16, 2020 · 40 comments
Closed

Placeholder issue for feedback on @defer & @stream proposal #2848

robrichard opened this issue Nov 16, 2020 · 40 comments
Assignees

Comments

@robrichard
Copy link
Contributor

robrichard commented Nov 16, 2020

For each release of GraphQL.js, we are now publishing an accompanying release containing an experimental implementation of the @defer and @stream directives. We are hoping to get community feedback on these releases before the proposal is accepted into the GraphQL specification.

You can use this experimental release of GraphQL.js by adding the following to your project's package.json file.

"graphql": "experimental-stream-defer"

A similar release of express-graphql will be coming soon.

I am creating this issue as a placeholder for anyone who has tried out these experimental packages to leave feedback. Please let us know if everything is working as expected or if you run into any issues. Any suggestions for improvements are also welcome.

References:

Feedback received so far

Call return on iterable when connection closes

content-length in payload should not be required, parsers should use boundary instead

@robrichard robrichard self-assigned this Nov 16, 2020
@danielrearden
Copy link
Contributor

danielrearden commented Nov 18, 2020

@IvanGoncharov It would be helpful if we could cut a new experimental-stream-defer release with the latest changes to the branch (i.e. 163f62a).

@IvanGoncharov
Copy link
Member

@danielrearden Planning to have bug fix release for master today/tomorrow so we can synchronize both.

@tgriesser
Copy link
Contributor

@danielrearden pointed out this thread after I added feedback on the PR.

My question was around whether the graphql / execute functions should be overloaded with the new return type, or if it'd be simpler and maybe more backward compatible to have a new entrypoint graphqlStream supporting the new behavior:

graphql(): Promise<ExecutionResult>
graphqlSync(): ExecutionResult
graphqlStream(): AsyncIterableIterator<ExecutionResult>

Commenting on @danielrearden's followup:

@tgriesser That would require the server implementation to walk the entire operation AST to determine if any @stream or @defer directives were used in the first place, though, right?

I don't think it'd necessarily need to be an ahead of time validation (though that could be a nice feature), I was thinking it'd be similar to executeSync / graphqlSync, where it's just a simple runtime validation after the operation is executed:

graphql-js/src/graphql.js

Lines 155 to 161 in cd273ad

// Assert that the execution was synchronous.
if (isPromise(result)) {
throw new Error('GraphQL execution failed to complete synchronously.');
}
return result;
}

Except in this case it'd be adding the check to the execute / graphql:

export function execute(args: ExecutionArgs): PromiseOrValue<ExecutionResult> {
  const result = executeImpl(args);

  // Assert that the execution wasn't an async iterable.
  if (isAsyncIterable(result)) {
    result.return()
    throw new Error('Streaming GraphQL must be called with executeStream.');
  }

  return result;
}

Mostly came up thinking of situations where I've used the graphql or execute directly outside of a server implementation, and now needing to guard the result from a type standpoint.

@robrichard
Copy link
Contributor Author

robrichard commented Dec 7, 2020

@tgriesser we did discuss a few similar ideas, and at one point we did implement separate experimental graphql/execute functions, but this was scrapped because it required too much code duplication.

I think long term, most server implementations would want to support AsyncIterable results. Adding a separate function with the current return type (Promise<ExecutionResult | AsyncIterable<AsyncExecutionResult>>) would only be beneficial in the short term to reduce breaking changes.

A new function that always returns an AsyncIterable (with just one result when defer/stream is not used) is an interesting idea that was brought up at one of the working group meetings. I'm not opposed to this approach if this is the direction the library maintainers want to take. I think it does adds some overhead as servers will either have to hold on to the first response to see if the iterable closes or send every response as a chunk encoded connection

cc @IvanGoncharov

@jensneuse
Copy link

I've implemented @defer & @stream a while ago and wanted to share my experience.
I fully agree on the approach to apply @defer on inline fragments & @stream on list fields.

list field directive location

I think it would be beneficial if we add a new directive location "LIST_FIELD" to the spec.
This enables validation for IDE's like GraphiQL to make the user alert when they apply the @stream directive on non list fields.

content negotiation

First I think it would make sense to add custom vendor headers for both client and server to indicate if they support defer & stream.

Possible implementations:

The client indicates it is capable of version 2.

Accept: application/vnd.graphql.org+json;version=2

The client supports streaming responses.

Accept: application/json;graphql/streaming

Server response header:

Content-Type: application/json;graphql/streaming

Why do I think this is important?

GraphQL Operations might be re-used by different clients or stored. Therefore it's not always the case that the client who initially wrote the Operation will also send it. By using content negotiation clients and servers can decide if they "upgrade" the request to be a defer/stream request or not.

Another use case is proxies & API gateways. It might not be desirable for an intermediate to ask for a deferred response. With content negotiation, it could still use the same Operation but is capable to disable defer+stream via a header.

If the client & server don't use the headers specified we can assume they are pre defer/stream and disable it by default, making it backwards compatible.

response format

The current RFC proposes a custom format with "path", "data" and "hasNext".
I'd like to propose a simpler model which I think is easier to adopt for client and server implementers.

The first response should be an ordinary GraphQL response object.
The Content-Type header indicates that the response will be streamed.
We don't have to distinguish between @stream and @defer as both are streaming responses.

Subsequent streaming responses will be JSON Patch objects, delimited with two newline characters.
JSON Patch is a standard to update an existing JSON object with one or many Patches.
JSON Patch has implementations in many languages, making adoption for clients & servers easier.

The final response will be followed by three newline characters, indicating the end of the stream.

Using newline characters instead of "hasNext" makes it easy for clients to understand the state of the stream without looking into the payload directly.

@jensneuse
Copy link

@jensneuse thank you so much for the feedback. Just ok the point of accept headers, I think that can be done currently but just saying that we accept "multipart/mixed". Instead of a bespoke graphql vendor mime. What do you think?

Tbh I'm not yet sure what is the best way to represent this.
I just have a feeling that multipart/mixed is too high level and too vague for a client and server to negotiate the protocol they want to use for communication. Why not make it more explicit?

@AlicanC
Copy link

AlicanC commented Dec 11, 2020

Subsequent streaming responses will be JSON Patch objects

Would we ever need any other JSON Patch operation than "add"?

If not, we could just:

Then it would be really easy to generate a JSON Patch from the GQL format:

const oldGqlFormat = {
  data: data,
  path: gqlPath,
  label,
  hasNext,
  errors,
  extensions,
};

const newGqlFormat = {
  value: data,
  path: jsonPointer,
  label,
  hasNext,
  errors,
  extensions,
};

const jsonPatch = [
  {
    op: 'add',
    // We would also be spreading a bunch of other stuff here,
    // but JSON Patch spec says they will be ignored so we should be ok
    ...newGqlFormat,
  },
];

Going full JSON Patch also sounds interesting. Maybe we could also use the "replace" operation for updating existing data in some other proposal.

@jensneuse
Copy link

@AlicanC I'm already doing exactly what you're proposing.
In one case I initially send a field with the value "null".
Then I stream JSON Patches, one of them contains a "replace" directive, pointing to the "null" value.
This makes the client extremely simple and "dumb" because it doesn't have to know anything about the content.
It simply keeps the last version of the JSON response and updates it until the server terminates the connection.

This also works well for adding Errors while resolving deferred/streamed responses.
The server simply has to emit an "add" Operation with an Error Object as value and the path leading to the errors array.

Here's a snippet how a client looks like:

let hasMessages = false;
let lastMessage:any;
source = new EventSource("example.com/graphql");
source.onmessage = e => {
    const nextMessage = JSON.parse(e.data);
    if (lastMessage !== undefined && nextMessage instanceof Array){
        lastMessage = applyPatch(lastMessage, nextMessage).newDocument;
        opts.next(lastMessage);
    } else {
        opts.next(nextMessage);
        lastMessage = nextMessage;
    }
    hasMessages = true;
}

It's based on EventSource is just an implementation detail.
Could be just an ordinary HTTP/2 Stream, multipart, chunked encoding, anything.

Checking if nextMessage is an Array, in this case, is sufficient because JSON Patch Objects are always Arrays whereas the initial GraphQL Response is a JSON Object.

"applyPatch" is from the npm package "fast-json-patch".

I didn't intentionally choose this approach.
It was more me being lazy and I didn't want to implement updating JSON objects myself.
So maybe we can improve this approach or figure out why it's bad.
I just wanted to share my experience here and see if it can help find a good solution.

@jensneuse
Copy link

jensneuse commented Dec 11, 2020

There's one more topic I'd like to talk about and that's about client performance.
While implementing defer, I learned that it's a problem for Javascript clients when you update too often with small chunks.

So while it looks like a working solution what you propose, it doesn't scale under high load.
To test this I've created a high-performance service, written in Golang, to stream updates extremely fast.
By that, I mean thousands of Patches per second.

What happens is that the Javascript client gets completely overwhelmed by the number of Patches to apply.
It's independent if you're using JSON Patch or a custom merge algorithm like you proposed.

The solution was rather simple, I've introduced batching with a default flush interval.
This means, during a window of e.g. 10ms, I collect all JSON Patch Objects into an Array.
Once we hit a maximum amount of Patches (e.g. 100) OR the window limit (e.g. 10ms) we Flush the buffer.
Once the buffer is flushed, we continue resolving until the complete tree is resolved.

If you want to get a bit more context, I've written about it here:
https://wundergraph.com/features/federation

I believe that if you don't address this issue you'll run into problems once you scale streams up.
Luckily, JSON Patch already allows us to batch updates using an Array of Patch Objects and that's why I propose this as a possible solution.

@robrichard
Copy link
Contributor Author

@jensneuse thanks for the feedback! Here's our thoughts on these topics

content-negotiation

GraphQL Operations might be re-used by different clients or stored. Therefore it's not always the case that the client who initially wrote the Operation will also send it.

In this case could the clients make use of the if argument on @defer and @stream? The clients that do not support incremental delivery can pass a false variable.

query SharedQuery($shouldStream: Boolean) {
  myListField(if: $shouldStream, initialCount: 0) {
    name
  }
}

I think this would solve this issue without relying on out-of-band information being passed?

Response format

hasNext

I think there is benefit from hasNext being part of the specified payload instead of deriving it from the underlying transport mechanism. These requests may be used over other transports where inspecting boundaries is not as easy and we wouldn’t want to tie the GraphQL spec to closely to one transport.

Since many GraphQL clients are transport agnostic I think it’s convenient that there is a standard way to determine if a request is still in flight or not, while keeping the transport code fully encapsulated.

JSON Patch

As @AlicanC said, the only JSON patch operation that’s needed is add, but for @defer it is really an object spread. You would need a JSON patch add operation for each field in the deferred fragment.

For example this payload:

{
  "data": {
    "firstName": "Rob",
    "lastName": "Richard",
    "username": "robrichard",
  },
  "path": ["user"],
  "hasNext": false
}

would be equivalent to this JSON patch:

[
  { "op": "add", "path": "/person/firstName", "value": "Rob"},
  { "op": "add", "path": "/person/lastName", "value": "Richard"},
  { "op": "add", "path": "/person/username", "value": "robrichard"},
]

Plus we need the additional fields for label, extensions, and errors.

While json-patch libraries make it easy to apply patches to the response, many GraphQL clients like Relay and Apollo use a normalized store for data. So incoming patch results would not actually be applied to the previous payload, but rather normalized and inserted into the store.

The GraphQL spec already has a definition for referencing response paths, currently used for errors. We are reusing this definition for incremental delivery, so I’d be hesitant to also introduce JSON Pointers into the spec.

Client Performance

Agreed that a flood of payloads can easily overwhelm clients. I think it could be handled client side with debouncing if a large response is potentially expected.

For example - Relay supports returning an array of payloads from the network layer. It will process all of the payloads into the store and only trigger one re-render of the React components.

@jensneuse
Copy link

First, thanks we're having this conversation and being open for feedback.

I understand your reasoning behind having hasNext. Making it transport agnostic makes sense.

Your arguments on JSON Patch also make a lot of sense.
While testing the JSON Patch approach I already determined that the overall response size increased.
With GZIP it's not dramatic but still noticeable.

On client side debouncing:
While it sounds like a solution, it requires implementation in every client.
If possible I'd favour a solution that works without extra client logic.

On content negotiation:
First, I have a feeling that your proposed solution (enabling stream via a variable) makes Queries more verbose than they have to be.
Next, this means intermediaries like proxies must look into the GraphQL Query to determine if it's a streaming request.
This is a very costly operation (parsing/validation/etc.) and makes the implementation of intermediaries a lot more complex.
Additionally, if the client expects a streaming response while an intermediary cannot speak streaming the intermediary would have to rewrite the Query to the origin server.
If content negotiation happens using a Header, the intermediary could change this header when forwarding the request to the origin.

@robrichard
Copy link
Contributor Author

On debouncing:
Maybe this is something that can be investigated separately without blocking this spec from moving forward? I think it could be helpful to get data on different approaches to this problem once @defer and @stream are used by more servers?

On content negotiation:
It's important that clients can depend on defer and stream being honored in their requests. If an intermediate proxy is going to change the behavior, causing all results to be returned synchronously, it can lead to worse performance for the client than if the client choose to instead write multiple queries. We noted in the spec that a complaint GraphQL server is not required to support defer & stream and only servers that do support it should return the defer and stream directives in their schema introspection.

@jensneuse
Copy link

Both answers sound reasonable to me.

What's your opinion on validation and directive locations?

Will there be changes to the spec to make it explicit that @stream is only allowed on list fields?

@robrichard
Copy link
Contributor Author

@jensneuse I’m not sure what’s involved in adding a new directive location. I’ll look into it to see if it’s feasible. If not, I’ll definitely make sure the spec notes an explicit validation rule that forbids @stream on non list fields.

@jensneuse
Copy link

I've put some more thought to the topic.
One thing I feel is problematic with the current design is on code generation.

Referring to this: https://github.com/graphql/graphql-spec/blob/master/rfcs/DeferStream.md#type-generation

Let's have a look at this example Query:

{
  viewer {
    id
    friends(first: 2) @stream(initialCount: 1, label: "friendStream") {
	  id
    }
  }
  ...GroupAdminFragment @defer(label: "groupAdminDefer")
}

fragment GroupAdminFragment {
   managed_groups {
      id
   }
}

The label is used for the client to figure out what parts of the query are still loading.
You can achieve the same effect without requiring the user to provide a label.
It feels like this label can also be auto-generated and understood by clients without being defined by humans.

Second, I'm thinking about a good user experience when generating client code.
Let's imagine a typescript client, the result object might look similar to this:

interface Response {
  viewer: {
    id: number,
    friends: {
      id: number,
    }[]
  },
  managed_groups?: {
    id: number
  }
}

From this model, it's not possible to distinguish between null, undefined or loading.
If you don't specify managed_groups at all, it will be undefined in the JSON.
If you specify managed_groups it might be null because the result is null.
If you specify managed_groups and @defer it, it'll be undefined with the possibility of becoming null once resolved.

I'm thinking of typesafe languages like Java, Go, etc.. It feels like you'd have to make a choice: Either the resulting model is confusing or you'd have to abandon existing tooling for JSON parsing and build custom models. You could then partially parse the JSON and build the model. Depending on the "labels" you'd be able to set loading state for parts of the model. Once more data arrives you can continue updating the Model and replacing the "loading" objects.

Wouldn't it be simpler if the JSON could be just parsed into the model which supports something similar to "loading-unions"?

One example how this could look like in typescript:

interface Response {
  viewer: {
    id: number,
    friends: {
      id: number,
    }[]
  },
  managed_groups?: {
    __state: 'done'
    id: number
  } | { __state: 'defer' }
}

I'm not yet sure if this is a good idea.
However, here are my thoughts:

This JSON can be parsed into the model with a standard JSON parser.
If managed_groups is undefined, it was not requested.
If managed_groups is { __state: 'defer' }, the UI can show a loading spinner.
If managed_groups is null, the result, deferred or not, is null.
If managed_group is { __state: 'done' }, field 'id' is expected to be present.

Typescript supports switching on __state via union type which makes the dev experience pretty good.

What do people think about this?

@robrichard
Copy link
Contributor Author

@jensneuse You are correct that a fragment name, plus the path could be used to uniquely identify payloads for deferred fragment spreads. However, there is the possibility for ambiguity with @stream and inline deferred fragments. For example:

{
  viewer {
    id
    friends(first: 2) @stream(initialCount: 0, label: "friendStream") {
	  id
	  ...@defer(label: "friendNameDefer") {
	  	name
	  }
    }
  }
}

In this case there will be two payloads with a path of ["viewer", "friends", 0], and there are no fragment names that could be used to identify if it was originated from the stream or the defer. For this reason, I think it's best to use the provided label for consistency. We decided with the working group that labels would not be required arguments, but if they are passed, the server must return them in the payload. Client libraries could choose to require or autogenerate them if they are necessary for the client's functionality.

In regards to loading unions, I think different clients will want to represent the loading states in different ways. In Relay, fragments are tied to UI components and separate types are generated for each fragment. Deferred fragments cause the associated component to handle loading states using React's suspense feature, so a type defined loading union is not necessary. Another example is the initial implementation of defer in Apollo, which passed metadata to components to indicate in-flight deferred data.

Type generation is definitely something that gets more complex with @defer. Ideally separate types should be generated for the different deferred fragments. These types should convey the guarantee that multiple fields will arrive together when they are in the same deferred fragment. Clients can then use the label field to determine which fragments have been fulfilled to ensure the application is using the correct types.

Maybe a client that did want to use this type of unions to indicate loading status could do so at runtime? My biggest concern with baking it into the spec is that it too specifically enourages this pattern rather than allowing clients to take advantage of the loading pattern that is most natural for their platform.

@glasser
Copy link
Contributor

glasser commented Mar 4, 2021

I have a similar question to @tgriesser.

If I understand correctly, if you call execute with the current defer/stream implementation and the query contains @defer or @stream, then you'll always get back an async iterator of patches?

But what if you're executing GraphQL in a context where you don't have a transport that supports incremental delivery? It would seem like we need some way to tell graphql-js "I'm only able to handle a single result".

Whether that's a new entry point like executeStream, or a parameter passed in, or something.

Alternatively, perhaps graphql-js can expose an async function that lets you convert an AsyncIterableIterator<AsyncExecutionResult> into an ExecutionResult. So while all callers of execute will need to be aware of the potential iterator return value (a pretty big API change!), they at least will have an easy helper to convert it into a format suitable for non-incremental delivery. (Does this function exist and I haven't found it?)

I'm thinking, for example, of how we'll support this in Apollo Server, which ships with integrations for a bunch of different http libraries. It's quite likely that we won't be able to implement incremental delivery for every single supported http library all at once; it would be great to be able to allow folks to use graphql@16 (or whatever) with Apollo Server even if they use one of the lesser-used http library that doesn't have incremental delivery support.

My main request is that there be some easy way to get a single response from what you get from execute, even if on the inside that's implemented as "create an async iterator which you then iterator over and combine the results". However, having it be opt-in (or at least opt-out) would be nice too, since that makes it easier to write a single library that supports both graphql@15 and graphql@16. (I'm trying to think of how we'd write our code to work with both 15 and 16 if the "combining helper" suggestion is made; I think we'd want to check to see if the return value is an async iterable, and if so only then would we dynamically try to load the "process the iterable into a single result" function?)

@robrichard
Copy link
Contributor Author

@glasser - one clarification on this.

We noted in the spec that "GraphQL implementations that do not support these directives must not make them available via introspection.". Therefore, any query that contains @defer/@stream sent to a server that does not support these directives should result in an error. Quoting another comment from above on the reasoning:

It's important that clients can depend on defer and stream being honored in their requests. If an intermediate proxy is going to change the behavior, causing all results to be returned synchronously, it can lead to worse performance for the client than if the client choose to instead write multiple queries.

That said, I do think that there should be an option to opt in or out to @defer & @stream in graphql-js. We did support that via flags on the Schema constructor in a previous version of the PR. I think it requires some more changes to graphql-js to support specified directives that are not always included in the schema.

@glasser
Copy link
Contributor

glasser commented Mar 4, 2021

@robrichard Ah, I missed that!

If it's possible for a server implementation to introspect the schema and fail fast if the schema supports defer and stream, that seems sufficient.

What this would mean in practice (assuming defer and stream are part of schemas in graphql@16 by default) is that anybody trying to combine graphql@16 with an Apollo Server transport that doesn't support incremental delivery would get a startup error and they'd have to tell graphql@16 to disable defer and stream. This might be a bit of a harsh experience for upgrades but at least it would communicate clearly to folks that a new cool feature is available but their chosen transport doesn't support it.

I might suggest that it would be easiest for the broader ecosystem if graphql@16 required opting your schema in to defer/stream and graphql@17 required opting out.

@yaacovCR
Copy link
Contributor

yaacovCR commented Mar 14, 2021

EDITED...

Hi!

Thanks again @robrichard, @lilianammmatos, @IvanGoncharov , et al, for all their work on defer, stream and graphql generally.

I am trying to upgrade batch execution within graphql-tools schema stitching to take advantage of defer and stream. I came across the following interesting behavior when accessing the AsyncIterableIterator returned by the experimental defer-stream branch. It seems that calling next() multiple times in a single event loop cycle can cause interference between calls and unusual results.

I do not believe this is desirable behavior, although I am not 100% sure. See the below test, which passes, which was unexpected -- at least to me.

describe('batch execution', () => {
  it('should work natively', async () => {
    const schema = makeExecutableSchema({
      typeDefs: `
        type Query {
          field1: String
          field2: String
        }
      `,
      resolvers: {
        Query: {
          field1: () => 'test1',
          field2: () => 'test2',
        },
      },
    });

    const result = await graphql(schema, `{
      ... on Query @defer {
        field1
      }
      ... on Query @defer {
        field2
      }
    }`, undefined, {});

    const initialResult = await (result as AsyncIterableIterator<AsyncExecutor>).next();
    expect(initialResult).toEqual({
      done: false,
      value: {
        data: {},
        hasNext: true,
      },
    });

    let results = await Promise.all([
      (result as AsyncIterableIterator<AsyncExecutionResult>).next(),
      (result as AsyncIterableIterator<AsyncExecutionResult>).next()
    ]);

    expect(results[0]).toEqual({
      done: false,
      value: {
        data: {
          field1: 'test1',
        },
        hasNext: false,
        path: [],
      },
    });
    expect(results[1]).toEqual({
      done: false,
      value: {
        data: {
          field1: 'test1',
        },
        hasNext: false,
        path: [],
      },
    });

    results = await Promise.all([
      (result as AsyncIterableIterator<AsyncExecutionResult>).next(),
      (result as AsyncIterableIterator<AsyncExecutionResult>).next()
    ]);

    expect(results[0]).toEqual({
      done: true,
    });
    expect(results[1]).toEqual({
      done: true,
    });
  });
});

@yaacovCR
Copy link
Contributor

yaacovCR commented Mar 14, 2021

The results I expected were the following:

describe('batch execution', () => {
  it('should work natively', async () => {
    const schema = makeExecutableSchema({
      typeDefs: `
        type Query {
          field1: String
          field2: String
        }
      `,
      resolvers: {
        Query: {
          field1: () => 'test1',
          field2: () => 'test2',
        },
      },
    });

    const result = await graphql(schema, `{
      ... on Query @defer {
        field1
      }
      ... on Query @defer {
        field2
      }
    }`, undefined, {});

    const initialResult = await (result as AsyncIterableIterator<AsyncExecutor>).next();
    expect(initialResult).toEqual({
      done: false,
      value: {
        data: {},
        hasNext: true,
      },
    });

    let results = await Promise.all([
      (result as AsyncIterableIterator<AsyncExecutionResult>).next(),
      (result as AsyncIterableIterator<AsyncExecutionResult>).next()
    ]);

    expect(results[0]).toEqual({
      done: false,
      value: {
        data: {
          field1: 'test1',
        },
        hasNext: true,
        path: [],
      },
    });
    expect(results[1]).toEqual({
      done: false,
      value: {
        data: {
          field2: 'test2',
        },
        hasNext: false,
        path: [],
      },
    });

    results = await Promise.all([
      (result as AsyncIterableIterator<AsyncExecutionResult>).next(),
      (result as AsyncIterableIterator<AsyncExecutionResult>).next()
    ]);

    expect(results[0]).toEqual({
      done: true,
    });
    expect(results[1]).toEqual(undefined);
  });
});

@yaacovCR
Copy link
Contributor

yaacovCR commented Mar 15, 2021

Prior art for converting a set of promises to an AsyncIterableIterator in promise settle order, https://github.com/repeaterjs/repeater/blob/219a0c8faf2c2768d234ecfe8dd21d455a4a98fe/packages/repeater/src/repeater.ts#L687 @brainkim

@brainkim
Copy link

brainkim commented Mar 22, 2021

@yaacovCR
Hi! I’m actually starting a new job at ApolloGraphQL tomorrow and was coincidentally doing research on the current state of GraphQL and saw this issue. My username is brainkim not briankim, though the latter is my real name. I’ll probably be steeping myself in defer/stream stuff soon, as I actually haven’t done much GraphQL work the past couple of years, but I’m glad to help extract any of the repeater code for graphql-js if necessary, just send me an email or ping me here.

@yaacovCR
Copy link
Contributor

Updated pings! Although I guess unnecessary at this point as you found this anyway. Fix in #2975 does not use repeaters, but as I comment over there, I feel like possibly repeaters might be safer than what graphql-js is doing... I defer to the experts, though, I am very new to all this async stuff... in general, I am a big repeater fan and wish it was part of the language, sigh, awaiting version 4, congrats on the new position!

@yaacovCR
Copy link
Contributor

#2975
#2979
#2980

@yaacovCR
Copy link
Contributor

yaacovCR commented Apr 6, 2021

Hey folks! @robrichard @IvanGoncharov any chance these PRs can be merged? They are a blocker for stitching support of defer/stream.

@yaacovCR
Copy link
Contributor

yaacovCR commented Apr 6, 2021

We could use a new experimental release after the merge as well :)

@martinbonnin
Copy link

Codegen-related question/proposal about giving clients some information about what fragments are included or not in the initial response.

The RFC makes @defer optional to support on the server side:

Server can ignore @defer/@stream. This approach allows the GraphQL server to treat @defer and @stream as hints. The server can ignore these directives and include the deferred data in previous responses. This requires clients to be written with the expectation that deferred data could arrive in either its own incrementally delivered response or part of a previously delivered response.

If I understand correctly, given the following query:

{
    hero {
        id
        ...humanDetails @defer
    }
}
fragment humanDetails  on Human {
    name
}

The server can either respond with:

{
  "hero": {
    "__typename": "Human",
    "id": "1001"
  }
}

or

{
  "hero": {
    "__typename": "Human",
    "id": "1001",
    "name": "Luke"
  }
}

Both are valid responses.

In that case, it's not possible to know in advance the shape of the models to use on the client side. The parser will have to try to map the response to either class HumanWithDetails(id, name) or class Human(id)

This example is relatively simple but it can get more complicated if multiple type conditions and/or object types come into play. The parser can have to parse each fragment and for each fragment, potentially recurse into any nested objects. If that fails, rewind and try again for the next fragment. This makes generating such code more difficult and also certainly impacts the runtime parsing performance.

Would it be possible to inform the client about what to expect in the initial response? That could be done with an extra "deferred" property in the response json:

// payload 1: humanDetails deferred
// deferred must be sent before "data"
{
    deferred: ["humanDetails"],
    data: {__typename: "Human", id: 1001},
    hasNext: true
}

And if the fragment is already included:

// payload 1: humanDetails included
{
    data: {__typename: "Human", id: 1001, name: "Luke"},
    hasNext: false
}

If a server doesn't support @defer, it will not send deferred and the client knows that all fragments will be included in the initial response. This way it is transparent to existing servers.

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 2, 2021

Sorry if this question is terrible. I am confused about the benefit / necessity of the hasNext field. Why must the result stream document within the result knowledge about the stream? That seems to be unnecessarily intertwining the result delivery protocol with the result. Can’t each language simply define what it means for a stream to end?

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 2, 2021

For instance, each IteratorResult payload in typescript/JavaScript where done is false should carry a value with a hasNext field of true. What additional value is hasNext giving us that done does not provide?

@robrichard
Copy link
Contributor Author

@yaacovCR it gives clients a standardized & transport independent way to understand when the stream ends. It's possible defer & stream may be used over a transport where there may not be connection close events or done indicators. hasNext is also used to differentiate a deferred payload from the previous event or a new initial payload from the next event when defer/stream is used with subscriptions.

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 3, 2021

In terms of the latter you can use the path for that instead of hasNext, as initial result will not have path set and there is requirement for initial result to be sent first

in terms of former, I think a termination/separator payload would make more sense instead of the constant hasNexting. I also am having difficulty understanding why the transport should not bear the burden of figuring that out. We are preserving flexibility I guess, but is it really realistic that the transport could not come up with its own separator?

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 3, 2021

Also, hasNext is not reliable, from spec:

The GraphQL server may determine there are no more values in the event stream after a previous value with hasNext equal to true has been emitted. In this case the last value in the event stream should be a map without data, label, and path entries, and a hasNext entry with a value of false.

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 3, 2021

I know this just came up in the WG, but I am afraid I still don't get it -- so sad. Why is hasNext: true necessary for incremental delivery of queries, but not for subscriptions, which each have multiple payloads being delivered over http? @robrichard @mjmahone @michaelstaib -- can't we define this at the transport level? wouldn't that be a better equivalence between defer with queries and subscriptions?

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 4, 2021

To store some gained knowledge -- when embedding incremental delivery of results within a subscription stream, if clients had to rely on the resetting of the path to separate results, they would not be informed that a result had ended until the next result begins. So, at least in terms of subscriptions, you need a separator or marker. This -- in theory -- is a subscriptions only issue, but there are existing clients that have transports possibly without separators (looking at you, Relay, that does not rely on a specified transport), and so to maintain a kind of backwards compatibility, even though we are in early days, it makes sense to have a separator or marker.

I think instead of the constant hasNexting, we should switch to an isFinal: true approach, but that could indeed cause some problems if an incremental-delivery-capable client attempted to communicate with an incremental-delivery-non-capable server.

My takeaway is that in a future world where all GraphQL servers are incermental delivery capable, we should switch back to isFinal: true

many thanks to @robrichard @mjmahone @michaelstaib @brainkim, especially @robrichard for all the cognitive help on this.

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 8, 2021

@briankim @robrichard @michaelstaib

I believe relevant to discussion at last working group, see #3165, which demonstrates that stream items may be sent "out of order" -- at least in terms of list index, and a naive way of fixing this.

I took another look at the spec edits, and --working through it -- it seems that although for @defer, the execution algorithm seems to implicitly ensures that subfields will not be delivered for an object whose parent field has not been delivered, the same cannot be said for @stream records with respect to the order of list items, as the payloads are explicitly sent in order of completion, which is not surprisingly what the implementation does. If "later" (by index) stream records are completed earlier, they will be sent earlier.

Although we could change the implementation to hold onto later (in terms of index) stream records until the earlier ones have been sent, as the above linked PR does, (a) the naïve implementation from the above PR holds off on starting completion of these later stream records until the earlier ones finish completion, which may not be what we want, and (b) we would have to -- at least, I think -- update the algorithm in the spec edits, or at least make the behavior explicit.

Another option would be to do the reverse, explicitly modify the spec to drop the expectation that stream records will be sent in order, and leave it to the client to decide what to do. I don't think that makes sense.

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 8, 2021

The final option (of course?) is to work on a less-naive implementation that completes stream records as soon as possible, but holds off on sending them until the earlier ones (by index) have been completed. This would probably be best -- in my opinion.

I am thinking about a separate map of subsequentPayloads for each iterator...

@yaacovCR
Copy link
Contributor

yaacovCR commented Jun 8, 2021

Actually, I changed my mind, I think the existing sometimes unintuitive behavior, with letting the client decide what to do with “out of order” stream items is the best option. If the client receives the indexed item “n” that means that the client can rely on receiving well defined execution results for items 1..n-1. They may contain errors only, but they should arrive, as the underlying iterator has returned successfully and completion has begun. Thoughts?

@yaacovCR
Copy link
Contributor

Any thoughts on this today from anyone else? I hope we can discuss this asynchronously rather than waiting until next working group.

@robrichard
Copy link
Contributor Author

Closing this issue now, using github discussions on this repo going forward https://github.com/robrichard/defer-stream-wg/discussions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants