
JsonSerializerOptions.WriteIndividualObjects  #38344

Closed
@juliusfriedman

Description

Background and Motivation

Streaming large JSON data to a client sometimes requires streaming the response in order to get the ideal latency during loading.

E.g. one usually knows the total data size and the current position within it, and can therefore easily display a progress element client side during such streaming.

This is especially hard when you have an object which has nested objects. For example (unfortunately depending on your settings, of course), TimeSpan is serialized as a nested object; consider:

public class MyModel
{
    public string Name { get; set; } = "Whatever";
    public int Age { get; set; } = 777;
    public DateTime Birthday { get; set; } = new DateTime(1986, 11, 12);
    public TimeSpan SomeDuration { get; set; } = TimeSpan.Zero;
}

The resulting JSON would be:

{
    "name": "Whatever",
    "age": 777,
    "birthday": "1986-11-12T00:00:00.0000000",
    "someDuration": { "ticks": 0, "days": 0, "hours": 0, "milliseconds": 0, "minutes": 0, "seconds": 0, "totalDays": 0, "totalHours": 0, "totalMilliseconds": 0, "totalMinutes": 0, "totalSeconds": 0 }
}

And you can see that the duration has more potential split points than anything else in this model. What usually occurs is that part of a key is written at the end of a chunk, i.e. tot or total or totalS etc., which means I have to go back and find whatever started my object (can be really hard depending on the nesting), delimit / slice there, attempt to partially parse what I do have (if anything), and then wait for the rest to proceed.
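For context, the per-type workaround available today (and which this proposal wants to avoid having to write for every type) is a hand-written converter. A minimal sketch, assuming the goal is simply to collapse TimeSpan into a single string token so it can no longer be split across a dozen key/value pairs:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Serializes TimeSpan as a single "c"-format string (e.g. "00:00:00")
// instead of a nested object with ticks/days/hours/... properties.
public sealed class TimeSpanStringConverter : JsonConverter<TimeSpan>
{
    public override TimeSpan Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => TimeSpan.Parse(reader.GetString()!); // "c" format round-trips through Parse

    public override void Write(Utf8JsonWriter writer, TimeSpan value, JsonSerializerOptions options)
        => writer.WriteStringValue(value.ToString("c"));
}
```

Registered via options.Converters.Add(new TimeSpanStringConverter()) — but this has to be repeated for every problematic type, which is exactly the burden the proposal is trying to remove.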

Proposed API

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions
    {
        ..
+       public bool WriteIndividualObjects { get; init; }
    }
}

Usage Examples

I have a list of 1 - N objects (or just one really large object).

I want to continue to let ASP.NET handle response serialization, and I don't want to manually write a JsonConverter for each individual type just to get the semantics I want.

I want the JsonSerializer to respect WriteIndividualObjects: although it cannot be guaranteed that the client will have an adequate buffer to receive such data in a single receive, the serializer will only write in the chunks specified.

The goal is to avoid, when chunking the data, that the receiver has to deal with partial objects if at all possible:

Extreme Case

[{...................... | Chunk (partial)
....},{...........}| Chunk (Split, End and Start Partial)
,{.......................| Chunk (partial)
.}] |EOS

Typical Case

[{......},{.........},{....... | Chunk (2 objects 1 Partial)
....},{...........}| Chunk (Split, End and Start Partial)
,{.......................| Chunk (partial)
.}] |EOS

I would like to avoid splitting a JSON object (especially its keys) between chunks if at all possible.

I realize this is highly dependent on the client's receive buffer, among other things; however, if there were at least a strategy that did not involve manually writing each object into the response stream, I would be satisfied.

After setting the proposed property, I would expect the writes / flushes to the stream to occur at object boundaries, so it would be less likely that I have to deal with partial nested object graphs (yes, I know that if a single object is very large this is hard to avoid).

Result Case

[{......},| Start of list and 1 complete object and delimiter
{.........},| 1 Complete object and delimiter
{...........},| 1 Complete object and delimiter
{...........},| 1 Complete object and delimiter
{.....................}]|EOS 1 Complete object and end of list

Perhaps the start and end of the list should also be small writes to allow for even easier parsing (especially on fast connections), i.e.:

[|Start of list
{......},| 1 complete object and delimiter
{.........},| 1 Complete object and delimiter
{...........},| 1 Complete object and delimiter
{...........},| 1 Complete object and delimiter
{.....................} 1 Complete object (user knows end of list must be next)
]|end of list
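If the option were adopted, usage might look like the following sketch. WriteIndividualObjects is the hypothetical property proposed here, not an existing API; models and response stand in for a controller's data and HttpResponse, everything else is the current System.Text.Json surface:

```csharp
using System.Text.Json;

var options = new JsonSerializerOptions
{
    // Hypothetical: flush to the underlying stream only at element boundaries.
    WriteIndividualObjects = true
};

// ASP.NET (or any caller) streams the list; each element would reach the
// network as a complete JSON object rather than an arbitrary buffer slice.
await JsonSerializer.SerializeAsync(response.Body, models, options);
```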

Alternative Designs

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions
    {
        ..
+       public int FlushSize { get; init; } // or char[] / Span<char> FlushMarker(s), etc.
    }
}

This seems to complement DefaultBufferSize nicely; in theory I would then set DefaultBufferSize to something like 8192 and FlushSize to 4096, but that still can't guarantee that I never write a chunk containing invalid JSON, i.e. one ending mid-key: {.......,"SomeKe followed by y": null} in the next chunk.

Consider Also

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions
    {
        ..
+       public void OnStartWrite(); // Called when a single complete object or primitive is about to be written to the underlying writer
+       public void OnEndWrite();   // Called when a single complete object or primitive was completely written to the underlying writer
+       public void OnStartRead();  // Called when a single complete object or primitive is about to be read from the underlying reader
+       public void OnEndRead();    // Called when a single complete object or primitive was completely read from the underlying reader
    }
}

This would allow manual flushing of the underlying Stream, or whatever other action might be required.

These callbacks could be used in addition to (or technically instead of) the properties above, though using them alone would require sizes to be known in advance; they are probably useful in their own right.
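For comparison, what such callbacks would automate can be approximated today by driving a Utf8JsonWriter by hand and flushing at element boundaries. A sketch using only the existing API (FlushEachAsync is an illustrative name, not a real helper):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Illustrative: serialize each element individually so every flush to the
// stream lands exactly on an object boundary.
static async Task FlushEachAsync<T>(Stream stream, IEnumerable<T> items, JsonSerializerOptions options)
{
    await using var writer = new Utf8JsonWriter(stream);
    writer.WriteStartArray();
    foreach (var item in items)
    {
        JsonSerializer.Serialize(writer, item, options); // one complete element
        writer.Flush();                                  // commit to the stream here
        await stream.FlushAsync();                       // chunk boundary
    }
    writer.WriteEndArray();
    writer.Flush();
    await stream.FlushAsync();
}
```

This is exactly the "manually write each object into the response stream" approach the proposal wants to make unnecessary, since it bypasses ASP.NET's built-in response serialization.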

One could also use WebSockets / SignalR, though that requires somewhat of a paradigm shift in the API and overhead for running the endpoint.

One could also use a client-side parser such as oboejs, which provides a variety of options and is very small.

I believe there are also several variations of this approach, most of which are spec compliant, e.g.:

  1. Using a newline to delimit chunks of complete data only: {....}\r\n or [\r\n{....}\r\n,\r\n{....}\r\n,\r\n]\r\n
  2. Using a newline to delimit every piece of an object, as WriteIndented does (just need to ensure a partial object key or value is never emitted; in a large graph, stop at the last valid key which fits in the buffer and start the next write at that key)
  3. Using a special value to delimit completed data: [{...},true,{....}] etc. (true is the delimiter here; without changes this is still error prone, since the delimiter itself may currently be split, no matter which token you choose)
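Variation 1 is essentially newline-delimited JSON, which can already be produced today with no framework changes. A minimal sketch (WriteNdjsonAsync is an illustrative name):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Illustrative: one complete JSON object per line, flushed per line,
// so no chunk ever ends inside an object.
static async Task WriteNdjsonAsync<T>(Stream stream, IEnumerable<T> items, JsonSerializerOptions options)
{
    foreach (var item in items)
    {
        var line = JsonSerializer.Serialize(item, options) + "\r\n";
        await stream.WriteAsync(Encoding.UTF8.GetBytes(line));
        await stream.FlushAsync();
    }
}
```

The trade-off is that the payload as a whole is no longer a single valid JSON document, so the client must split on newlines and parse each line independently.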

Risks

Low, depending on the design choice; the OnWrite methods could get chatty very quickly.

Benefits

Using fetch and ReadableStream, one can then use this setting to avoid complex parsing logic that must deal with incomplete JSON data while streaming.

Other notes

It may be beneficial to also control the tab, newline, and other characters emitted by the readers and writers through the options.

I would not be opposed to FlushIndividualObjects as the name, as that is perhaps more in line with the actual semantics of its operation.

I would not be opposed to FlushDelimiters as the name if that approach were taken.
