ExceptionCodec and multiple references to a single exception object

Hi all, this issue is less of a real 'bug' in Orleans, as you can argue we're using it wrong here. However, it's part of a broader consideration about throwing exceptions from grains, with respect to serialization. I'm looking for some advice and to determine if it's worth expanding the documentation at all.

We've just had an issue caused because we used the `GenerateSerializerAttribute` to generate a codec for one of our custom exceptions, and that same exception object was referenced in two places. The `ExceptionCodec` explicitly has special handling around this for serialization and deserialization:

[Serialization](https://github.com/dotnet/orleans/blob/v7.2.1/src/Orleans.Serialization/ISerializableSerializer/ExceptionCodec.cs#L225-L229):

```c#
// Exceptions are never written as references. This ensures that reference cycles in exceptions are not possible and is a security precaution. 
ReferenceCodec.MarkValueField(writer.Session);
writer.WriteStartObject(fieldIdDelta, expectedType, typeof(ExceptionCodec));
SerializeException(ref writer, value);
writer.WriteEndObject();
```

[Deserialization](https://github.com/dotnet/orleans/blob/v7.2.1/src/Orleans.Serialization/ISerializableSerializer/ExceptionCodec.cs#L290-L294):

```c#
// In order to handle null values.
if (field.WireType == WireType.Reference)
{
    // Discard the result of consuming the reference and always return null.
    // We do not allow exceptions to participate in reference cycles because cycles involving InnerException are not allowed by .NET
    // Exceptions must never form cyclic graphs via their well-known properties/fields (eg, InnerException).
    var _ = ReferenceCodec.ReadReference<Exception, TInput>(ref reader, field);
    return null;
}
```

The type we were serializing was like:

```c#
[GenerateSerializer]
public class BatchTransactionException : Exception
{
    [Id(0)]
    public List<Result> Results { get; }

    public Message(List<Result> results) : base(new AggregateException(results.Select(x => x.Exception)))
    {
        Results = results;
    }
}

[GenerateSerializer]
public class Result
{
    [Id(0)]
    public int MessageOffset { get; }

    [Id(1)]
    public Exception Exception { get; }

    public Result(int messageOffset, Exception exception)
    {
        MessageOffset = messageOffset;
        Exception = exception;
    }
}

[GenerateSerializer]
public class MyException : Exception
{
    ...
}
```

The issue comes when the exception contained in `BatchTransactionException.Results` on a `Result` is our custom exception (although it can be any exception for which serialization code has been generated).

On serialization, when serializing the exception, first the exception is serialized using it's own generated serializer as part of the `InnerException` on the `BatchTransactionException`. As this is using the generated code, it serializes like any other type, and puts in a reference to the object in the serialization context.

Then when it comes to serialize that exception again as part of the `Result`s, it is again using this code-gen serializer, so like any normal type, it recognises that the same object is already serialized (from the serialization context), so inserts a reference to it, rather than reserializing it from scratch.

This is fine, until we come to deserialization. We get to deserializing the second instance of the `MyException`, the one in the `Result`, which has been serialized as a reference to the other place it appears which is in the `BatchTransactionException.InnerException`. However, at this point the deserializer only knows to expect some kind of `Exception`, so it loads up the `ExceptionCodec`, and uses that to deserialze. This then sees that we have serialized a reference, which we don't allow for `Exception`s, so it get's serialized as a `null`. This is completely silent, and hence difficult to spot why this breaks when you've not seen this issue before.

To clarify, the reason we've gone with using the `GenerateSerializerAttribute` for these exceptions is because we need to serialze them fully. We're aware that we can configure the `ExceptionCodec` to run for our own exceptions by adjusting the namespaces it's considered for, however the `ExceptionCodec` is not perfect in that it does not handle extra custom properties. In this case, we have a custom exception that is part of our domain, so we want to be able to serialize it fully like we do with any other type our grain calls can return.

I have a few suggestions for what could be done here. We could:

1. Accept the general restriction of "no more than one reference to a single exception object" in any type we serialize. This avoids the issue, and might be good advice in general (based on the code comment in the `ExceptionCodec`)? For what it's worth in our case we have changed things so that we no longer do this to side-step the issue entirely.
2. Expand the serializer documentation, perhaps with a special section on exceptions. I think it would be good to get some guidance on the best way to approach exception serialization in general, what quirks and edge cases there are (like the above), and how the `ExceptionCodec` works and how you can enable it for more types. I've seen this come up a couple of times in Discord, so I think it's worth having some guidance in the docs. Exceptions are different to other types, in that you can't tell from a signature all the possible exceptions a method could throw, and they can come from third party libraries, so I think there are special considerations that have to be made for them.
3. Modify the code-gen specifically for `Exception` derived types, so that they also get the same behavior with respect to references as the `ExceptionCodec` has. This would allow users to use the code-gen for exceptions and not hit any issues with the `ExceptionCodec`.

If you agree with any of these suggestions, I would be happy to help out if I can.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ExceptionCodec and multiple references to a single exception object #8628

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development