[17.6] Cache source generator node tables only if their state counts match #66992

jjonescz · 2023-02-22T13:09:37Z

Devdiv issue for 17.6: https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1786293

src/Compilers/CSharp/Test/Semantic/SourceGeneration/StateTableTests.cs

chsienki · 2023-03-31T20:29:39Z

I think this fix is actually just masking an underlying issue. When the batch node gets the input table, it's listed as cached even though it has changed. That in turn causes the issue being addressed here. If we fix the input table issue, I don't think we need to make this fix.

Looking at the input table, what's happening is that we're shrinking the number of items relative to the previous generation pass, but the items that remain are (correctly) listed as cached. When we create the table from the builder, we set IsCached based on the aggregate state of the elements: https://github.com/dotnet/roslyn/blob/main/src/Compilers/Core/Portable/SourceGeneration/Nodes/NodeStateTable.cs#L73

In this case, that's not correct. Yes, all the items in the table are cached, but the table itself shouldn't be considered cached, as it has fewer items than last time. I think we should track in the builder whether the table can be considered cached. The only time a table from a builder (as opposed to one we explicitly call AsCached on) can be cached is if all the elements are cached and it has the same number of entries as the previous table.
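
The rule above can be sketched with a toy model. Everything in it (`EntryState`, `SketchTable`, `SketchTableBuilder`, `Build`) is a hypothetical, simplified stand-in, not Roslyn's internal `NodeStateTable` API; it only illustrates a builder computing an `IsCached` flag once, requiring both that every entry is cached and that the entry count matches the previous table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified per-entry states (not Roslyn's internal types).
public enum EntryState { Added, Removed, Modified, Cached }

public sealed class SketchTable
{
    public SketchTable(IReadOnlyList<EntryState> states, bool isCached)
    {
        States = states;
        IsCached = isCached;
    }

    public IReadOnlyList<EntryState> States { get; }
    public bool IsCached { get; }
}

public sealed class SketchTableBuilder
{
    private readonly SketchTable _previous; // null on the first run
    private readonly List<EntryState> _states = new List<EntryState>();

    public SketchTableBuilder(SketchTable previous) => _previous = previous;

    public void Add(EntryState state) => _states.Add(state);

    public SketchTable Build()
    {
        // Computed once, when the table is built: the table is cached only if
        // there is a previous table, the entry counts match, and every entry
        // is cached. "All entries cached" alone is not enough, because a table
        // that merely lost its trailing entries would still pass that check.
        bool isCached = _previous != null
            && _states.Count == _previous.States.Count
            && _states.All(s => s == EntryState.Cached);
        return new SketchTable(_states.ToArray(), isCached);
    }
}
```

For example, if the previous table held three entries and the new one holds only the first two (both still cached), `Build()` reports `IsCached == false`, so downstream nodes re-run instead of reusing three stale outputs.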

Interestingly this is almost exactly the same as the IsRemoved bug we previously fixed and has basically the same fix.

The following test is a fairly simple repro that demonstrates this behavior outside of the batch node case:

[Fact, WorkItem(61162, "https://github.com/dotnet/roslyn/issues/61162")]
public void IncrementalGenerator_Table_Smaller_Second_TimeAround()
{
    var items = ImmutableArray.Create(new[] { "a", "b", "c" });
    var generator = new IncrementalGeneratorWrapper(new PipelineCallbackGenerator(ctx =>
    {
        var transform1 = ctx.CompilationProvider.SelectMany((c, _) => items);
        var transform2 = transform1.Select((i, _) => i);

        ctx.RegisterSourceOutput(transform2, static (spc, i) =>
        {
            spc.AddSource(i, $@"// {i}");
        });
    }));

    var source = "System.Console.WriteLine();";
    var parseOptions = TestOptions.RegularPreview;
    Compilation compilation = CreateCompilation(source, options: TestOptions.DebugExeThrowing, parseOptions: parseOptions);

    GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { generator }, parseOptions: parseOptions);
    verify(ref driver, compilation, new[] {
        "// a",
        "// b",
        "// c"
        });

    items = ImmutableArray.Create(new[] { "a", "b" });

    verify(ref driver, compilation, new[] {
        "// a",
        "// b",
        });

    static void verify(ref GeneratorDriver driver, Compilation compilation, string[] generatedContent)
    {
        driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out var generatorDiagnostics);
        outputCompilation.VerifyDiagnostics();
        generatorDiagnostics.Verify();
        var trees = driver.GetRunResult().GeneratedTrees;
        Assert.Equal(generatedContent.Length, trees.Length);
        for(int i = 0; i < trees.Length; i++)
        {
            AssertEx.EqualOrDiff(generatedContent[i], trees[i].ToString());
        }
    }
}

This will fail right now, as transform2 will end up returning three items the second time through, when it should only return two.

chsienki

As per comment.

jjonescz · 2023-04-03T09:47:36Z

Thanks @chsienki for the investigation. I couldn't reproduce a scenario where my fix would fail. Your test fails simply because you never update the compilation, so the generator doesn't run the second time. This change makes the test pass (even without the fix in this PR):

     GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { generator }, parseOptions: parseOptions);
-    verify(ref driver, compilation, new[] {
+    verify(ref driver, ref compilation, new[] {
         "// a",
         "// b",
         "// c"
         });

     items = ImmutableArray.Create(new[] { "a", "b" });

-    verify(ref driver, compilation, new[] {
+    verify(ref driver, ref compilation, new[] {
         "// a",
         "// b",
         });

-    static void verify(ref GeneratorDriver driver, Compilation compilation, string[] generatedContent)
+    static void verify(ref GeneratorDriver driver, ref Compilation compilation, string[] generatedContent)
     {
-        driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out var generatorDiagnostics);
-        outputCompilation.VerifyDiagnostics();
+        driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out compilation, out var generatorDiagnostics);
+        compilation.VerifyDiagnostics();
         generatorDiagnostics.Verify();
         var trees = driver.GetRunResult().GeneratedTrees;
         Assert.Equal(generatedContent.Length, trees.Length);
         for(int i = 0; i < trees.Length; i++)
         {
             AssertEx.EqualOrDiff(generatedContent[i], trees[i].ToString());
         }
     }

However, your suggestion makes sense, so I rewrote the fix.

chsienki · 2023-04-04T00:25:03Z

🤦 Yep, you're right about that.

Here's a test that *does* fail with the original fix:

        [Fact, WorkItem(61162, "https://github.com/dotnet/roslyn/issues/61162")]
        public void IncrementalGenerator_Collect_SyntaxProvider2()
        {
            var generator = new IncrementalGeneratorWrapper(new PipelineCallbackGenerator(static ctx =>
            {
                var invokedMethodsProvider = ctx.SyntaxProvider
                    .CreateSyntaxProvider(
                        static (node, _) => node is InvocationExpressionSyntax,
                        static (ctx, ct) => ctx.SemanticModel.GetSymbolInfo(ctx.Node, ct).Symbol?.Name ?? "(method not found)")
                    .Select((n, _) => n);

                ctx.RegisterSourceOutput(invokedMethodsProvider, static (spc, invokedMethod) =>
                {
                    spc.AddSource(invokedMethod, "// " + invokedMethod);
                });
            }));

            var source = """
                System.Console.WriteLine();
                System.Console.ReadLine();
                """;
            var parseOptions = TestOptions.RegularPreview;
            Compilation compilation = CreateCompilation(source, options: TestOptions.DebugExeThrowing, parseOptions: parseOptions);

            GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { generator }, parseOptions: parseOptions);
            verify(ref driver, compilation, new[]
            {
                "// WriteLine",
                "// ReadLine"
            });

            replace(ref compilation, parseOptions, """
                System.Console.WriteLine();
                """);

            verify(ref driver, compilation, new[]
            {
                "// WriteLine"
            });

            replace(ref compilation, parseOptions, "_ = 0;");
            verify(ref driver, compilation, Array.Empty<string>());

            static void verify(ref GeneratorDriver driver, Compilation compilation, string[] generatedContent)
            {
                driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out var generatorDiagnostics);
                outputCompilation.VerifyDiagnostics();
                generatorDiagnostics.Verify();
                var trees = driver.GetRunResult().GeneratedTrees;
                Assert.Equal(generatedContent.Length, trees.Length);
                for (int i = 0; i < generatedContent.Length; i++)
                {
                    AssertEx.EqualOrDiff(generatedContent[i], trees[i].ToString());
                }
            }

            static void replace(ref Compilation compilation, CSharpParseOptions parseOptions, string source)
            {
                compilation = compilation.ReplaceSyntaxTree(compilation.SyntaxTrees.Single(), CSharpSyntaxTree.ParseText(source, parseOptions));
            }
        }

The new proposed fix does make that test pass, but I'm no longer convinced it's actually the correct thing to do.

I was originally thinking this was an issue that would affect any node that removed tail entries, but I realize now that it doesn't, because we track the removed entries. So the question remains: why are we getting a cached result as the input when we shouldn't be?

Looking at it, I think the bug is actually in the SyntaxNodeProvider. When we construct the filter table, we modify any existing items but don't carry over the removed ones.

https://github.com/dotnet/roslyn/blob/main/src/Compilers/Core/Portable/SourceGeneration/Nodes/PredicateSyntaxStrategy.cs#L101-L104

This surfaces as a correctness problem when the removed items are at the end of the table, but it also surfaces as a perf issue when the removed items are in the middle of the table.

Consider the following test:

        [Fact, WorkItem(61162, "https://github.com/dotnet/roslyn/issues/61162")]
        public void IncrementalGenerator_Collect_SyntaxProvider3()
        {
            var generator = new IncrementalGeneratorWrapper(new PipelineCallbackGenerator(static ctx =>
            {
                var invokedMethodsProvider = ctx.SyntaxProvider
                    .CreateSyntaxProvider(
                        static (node, _) => node is InvocationExpressionSyntax,
                        static (ctx, ct) => ctx.SemanticModel.GetSymbolInfo(ctx.Node, ct).Symbol?.Name ?? "(method not found)")
                    .Select((n, _) => n)
                    .WithTrackingName("Select");

                ctx.RegisterSourceOutput(invokedMethodsProvider, static (spc, invokedMethod) =>
                {
                    spc.AddSource(invokedMethod, "// " + invokedMethod);
                });
            }));
            var source1 = """
                System.Console.WriteLine();
                System.Console.ReadLine();
                """;

            var source2 = """
                class C {
                    public void M()
                    {
                        System.Console.Clear();
                        System.Console.Beep();
                    }
                }
                """;

            var parseOptions = TestOptions.RegularPreview;
            Compilation compilation = CreateCompilation(new[] { source1, source2 }, options: TestOptions.DebugExeThrowing, parseOptions: parseOptions);

            GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { generator }, parseOptions: parseOptions, driverOptions: new GeneratorDriverOptions(IncrementalGeneratorOutputKind.None, trackIncrementalGeneratorSteps: true));
            verify(ref driver, compilation, new[]
            {
                "// WriteLine",
                "// ReadLine",
                "// Clear",
                "// Beep"
            });

            // edit part of source 1
            replace(ref compilation, parseOptions, """
                System.Console.WriteLine();
                System.Console.Write(' ');
                """);

            verify(ref driver, compilation, new[]
            {
                "// WriteLine",
                "// Write",
                "// Clear",
                "// Beep"
            });

            Assert.Collection(driver.GetRunResult().Results[0].TrackedSteps["Select"],
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Modified, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason)
                );

            // remove second line of source 1
            replace(ref compilation, parseOptions, """
                System.Console.WriteLine();
                """);

            verify(ref driver, compilation, new[]
            {
                "// WriteLine",
                "// Clear",
                "// Beep"
            });

            Assert.Collection(driver.GetRunResult().Results[0].TrackedSteps["Select"],
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Removed, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason)
                );

            static void verify(ref GeneratorDriver driver, Compilation compilation, string[] generatedContent)
            {
                driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out var generatorDiagnostics);
                outputCompilation.VerifyDiagnostics();
                generatorDiagnostics.Verify();
                var trees = driver.GetRunResult().GeneratedTrees;
                Assert.Equal(generatedContent.Length, trees.Length);
                for (int i = 0; i < generatedContent.Length; i++)
                {
                    AssertEx.EqualOrDiff(generatedContent[i], trees[i].ToString());
                }
            }

            static void replace(ref Compilation compilation, CSharpParseOptions parseOptions, string source)
            {
                compilation = compilation.ReplaceSyntaxTree(compilation.SyntaxTrees.First(), CSharpSyntaxTree.ParseText(source, parseOptions));
            }
        }

This will fail on the last collection assertion. The table is returning [[WriteLine, Cached], [Clear, Modified], [Beep, Modified]]. Clear and Beep weren't modified, so their downstream transforms shouldn't run, but they do. This happens because they take the place of the 'missing' Write entry, which isn't tracked as removed.

Basically I think we're not correctly tracking removals in the PredicateSyntaxStrategy.
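
The misattribution described above can be reproduced with a toy model. It assumes, as a simplification, that a downstream node matches its new inputs to its previous inputs by position; `AlignmentSketch`, `Slot`, and `CompareByPosition` are hypothetical names, not Roslyn code. Dropping the removed `Write` entry shifts `Clear` and `Beep` into the wrong slots, while keeping it as a removed placeholder preserves the alignment.

```csharp
using System;
using System.Collections.Generic;

public static class AlignmentSketch
{
    // One input slot: the value plus whether the upstream node marked it as removed.
    public readonly struct Slot
    {
        public Slot(string value, bool removed = false)
        {
            Value = value;
            Removed = removed;
        }

        public string Value { get; }
        public bool Removed { get; }
    }

    // Hypothetical downstream node: compares each slot with the previous value
    // at the same index and reports a reason for every slot.
    public static List<string> CompareByPosition(IReadOnlyList<string> previous, IReadOnlyList<Slot> current)
    {
        var reasons = new List<string>();
        for (int i = 0; i < current.Count; i++)
        {
            if (current[i].Removed)
            {
                reasons.Add("Removed");
            }
            else if (i >= previous.Count)
            {
                reasons.Add("Added");
            }
            else
            {
                reasons.Add(previous[i] == current[i].Value ? "Cached" : "Modified");
            }
        }
        return reasons;
    }
}
```

With `previous = { "WriteLine", "Write", "Clear", "Beep" }`, passing `{ WriteLine, Clear, Beep }` yields `Cached, Modified, Modified` (the failing result reported above), whereas passing `{ WriteLine, Write (removed), Clear, Beep }` yields `Cached, Removed, Cached, Cached`, which matches what the test expects.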

In general, I think there should probably be an invariant that the table size going in is the same as what comes out, before compaction. In practice this isn't true today, as we implicitly track empty slots, meaning tables can (and do) get shorter. We might want to think about changing that, as I think it's probably been the root cause of a lot of these removal bugs. But until we do (it's not a trivial amount of work), let's keep the 'we only cache if the table size is the same' logic.

chsienki · 2023-04-04T00:27:24Z

src/Compilers/Core/Portable/SourceGeneration/Nodes/NodeStateTable.cs

            HasTrackedSteps = hasTrackedSteps;
+
+            static bool areStatesCompatible(ImmutableArray<TableEntry> previous, ImmutableArray<TableEntry> current)


Let's calculate this in the builder, and just pass in an 'IsCached' flag instead.

Moved this computation into the builder. I just realized you originally suggested we keep track of the flag throughout the builder, but it seems computing it once is simpler.

chsienki · 2023-04-04T00:32:50Z

@jjonescz @jaredpar I think we should probably take this almost as-is for 17.6, while recognizing that it's really just patching over the underlying issue. On main we can fix the SyntaxStrategy issue, then decide later whether we want to re-architect the way we handle the empty entries.

jaredpar · 2023-04-04T03:15:35Z

@chsienki if you both agree let's do that. I'd like to open a new issue that details what needs to change for the more complete fix as a part of merging this. That way we can track the work.

jjonescz · 2023-04-04T08:33:32Z

if you both agree let's do that. I'd like to open a new issue that details what needs to change for the more complete fix as a part of merging this. That way we can track the work.

I agree. Opened issues per Chris's work item suggestions (one for the SyntaxNodeProvider fix we should do in main, and the other for re-architecting the way we handle the empty entries):

jjonescz · 2023-04-05T06:41:21Z

@cston @dotnet/roslyn-compiler for a second review

jjonescz added Area-Compilers Feature - Source Generators Source Generators labels Feb 22, 2023

jjonescz marked this pull request as ready for review February 22, 2023 14:55

jjonescz requested a review from a team as a code owner February 22, 2023 14:55

jjonescz changed the title ~~Check count before reusing batched items in BatchNode~~ Check count before reusing cached items in BatchNode Feb 23, 2023

chsienki self-assigned this Feb 23, 2023

cston reviewed Mar 16, 2023

src/Compilers/CSharp/Test/Semantic/SourceGeneration/StateTableTests.cs (outdated, resolved)

cston previously approved these changes Mar 16, 2023

jjonescz and others added 5 commits March 31, 2023 08:23

Add tests

968d97f

Check count before reusing batched items

2984b5f

Add missing assert

d1eae8e

Fix local function name casing

9b00631

Fix notation in comment

8d3b3dd

jjonescz force-pushed the 61162-SG-BatchNode-reuse branch from 3606ca2 to 8d3b3dd March 31, 2023 06:23

jjonescz requested a review from a team as a code owner March 31, 2023 06:23

jjonescz changed the base branch from main to release/dev17.6 March 31, 2023 06:23

jjonescz changed the title ~~Check count before reusing cached items in BatchNode~~ [17.6] Check count before reusing cached items in BatchNode Mar 31, 2023

chsienki previously requested changes Mar 31, 2023

jjonescz added 2 commits April 3, 2023 11:29

Revert "Check count before reusing batched items"

5167663

This reverts commit 2984b5f.

Check state table counts before caching

00361f6

Clarify doc comment

551ae21

chsienki reviewed Apr 4, 2023

jjonescz and others added 2 commits April 4, 2023 09:54

Add a test

92cb5d4

Co-authored-by: Chris Sienkiewicz <chsienki@microsoft.com>

Compute IsCached flag only in the builder

dc950ad

This was referenced Apr 4, 2023

SyntaxNodeProvider should keep track of removed items #67633

Closed

Ensure source generator nodes keep track of removed entries #67634

Open

jjonescz changed the title ~~[17.6] Check count before reusing cached items in BatchNode~~ [17.6] Cache source generator node tables only if their state counts match Apr 4, 2023

chsienki approved these changes Apr 4, 2023

cston approved these changes Apr 5, 2023

jaredpar merged commit a6aeb31 into dotnet:release/dev17.6 Apr 11, 2023

jjonescz deleted the 61162-SG-BatchNode-reuse branch April 12, 2023 08:39

jcouv mentioned this pull request Apr 12, 2023

Preserve removed entries in SyntaxNodeProvider #67636

Merged

This was referenced May 2, 2023

Incremental Source Generator - Not generating empty file, when last syntax node is deleted #61162

Closed

Deleting a later node does not trigger an update？ #63422

Closed

allisonchou mentioned this pull request May 2, 2023

[Automated] PRs inserted in VS build main-33702.166 #68059

Closed
