Don't use the SourceTextDiffer #5624

davidwengier · 2021-10-18T04:52:07Z

Part of #5617

Curious for peoples thoughts on this one. Note: First commit is the perf fix, which is discussed below. The other two are just cleanup from the resulting state.

So from a trace of formatting a 1000+ line file, it was clear that the SourceTextDiffer was the lowest hanging fruit:

Drilling in, the indexer on Roslyn's CompositeText seems like an issue:

From looking around at Roslyn code it seems this isn't surprising. Looking at where SourceTextDiffer is used in Razor it is mainly used as a "clean up" pass so I was curious what would happen if we just didn't do that. And this is the result:

For a 1000+ line file:

Before:

Method	Job	RunStrategy	Mean	Error	StdDev	Op/s	Gen 0	Gen 1	Gen 2	Allocated
'Razor CSharp Formatting'	Job-LEMTRR	Default	2.262 s	0.0450 s	0.1026 s	0.4421	1000.0000	-	-	116 MB
'Razor CSharp Formatting'	InProcess	Throughput	2.236 s	0.0447 s	0.1130 s	0.4472	-	-	-	116 MB

After:

Method	Job	RunStrategy	Mean	Error	StdDev	Op/s	Gen 0	Gen 1	Gen 2	Allocated
'Razor CSharp Formatting'	Job-ADYYXI	Default	1.097 s	0.0219 s	0.0384 s	0.9117	1000.0000	-	-	123 MB
'Razor CSharp Formatting'	InProcess	Throughput	1.089 s	0.0216 s	0.0558 s	0.9180	1000.0000	-	-	123 MB

Still slow, but 50% improvement is pretty good, right? Well... things get a bit more interesting when running the "normal" CSharp formatting benchmark:

Before:

Method	Job	RunStrategy	Mean	Error	StdDev	Op/s	Gen 0	Gen 1	Gen 2	Allocated
'Razor CSharp Formatting'	Job-BKWBDG	Default	400.0 ms	7.95 ms	19.95 ms	2.500	1000.0000	-	-	66 MB
'Razor CSharp Formatting'	InProcess	Throughput	374.3 ms	6.04 ms	5.36 ms	2.672	1000.0000	-	-	66 MB

After:

Method	Job	RunStrategy	Mean	Error	StdDev	Op/s	Gen 0	Gen 1	Gen 2	Allocated
'Razor CSharp Formatting'	Job-QOCWPC	Default	399.7 ms	9.22 ms	27.03 ms	2.502	1000.0000	-	-	66 MB
'Razor CSharp Formatting'	InProcess	Throughput	422.6 ms	11.25 ms	33.17 ms	2.366	1000.0000	-	-	66 MB

So, do we take this change? Good question.

Anecdotally when using the product, this change doesn't really matter. Formatting is still horrendously slow. A threaded wait dialog might be an idea...

So the real question here is whether this is a win, or if we're improving things for a niche user base?

…ntext

davidwengier · 2021-10-18T05:02:07Z

...Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/Formatting/CSharpOnTypeFormattingPass.cs

+        {
+            var changes = edits.Select(e => e.AsTextChange(originalText));
+            var changedText = originalText.WithChanges(changes);
+            var cleanChanges = SourceTextDiffer.GetMinimalTextChanges(originalText, changedText, lineDiffOnly: false);


Even though this line should logically be able to be removed, like the others, doing so causes 3 tests to fail. Something to look in to.

I imagine the tests that fail are just expecting a particular result (the minimal result), but now we return an un-minified version.

No the tests are formatting tests and the failures are due to a difference in indentation in the expected result.

ryanbrandenburg

The code looks fine, but we should consider applying @allisonchou's learnings as I believe this is basically the same problem we were having in Semantic Tokens.

ryanbrandenburg · 2021-10-18T16:36:58Z

src/Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/Formatting/CSharpFormattingPass.cs

@@ -78,7 +78,7 @@ public async override Task<FormattingResult> ExecuteAsync(FormattingContext cont
                changedText = changedText.WithChanges(indentationChanges);
            }

-            var finalChanges = SourceTextDiffer.GetMinimalTextChanges(originalText, changedText, lineDiffOnly: false);


You should link up with @allisonchou. She's been doing a lot of perf tests on these minimal text change algorithms and we can get BIG perf wins here if we're willing to accept "mostly minimal" text changes.

I thought there might be some overlap here, but I wasn't sure. Strictly speaking this change isn't because the diff algorithm per se is bad, but it relies on indexing which doesn't perform well with long chains of CompositeText instances. A better sort algorithm that resulted in less indexing would be good, but one that used less CPU or memory for the same number of index calls would actually not improve things much I don't think. We'll see.

@allisonchou ended up deciding on a "mostly minimal" Text change algo (it's basically this one, but it partitions the Text to avoid supra-linear algo inflation). Dono the guts well enough to tell if that affects indexing, just thought I'd bring it up.

Agree with Ryan, if this call takes a while then exploring partitioning might be an option. But depending on how efficient GetTextChanges already is speed-wise, it might not be necessary.

Partitioning is better than not partitioning, but not better than calling GetTextChanges. See chart below.

Also my port of the algorithm resulted in some test failures, but I'm fairly certain it was just a bug in the handling of segment edges, and not something that would have affected perf (ie, if I fixed the bug I don't think the numbers would change)

ryanbrandenburg · 2021-10-18T16:48:59Z

...Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/Formatting/CSharpOnTypeFormattingPass.cs

+        {
+            var changes = edits.Select(e => e.AsTextChange(originalText));
+            var changedText = originalText.WithChanges(changes);
+            var cleanChanges = SourceTextDiffer.GetMinimalTextChanges(originalText, changedText, lineDiffOnly: false);


I imagine the tests that fail are just expecting a particular result (the minimal result), but now we return an un-minified version.

NTaylorMullen · 2021-10-18T17:02:02Z

So the real question here is whether this is a win, or if we're improving things for a niche user base?

I definitely agree that certain niche user-bases will always have issues; however, I think the question here is if a 1k file is a niche-enough user-base or not. I'd probably say that's a reasonable file size for us to support and should do everything in our power to make it better. Now that being said, we may be making the general C# formatting slow; however, I'd doubt that larger files lacked C# at all. It's really hard to ignore a 2x perf win here 😢. I'm thinking the perf-win will more-often be a net-positive? Should we reach out to the runtime team to get insights on what they expect for the "average" Razor file?

NTaylorMullen · 2021-10-18T17:06:39Z

For the initial repro how quick are we able to get formatting to occur there?

davidwengier · 2021-10-19T01:32:23Z

For the initial repro how quick are we able to get formatting to occur there?

With this change formatting the file you sent me went from 10 seconds to 8 seconds, so this is definitely not the only change that should come from this issue.

Unless I've misunderstood your question and that answer doesn't make sense :P

NTaylorMullen · 2021-10-19T02:55:16Z

Unless I've misunderstood your question and that answer doesn't make sense :P

Nope you nailed it! We're chipping away!

allisonchou · 2021-10-20T02:19:28Z

src/Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/Formatting/CSharpFormattingPass.cs

@@ -78,7 +78,7 @@ public async override Task<FormattingResult> ExecuteAsync(FormattingContext cont
                changedText = changedText.WithChanges(indentationChanges);
            }

-            var finalChanges = SourceTextDiffer.GetMinimalTextChanges(originalText, changedText, lineDiffOnly: false);


Agree with Ryan, if this call takes a while then exploring partitioning might be an option. But depending on how efficient GetTextChanges already is speed-wise, it might not be necessary.

davidwengier · 2021-10-21T03:20:21Z

This was a pain.. think i'll stick to coding.

Y Axis is milliseconds.
X axis is "number of 25 line snippets of Razor", so essentially from a 25 line file up to a 2500 line file.
Formatted in this case means no changes necessary. Unformatted means almost worst case, with every line having no indentation.

Raw data (click to expand):

Blocks	Description	Mean	Error	StdDev	Median	Op/s	Gen0	Gen1	Gen2	Allocated
1	Formatted_GetTextChanges	29.38	3.179	9.325	25.53	34.0330	-	-	-	4MB
1	Formatted_PartitionedSrcTxtDif	32.76	3.570	10.470	29.71	30.5251	-	-	-	4MB
1	Formatted_SourceTextDiffer	33.46	0.659	0.856	33.37	29.8822	-	-	-	4MB
1	Unformatted_GetTextChanges	29.57	2.722	7.984	29.02	33.8168	-	-	-	4MB
1	Unformatted_PartitionedSrcTxtDif	30.44	2.997	8.695	27.61	32.8521	-	-	-	4MB
1	Unformatted_SourceTextDiffer	29.59	2.765	8.152	27.61	33.7995	-	-	-	4MB
10	Formatted_GetTextChanges	167.95	6.506	19.184	162.06	5.9541	5000.0000	1000.0000	-	34MB
10	Formatted_PartitionedSrcTxtDif	180.07	10.885	32.096	163.25	5.5534	6000.0000	1000.0000	-	34MB
10	Formatted_SourceTextDiffer	180.76	7.551	22.028	173.95	5.5323	5000.0000	1000.0000	-	34MB
10	Unformatted_GetTextChanges	157.45	6.013	17.445	149.36	6.3510	5000.0000	1000.0000	-	33MB
10	Unformatted_PartitionedSrcTxtDif	163.85	5.448	14.542	160.39	6.1033	5000.0000	1000.0000	-	35MB
10	Unformatted_SourceTextDiffer	167.49	7.085	20.668	158.61	5.9705	5000.0000	1000.0000	-	33MB
100	Formatted_GetTextChanges	2222.97	44.236	88.344	2203.00	0.4498	80000.0000	7000.0000	-	455MB
100	Formatted_PartitionedSrcTxtDif	2622.58	61.119	172.388	2597.18	0.3813	83000.0000	7000.0000	-	477MB
100	Formatted_SourceTextDiffer	2371.57	46.858	110.451	2364.22	0.4217	80000.0000	7000.0000	-	455MB
100	Unformatted_GetTextChanges	2178.29	42.922	79.559	2169.96	0.4591	80000.0000	6000.0000	-	454MB
100	Unformatted_PartitionedSrcTxtDif	2512.65	50.233	119.384	2487.50	0.3980	83000.0000	8000.0000	-	471MB
100	Unformatted_SourceTextDiffer	2762.24	54.365	109.821	2741.39	0.3620	80000.0000	6000.0000	-	454MB
2	Formatted_GetTextChanges	48.33	4.594	13.545	40.03	20.6898	1000.0000	-	-	7MB
2	Formatted_PartitionedSrcTxtDif	55.44	4.625	13.565	56.77	18.0377	1000.0000	-	-	7MB
2	Formatted_SourceTextDiffer	47.53	4.332	12.773	42.76	21.0415	1000.0000	-	-	7MB
2	Unformatted_GetTextChanges	46.55	3.972	11.333	44.10	21.4814	1000.0000	-	-	7MB
2	Unformatted_PartitionedSrcTxtDif	54.78	5.794	17.084	45.84	18.2536	1000.0000	-	-	7MB
2	Unformatted_SourceTextDiffer	51.66	5.449	16.066	44.99	19.3566	1000.0000	-	-	7MB
20	Formatted_GetTextChanges	323.06	7.825	21.814	315.96	3.0954	12000.0000	2000.0000	-	69MB
20	Formatted_PartitionedSrcTxtDif	320.03	6.386	11.515	318.87	3.1247	12000.0000	2000.0000	-	72MB
20	Formatted_SourceTextDiffer	310.23	5.999	10.664	309.29	3.2234	12000.0000	2000.0000	-	69MB
20	Unformatted_GetTextChanges	352.89	12.925	38.109	355.51	2.8337	12000.0000	2000.0000	-	69MB
20	Unformatted_PartitionedSrcTxtDif	362.46	15.087	43.771	351.53	2.7590	12000.0000	2000.0000	-	72MB
20	Unformatted_SourceTextDiffer	373.58	14.540	42.871	363.43	2.6768	12000.0000	2000.0000	-	69MB
3	Formatted_GetTextChanges	53.28	2.906	7.806	50.37	18.7680	1000.0000	-	-	10MB
3	Formatted_PartitionedSrcTxtDif	64.43	5.883	16.974	55.59	15.5198	1000.0000	-	-	10MB
3	Formatted_SourceTextDiffer	54.43	2.452	6.546	52.54	18.3721	1000.0000	-	-	10MB
3	Unformatted_GetTextChanges	60.95	5.672	16.457	52.38	16.4079	1000.0000	-	-	10MB
3	Unformatted_PartitionedSrcTxtDif	63.23	5.194	14.735	57.06	15.8151	1000.0000	-	-	10MB
3	Unformatted_SourceTextDiffer	63.42	5.700	16.537	55.53	15.7672	1000.0000	-	-	10MB
30	Formatted_GetTextChanges	520.80	17.861	52.665	492.73	1.9201	18000.0000	2000.0000	-	108MB
30	Formatted_PartitionedSrcTxtDif	555.42	19.175	56.537	539.48	1.8004	19000.0000	2000.0000	-	113MB
30	Formatted_SourceTextDiffer	504.10	12.392	35.952	488.46	1.9837	18000.0000	2000.0000	-	108MB
30	Unformatted_GetTextChanges	496.97	14.344	41.841	485.60	2.0122	18000.0000	2000.0000	-	107MB
30	Unformatted_PartitionedSrcTxtDif	585.90	16.900	49.831	587.09	1.7068	19000.0000	2000.0000	-	112MB
30	Unformatted_SourceTextDiffer	551.15	16.514	48.691	531.08	1.8144	18000.0000	3000.0000	-	107MB
4	Formatted_GetTextChanges	72.54	6.061	16.896	65.64	13.7847	2000.0000	1000.0000	-	13MB
4	Formatted_PartitionedSrcTxtDif	75.50	4.962	13.832	69.06	13.2443	2000.0000	1000.0000	-	14MB
4	Formatted_SourceTextDiffer	69.65	3.511	9.670	65.22	14.3582	2000.0000	1000.0000	-	13MB
4	Unformatted_GetTextChanges	69.35	4.684	12.824	63.63	14.4204	2000.0000	1000.0000	-	13MB
4	Unformatted_PartitionedSrcTxtDif	72.73	2.738	7.494	70.53	13.7490	2000.0000	1000.0000	-	14MB
4	Unformatted_SourceTextDiffer	66.82	1.935	5.264	64.64	14.9647	2000.0000	1000.0000	-	13MB
40	Formatted_GetTextChanges	657.23	8.655	7.673	657.24	1.5215	26000.0000	2000.0000	-	149MB
40	Formatted_PartitionedSrcTxtDif	787.57	24.225	71.427	774.43	1.2697	27000.0000	2000.0000	-	157MB
40	Formatted_SourceTextDiffer	756.81	22.345	65.884	754.75	1.3213	26000.0000	2000.0000	-	149MB
40	Unformatted_GetTextChanges	763.11	21.827	64.357	773.48	1.3104	26000.0000	2000.0000	-	148MB
40	Unformatted_PartitionedSrcTxtDif	770.13	27.774	81.894	735.98	1.2985	27000.0000	2000.0000	-	156MB
40	Unformatted_SourceTextDiffer	796.08	20.999	61.916	783.35	1.2562	26000.0000	2000.0000	-	149MB
5	Formatted_GetTextChanges	85.65	3.646	10.404	82.05	11.6760	2000.0000	1000.0000	-	17MB
5	Formatted_PartitionedSrcTxtDif	92.26	4.808	13.718	86.80	10.8387	2000.0000	1000.0000	-	17MB
5	Formatted_SourceTextDiffer	85.68	5.045	13.897	80.06	11.6710	2000.0000	1000.0000	-	17MB
5	Unformatted_GetTextChanges	87.84	4.488	12.658	83.30	11.3845	2000.0000	1000.0000	-	17MB
5	Unformatted_PartitionedSrcTxtDif	87.87	2.693	7.094	86.01	11.3807	2000.0000	1000.0000	-	17MB
5	Unformatted_SourceTextDiffer	82.48	2.873	7.815	79.62	12.1244	2000.0000	1000.0000	-	17MB

davidwengier · 2021-10-21T05:08:50Z

Here is what I don't get... where are the results that I got in the first benchmark run, in the description? That was about 1000 lines, which is about 40 "blocks" in the second set of data, and there certainly isn't a 1 second speed up there. Very strange.

davidwengier · 2021-10-21T05:18:26Z

I've pushed up the changes to the benchmark, but not sure if it should be merged or not. Thoughts?

sharwell · 2021-10-21T15:21:55Z

Do we know what the indexer access pattern is for SourceTextDiffer? If it reads from beginning to end, we might be able to mitigate the indexer cost by extracting a range of characters (e.g. 500) at once, and then indexing into the buffer. The buffer would represent a sliding window that we only need to move when a read goes outside the window.

NTaylorMullen · 2021-10-21T17:09:40Z

This was a pain.. think i'll stick to coding.

lol, but the data magic is so dope!

Love the breakdown you did. Absolutely helps us understand the problem area more. In each of the source text differ impls are they burning cycles in the same way? IIRC reviewing Ajay's source text differ PR back in the day it allocated a good amount but those allocations were able to be mitigated using newer bcl APIs for span.

Also, does Roslyn's C# formatter not run into this issue?

Here is what I don't get... where are the results that I got in the first benchmark run, in the description? That was about 1000 lines, which is about 40 "blocks" in the second set of data, and there certainly isn't a 1 second speed up there. Very strange.

Ya that's odd. was that external variance you think?

davidwengier · 2021-10-22T02:50:54Z

Ya that's odd. was that external variance you think?

As discussed, the input data was different, so this could be a difference in % of html vs c# code (which is something we could add another knob to the benchmarks for)

Do we know what the indexer access pattern is for SourceTextDiffer? If it reads from beginning to end, we might be able to mitigate the indexer cost by extracting a range of characters (e.g. 500) at once, and then indexing into the buffer. The buffer would represent a sliding window that we only need to move when a read goes outside the window.

The code is here and I suspect you're better at reading it than me, but it appears to process both ends of the text at the same time, so sadly I don't think that will help.

# Conflicts: # src/Razor/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting/CodeDirectiveFormattingTest.cs

davidwengier added 3 commits October 18, 2021 15:17

Don't use the SourceTextDiffer

7948742

Move code up

a729ab7

Don't need to validate parameters in private methods in a nullable co…

b770753

…ntext

davidwengier requested review from allisonchou, NTaylorMullen, ryanbrandenburg, sharwell and a team as code owners October 18, 2021 04:52

davidwengier commented Oct 18, 2021

View reviewed changes

ryanbrandenburg reviewed Oct 18, 2021

View reviewed changes

NTaylorMullen approved these changes Oct 19, 2021

View reviewed changes

allisonchou approved these changes Oct 20, 2021

View reviewed changes

Add some variables to the benchmark, and the formatting algo

7d31d8f

davidwengier added 3 commits October 22, 2021 14:23

Fix build

2708f77

If we're going to leave the code there, may as well test it!

9d66dfa

Merge remote-tracking branch 'upstream/main' into FormattingPerf

3fbfea9

# Conflicts: # src/Razor/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting/CodeDirectiveFormattingTest.cs

ryanbrandenburg approved these changes Oct 22, 2021

View reviewed changes

davidwengier merged commit d1a7bae into dotnet:main Oct 23, 2021

davidwengier deleted the FormattingPerf branch October 23, 2021 03:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't use the SourceTextDiffer #5624

Don't use the SourceTextDiffer #5624

davidwengier commented Oct 18, 2021

davidwengier Oct 18, 2021

ryanbrandenburg Oct 18, 2021

davidwengier Oct 18, 2021

ryanbrandenburg left a comment

ryanbrandenburg Oct 18, 2021

davidwengier Oct 19, 2021

ryanbrandenburg Oct 19, 2021

allisonchou Oct 20, 2021

davidwengier Oct 21, 2021 •

edited

Loading

ryanbrandenburg Oct 18, 2021

NTaylorMullen commented Oct 18, 2021

NTaylorMullen commented Oct 18, 2021

davidwengier commented Oct 19, 2021

NTaylorMullen commented Oct 19, 2021

allisonchou Oct 20, 2021

davidwengier commented Oct 21, 2021 •

edited

Loading

davidwengier commented Oct 21, 2021

davidwengier commented Oct 21, 2021

sharwell commented Oct 21, 2021

NTaylorMullen commented Oct 21, 2021

davidwengier commented Oct 22, 2021

Don't use the SourceTextDiffer #5624

Don't use the SourceTextDiffer #5624

Conversation

davidwengier commented Oct 18, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryanbrandenburg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidwengier Oct 21, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NTaylorMullen commented Oct 18, 2021

NTaylorMullen commented Oct 18, 2021

davidwengier commented Oct 19, 2021

NTaylorMullen commented Oct 19, 2021

Choose a reason for hiding this comment

davidwengier commented Oct 21, 2021 • edited Loading

davidwengier commented Oct 21, 2021

davidwengier commented Oct 21, 2021

sharwell commented Oct 21, 2021

NTaylorMullen commented Oct 21, 2021

davidwengier commented Oct 22, 2021

davidwengier Oct 21, 2021 •

edited

Loading

davidwengier commented Oct 21, 2021 •

edited

Loading