Don't use the SourceTextDiffer #5624
Conversation
```csharp
{
    var changes = edits.Select(e => e.AsTextChange(originalText));
    var changedText = originalText.WithChanges(changes);
    var cleanChanges = SourceTextDiffer.GetMinimalTextChanges(originalText, changedText, lineDiffOnly: false);
```
Even though this line should logically be able to be removed, like the others, doing so causes 3 tests to fail. Something to look into.
I imagine the tests that fail are just expecting a particular result (the minimal result), but now we return an un-minified version.
No, the tests are formatting tests, and the failures are due to a difference in indentation in the expected result.
The code looks fine, but we should consider applying @allisonchou's learnings as I believe this is basically the same problem we were having in Semantic Tokens.
```diff
@@ -78,7 +78,7 @@ public async override Task<FormattingResult> ExecuteAsync(FormattingContext cont
     changedText = changedText.WithChanges(indentationChanges);
 }

 var finalChanges = SourceTextDiffer.GetMinimalTextChanges(originalText, changedText, lineDiffOnly: false);
```
You should link up with @allisonchou. She's been doing a lot of perf tests on these minimal text change algorithms and we can get BIG perf wins here if we're willing to accept "mostly minimal" text changes.
I thought there might be some overlap here, but I wasn't sure. Strictly speaking, this change isn't because the diff algorithm per se is bad, but because it relies on indexing, which doesn't perform well with long chains of CompositeText instances. A better diff algorithm that resulted in less indexing would be good, but one that used less CPU or memory for the same number of index calls wouldn't actually improve things much, I don't think. We'll see.
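For illustration only (not from the PR), here is a minimal sketch of why the indexer becomes the hot path; the edit counts and sizes are made up:

```csharp
using System;
using Microsoft.CodeAnalysis.Text;

// Each WithChanges call produces a new SourceText layered over the previous
// one, so after many incremental edits the result is internally a deep chain
// of composite segments.
var text = SourceText.From(new string('x', 100_000));
for (var i = 0; i < 1_000; i++)
{
    text = text.WithChanges(new TextChange(new TextSpan(i, 0), "y"));
}

// A character-level differ does something like this in its inner loop. Every
// text[j] has to locate the right segment in the chain, so the cost per read
// depends on how the text was built, not just on its length.
long sum = 0;
for (var j = 0; j < text.Length; j++)
{
    sum += text[j];
}
Console.WriteLine(sum);
```

Bulk reads (e.g. SourceText.CopyTo, or ToString over a span) pay the segment lookup once per block rather than once per character, which is what makes the sliding-window idea discussed further down attractive.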
@allisonchou ended up deciding on a "mostly minimal" text change algo (it's basically this one, but it partitions the text to avoid supra-linear algo inflation). I don't know the guts well enough to tell whether that affects indexing; just thought I'd bring it up.
Agree with Ryan: if this call takes a while, then exploring partitioning might be an option. But depending on how efficient GetTextChanges already is speed-wise, it might not be necessary.
Partitioning is better than not partitioning, but not better than calling GetTextChanges. See chart below.
Also, my port of the algorithm resulted in some test failures, but I'm fairly certain it was just a bug in the handling of segment edges, and not something that would have affected perf (i.e., if I fixed the bug, I don't think the numbers would change).
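To make the partitioning idea concrete, here is a rough sketch; it is illustrative only (not the actual implementation, and the naming is mine), with deliberately naive window alignment:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis.Text;

internal static class PartitionedDiffer
{
    // "Mostly minimal" diff: run the minimal differ per fixed-size window so
    // its super-linear cost stays bounded per window. Offset-based alignment
    // is naive: an insertion near the top shifts every later window, so a
    // real implementation needs smarter alignment, and the window edges are
    // exactly where the "segment edges" bugs mentioned above tend to live.
    public static IReadOnlyList<TextChange> GetMostlyMinimalTextChanges(
        SourceText oldText, SourceText newText, int windowSize = 4096)
    {
        var result = new List<TextChange>();
        var length = Math.Max(oldText.Length, newText.Length);

        for (var start = 0; start < length; start += windowSize)
        {
            var oldChunk = Slice(oldText, start, windowSize);
            var newChunk = Slice(newText, start, windowSize);
            if (oldChunk.ContentEquals(newChunk))
            {
                continue;
            }

            // Diff just this window, then shift the spans back into
            // whole-text coordinates.
            var offset = Math.Min(start, oldText.Length);
            foreach (var change in SourceTextDiffer.GetMinimalTextChanges(oldChunk, newChunk, lineDiffOnly: false))
            {
                result.Add(new TextChange(
                    new TextSpan(change.Span.Start + offset, change.Span.Length),
                    change.NewText));
            }
        }

        return result;
    }

    private static SourceText Slice(SourceText text, int start, int length)
    {
        start = Math.Min(start, text.Length);
        length = Math.Min(length, text.Length - start);
        return SourceText.From(text.ToString(new TextSpan(start, length)));
    }
}
```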
I definitely agree that certain niche user bases will always have issues; however, I think the question here is whether a 1k file is a niche-enough user base or not. I'd probably say that's a reasonable file size for us to support, and we should do everything in our power to make it better. Now, that being said, we may be making general C# formatting slow; however, I doubt that larger files lack C# at all. It's really hard to ignore a 2x perf win here 😢. I'm thinking the perf win will more often be a net positive? Should we reach out to the runtime team to get insights on what they expect for the "average" Razor file?
For the initial repro, how quickly are we able to get formatting to occur there?
With this change, formatting the file you sent me went from 10 seconds to 8 seconds, so this is definitely not the only change that should come from this issue. Unless I've misunderstood your question and that answer doesn't make sense :P
Nope, you nailed it! We're chipping away!
This was a pain... I think I'll stick to coding. Y axis is milliseconds. (Chart and expandable raw data not reproduced here.)
Here is what I don't get: where are the results that I got in the first benchmark run, in the description? That was about 1000 lines, which is about 40 "blocks" in the second set of data, and there certainly isn't a 1-second speed-up there. Very strange.
I've pushed up the changes to the benchmark, but I'm not sure if they should be merged or not. Thoughts?
Do we know what the indexer access pattern is for SourceTextDiffer? If it reads from beginning to end, we might be able to mitigate the indexer cost by extracting a range of characters (e.g. 500) at once and then indexing into the buffer. The buffer would represent a sliding window that we only need to move when a read goes outside the window.
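A sketch of that idea, assuming nothing about SourceTextDiffer's internals beyond its use of the indexer (the class name and window size here are illustrative):

```csharp
using System;
using Microsoft.CodeAnalysis.Text;

// Sliding-window reader: pay the CompositeText segment-walk once per block
// via CopyTo, instead of once per character via the SourceText indexer.
internal sealed class BufferedTextReader
{
    private const int WindowSize = 500;

    private readonly SourceText _text;
    private readonly char[] _window = new char[WindowSize];
    private int _windowStart = -1;

    public BufferedTextReader(SourceText text) => _text = text;

    public char this[int position]
    {
        get
        {
            // Refill only when a read falls outside the current window. This
            // helps when the access pattern is mostly sequential; a scan that
            // works from both ends at once would need one window per cursor.
            if (_windowStart < 0 || position < _windowStart || position >= _windowStart + WindowSize)
            {
                _windowStart = position;
                var count = Math.Min(WindowSize, _text.Length - position);
                _text.CopyTo(position, _window, 0, count);
            }

            return _window[position - _windowStart];
        }
    }
}
```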
lol, but the data magic is so dope! Love the breakdown you did; it absolutely helps us understand the problem area more. In each of the source text differ impls, are they burning cycles in the same way? IIRC, reviewing Ajay's source text differ PR back in the day, it allocated a good amount, but those allocations were able to be mitigated using newer BCL APIs for Span. Also, does Roslyn's C# formatter not run into this issue?
Ya, that's odd. Was that external variance, you think?
As discussed, the input data was different, so this could be a difference in the % of HTML vs C# code (which is something we could add another knob to the benchmarks for).
The code is here, and I suspect you're better at reading it than me, but it appears to process both ends of the text at the same time, so sadly I don't think that will help.
# Conflicts:
#   src/Razor/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting/CodeDirectiveFormattingTest.cs
Part of #5617
Curious for people's thoughts on this one. Note: the first commit is the perf fix, which is discussed below. The other two are just cleanup from the resulting state.
So, from a trace of formatting a 1000+ line file, it was clear that the SourceTextDiffer was the lowest-hanging fruit (trace screenshot not reproduced here).
Drilling in, the indexer on Roslyn's CompositeText seems like an issue. From looking around at Roslyn code, this isn't surprising. Looking at where SourceTextDiffer is used in Razor, it is mainly used as a "clean up" pass, so I was curious what would happen if we just didn't do that.
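The exact replacement isn't shown in this excerpt, but the review comments above suggest the swap amounts to something like dropping the minimal-diff pass in favor of Roslyn's built-in SourceText.GetTextChanges; a hedged sketch of that before/after:

```csharp
// Before: compute a character-level minimal diff; every text[i] on a deep
// CompositeText chain pays a segment-walk cost.
var cleanChanges = SourceTextDiffer.GetMinimalTextChanges(originalText, changedText, lineDiffOnly: false);

// After (assumed replacement): when changedText was produced from
// originalText via WithChanges, Roslyn can replay the recorded changes
// instead of re-diffing the text character by character.
var finalChanges = changedText.GetTextChanges(originalText);
```

And this is the result: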
For a 1000+ line file:
Before: [benchmark screenshot]
After: [benchmark screenshot]
Still slow, but a 50% improvement is pretty good, right? Well... things get a bit more interesting when running the "normal" C# formatting benchmark:
Before: [benchmark screenshot]
After: [benchmark screenshot]
So, do we take this change? Good question.
Anecdotally, when using the product this change doesn't really matter: formatting is still horrendously slow. A threaded wait dialog might be an idea...
So the real question here is whether this is a win, or if we're improving things for a niche user base?