Handle vrot overlap and vscl/vmscl prefixes more accurately #16302
Fixes #13990 (at least the Cloud vs Laguna case), thanks @anr2me for figuring out it was in Int_Vscl.
I was also suspecting vrot for #10650 (because it behaves differently on Linux), and noticed the IR didn't properly consider overlap. This handles it per hardware tests. I'd previously asserted that vrot might indicate sine and cosine were calculated together... clearly they aren't.
It's obvious from this that, as with matrix instructions, cosine is just a later micro-op (thus it doesn't get prefixes, uses the previous result, etc.). Some tests from @davidgfnet have indicated that matrix overlap also behaves this way (i.e. the way we copy the entire matrix is probably wrong).
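To illustrate what "a later micro-op" means for overlap, here is a toy model, not the actual interpreter or IR code. It assumes one reading of the hardware tests: that the cosine step re-reads the source register after the sine step has already written its result. `RegFile`, `RotFused`, and `RotMicroOps` are hypothetical names, angles are plain radians for simplicity, and prefixes are ignored entirely.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Toy register file of floats. This is NOT PPSSPP's VFPU state.
using RegFile = std::array<float, 8>;

// "Fused" model (the old assumption): read the source once,
// compute sine and cosine together, then write both results.
void RotFused(RegFile &r, int src, int sineReg, int cosineReg) {
	assert(src >= 0 && src < 8 && sineReg >= 0 && sineReg < 8 && cosineReg >= 0 && cosineReg < 8);
	const float angle = r[src];
	r[sineReg] = std::sin(angle);
	r[cosineReg] = std::cos(angle);
}

// "Micro-op" model (assumed reading of the tests): sine is written first,
// then cosine reads the source again. If sineReg == src, cosine sees the
// sine result rather than the original angle.
void RotMicroOps(RegFile &r, int src, int sineReg, int cosineReg) {
	assert(src >= 0 && src < 8 && sineReg >= 0 && sineReg < 8 && cosineReg >= 0 && cosineReg < 8);
	r[sineReg] = std::sin(r[src]);
	r[cosineReg] = std::cos(r[src]);
}
```

Without overlap the two models give the same result. With `sineReg == src`, the micro-op model yields `cos(sin(angle))` in the cosine lane, while the fused model yields `cos(angle)`. That difference is the kind of thing an overlap test can distinguish.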
-[Unknown]