Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ReferenceMap recalculation (almost) everywhere and switch to more fine-grained solutions #4757

Merged
merged 59 commits into from
Jun 28, 2024

Conversation

asl
Copy link
Contributor

@asl asl commented Jun 24, 2024

The goal is to get rid of reference map and switch to more fine-grained approach. Reference map is global: one recomputes it for the whole program (doing separate expensive top-down traverse). However, in many cases one only needs couple of paths being resolved from there. So, this is a huge waste of computational resources to recompute it again and again for no particular reason. This switches many places to ResolutionContext that allows to perform declaration resolution on-fly.

One particular example it fixes is #4723

@asl asl requested a review from ChrisDodd June 24, 2024 18:54
@asl asl marked this pull request as draft June 24, 2024 18:54
@fruffy fruffy added core Topics concerning the core segments of the compiler (frontend, midend, parser) breaking-change This change may break assumptions of compiler back ends. labels Jun 24, 2024
class DontcareArgs : public Transform {
ReferenceMap *refMap;
class DontcareArgs : public Transform, public ResolutionContext {
MinimalNameGenerator nameGen;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you need to apply the MinimalNameGenerator to the IR prior to using it to create new names (so it can record all the existing names). Otherwise it might create a name that is the same as an existing name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind -- just noticed that you added an init_apply to do just that,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm normally following that pattern. Unless NameGenerator is used in downstream passes.

@asl asl added run-sanitizer Use this tag to run a Clang+Sanitzers CI run. run-validation Use this tag to trigger a Validation CI run. compiler-performance Topics on improving the performance of the compiler core. labels Jun 25, 2024
@asl asl marked this pull request as ready for review June 25, 2024 06:17
@asl asl marked this pull request as draft June 25, 2024 06:18
@asl
Copy link
Contributor Author

asl commented Jun 25, 2024

@ChrisDodd I think this is enough as the first approximation. I converted majority of frontend passes. Few remaining ones are a bit more tricky and might require separate dedicated treatment and larger refactoring

@asl asl marked this pull request as ready for review June 25, 2024 17:19
@asl
Copy link
Contributor Author

asl commented Jun 25, 2024

Take home messages:

  • ResolutionContext is fine for top-level passes. However, there are lots of cases where visitors are spawned on sub-trees, but have to resolve declarations globally. Then things become much more messy as we'd need to make them ResolutionContext as well, plus pass context around. This defeats the idea of memoization as their lookup caches are effectively re-created and destroyed. We need to have a way to share / pass caches around different sub-contexts.
  • We likely need some centralized SymbolTable to handle all declaration names. Otherwise we'd need to walk over the whole IR tree just to collect names. And we do this quite often. In the testcase I do care about it takes ~50 ms just to collect names. And we do this more than 20 times, certainly. Sometimes only to create couple of temporaries.
  • TypeInference is run everywhere. Something should be done here definitely. In my testcase a single full TypeInference run is ~700 ms.

@asl asl requested a review from ChrisDodd June 25, 2024 17:25
@asl asl changed the title [DO NOT MERGE] Remove ReferenceMap recalculation everywhere and switch to more fine-grained solutions Remove ReferenceMap recalculation (almost) everywhere and switch to more fine-grained solutions Jun 25, 2024
@asl
Copy link
Contributor Author

asl commented Jun 25, 2024

Here are benchmarking results (10 iterations + warmup) for the downstream app:

Command Mean ± σ [s] Min [s] Max [s] Relative
baseline 98.474 ± 3.538 93.368 104.083 1.30 ± 0.06
this PR 75.863 ± 1.819 73.777 79.709 1.00

@fruffy fruffy requested review from grg and vlstill June 25, 2024 23:56
for (const auto *param : callee->parameters->parameters) {
const auto *argument = substitution.lookup(param);
cstring newName =
nameGen.newName(param->name.string() + "_inlined_" + callee->name.string());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick skim for now, what is the deal with these renames? Why add this renaming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to rename parameters in any way. This gives a bit more indicative base name after inlining (e.g. when caller had x and callee has x we'll end with x and x_inlined_callee instead of x and x0). This was mostly for my debugging, but I decided to leave it as-is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I request this a lot but could we split it out into a separate if it's unrelated to the ref map change? Considering it's a breaking change we should remove any confounding factors.

Copy link
Contributor Author

@asl asl Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it would still be a change. As instead of refMap we're using a name generator here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, doesn't seem like there is a way to preserve the naming here. The counters are different. Preserving the name would remove a lot of noise.

Also I recommend making this IR:ID and preserve the originalName in the generator. The previous implementation did not do this either but it is useful to have in downstream compiler passes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return to the old "scheme" actually so the counters would match, let me do this. As for preserving originalName – I'd do this in the separate issue. See the "take home messages" above. We do recalculate name generators very often. In the testcase above each sweep of NameGenerator over the whole program costs 50 ms (!) due to visitor malloc traffic and GC. This totals to ~6 seconds of overall runtime (yes, we do recalculate that often).

Instead we should have some central SymbolTable. And all IR:ID / name generation should be routed through it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I've returned old scheme, plus did few improvements to reduce number of NameGenerator sweeps.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! We can add your naming scheme in a follow-up PR if you think it is better. I do not see the harm, any tool depending on an internal name is likely doing something wrong here.

Copy link
Contributor

@grg grg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another breaking change 😱 😉

Still reviewing, but here are a couple of initial comments.

backends/ubpf/midend.cpp Outdated Show resolved Hide resolved
auto tmpref = new IR::PathExpression(IR::ID(expression->srcInfo, tmpvar));
cmap.add(expression, tmpvar);
return tmpref;
}
};
} // namespace

MoveConstructors::MoveConstructors(ReferenceMap *refMap) {
MoveConstructors::MoveConstructors() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DoMoveConstructors could now be merged into MoveConstructors since it's only invoking one pass.

Copy link
Contributor Author

@asl asl Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I decided to keep things as-is as sometimes these "DoFoo" passes are executed standalone elsewhere. I'm fine to get rid of extra PassManager wrappers where possible if this is ok for downstream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is safe for the Tofino code. Can't speak for any other downstream projects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I explicitly decided not to touch midend passes for now as these would impact downstream projects more. But in the majority of cases the changes are just removing of refmap argument and making ctors explicit if necessary. This might uncover few subtle cases though, e.g. then refmap was stale before some downstream pass, this needs to be fixed anyway :)

Copy link
Contributor

@grg grg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chris/Fabian have covered the major items.

Here are a few small things I picked up today:

Comment on lines +19 to +20
#include "frontends/common/constantFolding.h"
#include "frontends/common/resolveReferences/referenceMap.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you use a tool to identify the missing includes? If so, what did you use? I'd potentially like to run that elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the possible tools to use is include-what-you-use: https://github.com/include-what-you-use/include-what-you-use, though it might require some settings for large codebases.

Here I am using include cleaner from clang-tidy / clangd: https://clangd.llvm.org/design/include-cleaner (https://clang.llvm.org/extra/clang-tidy/checks/misc/include-cleaner.html)

@@ -16,12 +16,13 @@ limitations under the License.

#include "tableApply.h"

#include "frontends/common/resolveReferences/referenceMap.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be here? It's already included via tableApply.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather be explicit on used includes, p4c has the tendency of allowing includes leaking through others (especially given the include order). But I can certainly remove this if it would be preferred.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a work-in-progress PR that runs IWYU across the entire codebase to clean up includes.
#3767

I am hesitant to push it further because IWYU still seems to immature as a tool. Wish clang-tidy would make its include-fixer an independent tool...

@@ -19,6 +19,7 @@ limitations under the License.
#include <boost/format.hpp>

#include "frontends/common/constantFolding.h"
#include "frontends/common/resolveReferences/referenceMap.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already coming through typeChecker.h? Same for resolveReferences.h.

@@ -16,6 +16,7 @@ limitations under the License.

#include "uniqueNames.h"

#include "frontends/common/resolveReferences/resolveReferences.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be coming through from uniqueNames.h?

Comment on lines +23 to +24
#include "absl/time/clock.h"
#include "absl/time/time.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Abseil time code does clean things up a lot 👍

midend/copyStructures.cpp Show resolved Hide resolved
@@ -16,19 +16,20 @@ limitations under the License.

#include "removeExits.h"

#include "frontends/common/resolveReferences/referenceMap.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this coming through from removeExits.h?

Copy link
Contributor

@ChrisDodd ChrisDodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely moving in the right direction

protected:
ConstantFoldingPolicy *policy;

/// Used to resolve IR nodes to declarations.
/// If `nullptr`, then `const` values cannot be resolved.
const ReferenceMap *refMap;
const DeclarationLookup *refMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to remove this member, and just use this instead as the DeclarationLookup in all cases? Where this pass is apply'd directly on a non-root node, you would need to use the two-argument apply variant that takes a Visitot::Context (eg in parsers/parserDriver.cpp)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there were cases where DoConstantFolding was executed without proper context in vicinity, this is why this hack was needed. I will double check to be sure.

Copy link
Contributor Author

@asl asl Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I double checked this and for now it seems we'd need to leave it as is. One of the main culprits is Evaluator that tries to do constant folding of arguments, therefore requesting declarations outside the current context. This is why it is currently still uses refMap.

@@ -48,7 +48,7 @@ std::unordered_multimap<cstring, const IR::IDeclaration *> &ResolutionContext::m
std::vector<const IR::IDeclaration *> ResolutionContext::resolve(const IR::ID &name,
P4::ResolutionType type) const {
const Context *ctxt = nullptr;
while (auto scope = findContext<IR::INamespace>(ctxt)) {
while (auto scope = findOrigCtxt<IR::INamespace>(ctxt)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky case -- when looking up a name in a Transform, do we want to look for the original definition of the name, or for a transformed definition of the name? If the transform is changing names it gets even harder. Do we need more variants of resolve (or some added optional argument(s)) to allow different things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. So far (as for refMap replacement) we always need original as refMap contained untransformed / uncloned node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, few places (type inference included) relied on ability being able to have lookup fail, and then do something with it.

frontends/p4/actionsInlining.cpp Show resolved Hide resolved
frontends/p4/alias.h Outdated Show resolved Hide resolved
@ChrisDodd
Copy link
Contributor

  • ResolutionContext is fine for top-level passes. However, there are lots of cases where visitors are spawned on sub-trees, but have to resolve declarations globally. Then things become much more messy as we'd need to make them ResolutionContext as well, plus pass context around. This defeats the idea of memoization as their lookup caches are effectively re-created and destroyed. We need to have a way to share / pass caches around different sub-contexts.

One major concern I have is that we never invalidate things in the ResolutionContext cache -- if this is a transform pass (or run multiple times with transforms between), it is easy for things in the cache to get out of date. We should definitely have a debug mode build where all the cache hits also do a full lookup to ensure it gets the same value back -- slow but necessary for correctness verification. If we can get the cache invalidation correct, then we could look into sharing the cache between passes. As it is currently, I'm afraid we really should be invalidating the cache every time in init_apply.

  • We likely need some centralized SymbolTable to handle all declaration names. Otherwise we'd need to walk over the whole IR tree just to collect names. And we do this quite often. In the testcase I do care about it takes ~50 ms just to collect names. And we do this more than 20 times, certainly. Sometimes only to create couple of temporaries.

Yes, could have a single MinimalNameGenerator that all passes share, rather than creating new ones all over the place.

  • TypeInference is run everywhere. Something should be done here definitely. In my testcase a single full TypeInference run is ~700 ms.

My next step plan after eliminating the refMap everywhere is to eliminate the typeMap too -- if you run TypeChecking with updateExpressions = true, it will copy the type of every Expression into the type field of the expression, and then you should not need the typeMap pretty much anywhere. So the typeMap just becomes a short-lived temp between TypeInference and ApplyTypesToExpressions.

In addition, all the readOnly = true runs of TypeInference/TypeChecking are unnecessary if the rest of the compiler is correct -- so you should not count them against the total run time. They only exist to trigger BUG_CHECKs if some other pass screws up the types.

@asl
Copy link
Contributor Author

asl commented Jun 26, 2024

One major concern I have is that we never invalidate things in the ResolutionContext cache -- if this is a transform pass (or run multiple times with transforms between), it is easy for things in the cache to get out of date

I thought about this. However, I do not think this will be a problem. IR is immutable, so if something was transformed, it will be entirely new Node. Therefore we'll have two cases:

  • Lookup using "old" (untransformed) node. Fine, we'll return old declaration
  • Lookup using "new" (transformed) node. We'll likely will not have it in cache, so we will resolve it again and cache the result.

So, the cache itself might have stale things, yes, but the answers will always be correct. Certainly, we might need invalidate it between passes, but I mostly concerned about nested cases where one pass applies couple of other visitors underneath. For those it would be great to inherit the cache. I put few FIXMEs in certain places and some of them are indeed require such caching, iirc – GetWrittenExpressions and ReadWrites are.

My next step plan after eliminating the refMap everywhere is to eliminate the typeMap too -- if you run TypeChecking with updateExpressions = true, it will copy the type of every Expression into the type field of the expression, and then you should not need the typeMap pretty much anywhere. So the typeMap just becomes a short-lived temp between TypeInference and ApplyTypesToExpressions.

Right. And this is actually already done in few places. However, things are written to require typeMap, uh-oh. Read-only type inference is actually not read-only as it still clones the whole program (and then just throw out everything). This type checking should be optional, right. E.g. in some "expensive checks" mode.

@ChrisDodd
Copy link
Contributor

One major concern I have is that we never invalidate things in the ResolutionContext cache -- if this is a transform pass (or run multiple times with transforms between), it is easy for things in the cache to get out of date

I thought about this. However, I do not think this will be a problem. IR is immutable, so if something was transformed, it will be entirely new Node. Therefore we'll have two cases:

  • Lookup using "old" (untransformed) node. Fine, we'll return old declaration
  • Lookup using "new" (transformed) node. We'll likely will not have it in cache, so we will resolve it again and cache the result.

But the lookups (keys) are by name (cstring), not by node. So if you put a node in the cache under a name and that node is later transformed, and then after that you look up the name again, you'll get the old node, not the transformed node.

However, it does look like most of the lookups (all those that call resolve or lookup) bypass the cache -- the cache is only used for getDeclsByName. In addition, in that case it also looks up the namespace in a cache first, and that is a Node *, and any change of any declaration in the namespace will result in the creation of a new INamespace *, so that lookup won't find the old cache. But we probably need to consider this if we want to clean up all this caching stuff to make it more useful.

@asl
Copy link
Contributor Author

asl commented Jun 27, 2024

But we probably need to consider this if we want to clean up all this caching stuff to make it more useful.

Yes, right, caching is essentially unwrapping a namespace, so stuff is keyed by INamespace*, this was enough to drop the complexity from quadratic to linear (especially during shadowing checks). In addition – all current code is refMap-oriented. So it expects to find old (untransformed) result, but yes, I fully agree, the lookup should probably be rethought. There are also some users that expects to do some lookups outside of the current context (e.g. analyzing a call we only have context of caller, but might want to check the callee).

asl added 21 commits June 28, 2024 09:23
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
…e here

Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
@fruffy
Copy link
Collaborator

fruffy commented Jun 28, 2024

Yes, the goal is to get rid of reference map and switch to more fine-grained approach. Reference map is global: you recompute it for the whole program (doing separate top-down traverse). However, in many cases you'd only need couple of paths being resolved from there. So, this is a huge waste of computational resources. It is not about inspectors, but transforms as well.

I think that has its uses? Imagine an interpreter that does not modify the program but uses many different visitors to traverse it. Precomputing the refMap once is all that is needed.

@asl
Copy link
Contributor Author

asl commented Jun 28, 2024

Imagine an interpreter that does not modify the program but uses many different visitors to traverse it. Precomputing the refMap once is all that is needed.

And these visitors can make us of something like refMap, true. Here we are talking about the frontend (and midend, but I have not touched it yet) that changes things over and over again. Each Transform effectively kills refMap, so it had to be recomputed from scratch. Note that common Visitors in this PR take DeclarationLookup interface (e.g. MethodInstance::resolve) that is used for resolution.

In general, if you're having some global objects that are expensive to recompute, but you'd need to recompute them after any IR change, then it smells like a big problem and design flaw IMO.

ResolutionContext certainly is not perfect:

  • It only does lookup within the current context (so you cannot e.g. analyze the call and do lookup in the context of the callee declaration)
  • In some cases we recreate it again and again. Fortunately, this is mostly in the case of small leaf lookups, so we won't waste lots of things

And see the discussion above how the internal cache of ResolutionContext could optionally be reused / shared in the nested lookup. I would say this is something that needs to be carefully designed.

As a loud thinking: if the ResolutionContext could be shared, then in your interpreter example, it could be created by top-level pass and then passed down to all visitors. They will populate it lazily effectively building refMap under hood on-fly.

Copy link
Collaborator

@fruffy fruffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And see the discussion above how the internal cache of ResolutionContext could optionally be reused / shared in the nested lookup. I would say this is something that needs to be carefully designed.

I think that would be useful, yes. Anyway, since we are not removing the RefMap anyway yet this is a moot point. Approving.


public:
// @param readOnly If true it will assert that it behaves like
// an Inspector.
TypeInference(ReferenceMap *refMap, TypeMap *typeMap, bool readOnly = false,
bool checkArrays = true);
explicit TypeInference(TypeMap *typeMap, bool readOnly = false, bool checkArrays = true,
Copy link
Collaborator

@fruffy fruffy Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking we should make these structs instead of a list of booleans. Can be easy to get them wrong even with the explicit.

I really like option structs for that: https://abseil.io/tips/173

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these are very nice. I think there were some neat wrappers somewhere to add some sugar on top of that.

@asl asl added this pull request to the merge queue Jun 28, 2024
Merged via the queue into p4lang:main with commit 274c508 Jun 28, 2024
17 checks passed
@asl asl deleted the de-refmap branch June 28, 2024 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change This change may break assumptions of compiler back ends. compiler-performance Topics on improving the performance of the compiler core. core Topics concerning the core segments of the compiler (frontend, midend, parser) run-sanitizer Use this tag to run a Clang+Sanitzers CI run. run-validation Use this tag to trigger a Validation CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants