-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove ReferenceMap recalculation (almost) everywhere and switch to more fine-grained solutions #4757
Conversation
class DontcareArgs : public Transform { | ||
ReferenceMap *refMap; | ||
class DontcareArgs : public Transform, public ResolutionContext { | ||
MinimalNameGenerator nameGen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that you need to apply the MinimalNameGenerator to the IR prior to using it to create new names (so it can record all the existing names). Otherwise it might create a name that is the same as an existing name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind -- just noticed that you added an init_apply to do just that,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm normally following that pattern. Unless NameGenerator
is used in downstream passes.
@ChrisDodd I think this is enough as the first approximation. I converted majority of frontend passes. Few remaining ones are a bit more tricky and might require separate dedicated treatment and larger refactoring |
Take home messages:
|
Here are benchmarking results (10 iterations + warmup) for the downstream app:
|
frontends/p4/actionsInlining.cpp
Outdated
for (const auto *param : callee->parameters->parameters) { | ||
const auto *argument = substitution.lookup(param); | ||
cstring newName = | ||
nameGen.newName(param->name.string() + "_inlined_" + callee->name.string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick skim for now, what is the deal with these renames? Why add this renaming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to rename parameters in any way. This gives a bit more indicative base name after inlining (e.g. when caller had x
and callee has x
we'll end with x
and x_inlined_callee
instead of x
and x0
). This was mostly for my debugging, but I decided to leave it as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I request this a lot but could we split it out into a separate if it's unrelated to the ref map change? Considering it's a breaking change we should remove any confounding factors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it would still be a change. As instead of refMap
we're using a name generator here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, doesn't seem like there is a way to preserve the naming here. The counters are different. Preserving the name would remove a lot of noise.
Also I recommend making this IR:ID
and preserve the originalName in the generator. The previous implementation did not do this either but it is useful to have in downstream compiler passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can return to the old "scheme" actually so the counters would match, let me do this. As for preserving originalName
– I'd do this in the separate issue. See the "take home messages" above. We do recalculate name generators very often. In the testcase above each sweep of NameGenerator over the whole program costs 50 ms (!) due to visitor malloc traffic and GC. This totals to ~6 seconds of overall runtime (yes, we do recalculate that often).
Instead we should have some central SymbolTable
. And all IR:ID
/ name generation should be routed through it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I've returned old scheme, plus did few improvements to reduce number of NameGenerator
sweeps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! We can add your naming scheme in a follow-up PR if you think it is better. I do not see the harm, any tool depending on an internal name is likely doing something wrong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another breaking change 😱 😉
Still reviewing, but here are a couple of initial comments.
frontends/p4/moveConstructors.cpp
Outdated
auto tmpref = new IR::PathExpression(IR::ID(expression->srcInfo, tmpvar)); | ||
cmap.add(expression, tmpvar); | ||
return tmpref; | ||
} | ||
}; | ||
} // namespace | ||
|
||
MoveConstructors::MoveConstructors(ReferenceMap *refMap) { | ||
MoveConstructors::MoveConstructors() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DoMoveConstructors could now be merged into MoveConstructors since it's only invoking one pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I decided to keep things as-is as sometimes these "DoFoo" passes are executed standalone elsewhere. I'm fine to get rid of extra PassManager wrappers where possible if this is ok for downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is safe for the Tofino code. Can't speak for any other downstream projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I explicitly decided not to touch midend passes for now as these would impact downstream projects more. But in the majority of cases the changes are just removing of refmap
argument and making ctors explicit if necessary. This might uncover few subtle cases though, e.g. then refmap was stale before some downstream pass, this needs to be fixed anyway :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chris/Fabian have covered the major items.
Here are a few small things I picked up today:
#include "frontends/common/constantFolding.h" | ||
#include "frontends/common/resolveReferences/referenceMap.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you use a tool to identify the missing includes? If so, what did you use? I'd potentially like to run that elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the possible tools to use is include-what-you-use: https://github.com/include-what-you-use/include-what-you-use, though it might require some settings for large codebases.
Here I am using include cleaner from clang-tidy / clangd: https://clangd.llvm.org/design/include-cleaner (https://clang.llvm.org/extra/clang-tidy/checks/misc/include-cleaner.html)
@@ -16,12 +16,13 @@ limitations under the License. | |||
|
|||
#include "tableApply.h" | |||
|
|||
#include "frontends/common/resolveReferences/referenceMap.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be here? It's already included via tableApply.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather be explicit on used includes, p4c has the tendency of allowing includes leaking through others (especially given the include order). But I can certainly remove this if it would be preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a work-in-progress PR that runs IWYU across the entire codebase to clean up includes.
#3767
I am hesitant to push it further because IWYU still seems to immature as a tool. Wish clang-tidy would make its include-fixer an independent tool...
@@ -19,6 +19,7 @@ limitations under the License. | |||
#include <boost/format.hpp> | |||
|
|||
#include "frontends/common/constantFolding.h" | |||
#include "frontends/common/resolveReferences/referenceMap.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this already coming through typeChecker.h? Same for resolveReferences.h.
@@ -16,6 +16,7 @@ limitations under the License. | |||
|
|||
#include "uniqueNames.h" | |||
|
|||
#include "frontends/common/resolveReferences/resolveReferences.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be coming through from uniqueNames.h?
#include "absl/time/clock.h" | ||
#include "absl/time/time.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Abseil time code does clean things up a lot 👍
@@ -16,19 +16,20 @@ limitations under the License. | |||
|
|||
#include "removeExits.h" | |||
|
|||
#include "frontends/common/resolveReferences/referenceMap.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this coming through from removeExits.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely moving in the right direction
protected: | ||
ConstantFoldingPolicy *policy; | ||
|
||
/// Used to resolve IR nodes to declarations. | ||
/// If `nullptr`, then `const` values cannot be resolved. | ||
const ReferenceMap *refMap; | ||
const DeclarationLookup *refMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to remove this member, and just use this
instead as the DeclarationLookup in all cases? Where this pass is apply
'd directly on a non-root node, you would need to use the two-argument apply
variant that takes a Visitot::Context
(eg in parsers/parserDriver.cpp)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there were cases where DoConstantFolding
was executed without proper context in vicinity, this is why this hack was needed. I will double check to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I double checked this and for now it seems we'd need to leave it as is. One of the main culprits is Evaluator
that tries to do constant folding of arguments, therefore requesting declarations outside the current context. This is why it is currently still uses refMap.
@@ -48,7 +48,7 @@ std::unordered_multimap<cstring, const IR::IDeclaration *> &ResolutionContext::m | |||
std::vector<const IR::IDeclaration *> ResolutionContext::resolve(const IR::ID &name, | |||
P4::ResolutionType type) const { | |||
const Context *ctxt = nullptr; | |||
while (auto scope = findContext<IR::INamespace>(ctxt)) { | |||
while (auto scope = findOrigCtxt<IR::INamespace>(ctxt)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tricky case -- when looking up a name in a Transform, do we want to look for the original definition of the name, or for a transformed definition of the name? If the transform is changing names it gets even harder. Do we need more variants of resolve
(or some added optional argument(s)) to allow different things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. So far (as for refMap replacement) we always need original as refMap contained untransformed / uncloned node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, few places (type inference included) relied on ability being able to have lookup fail, and then do something with it.
One major concern I have is that we never invalidate things in the ResolutionContext cache -- if this is a transform pass (or run multiple times with transforms between), it is easy for things in the cache to get out of date. We should definitely have a debug mode build where all the cache hits also do a full lookup to ensure it gets the same value back -- slow but necessary for correctness verification. If we can get the cache invalidation correct, then we could look into sharing the cache between passes. As it is currently, I'm afraid we really should be invalidating the cache every time in init_apply.
Yes, could have a single MinimalNameGenerator that all passes share, rather than creating new ones all over the place.
My next step plan after eliminating the refMap everywhere is to eliminate the typeMap too -- if you run TypeChecking with updateExpressions = true, it will copy the type of every Expression into the In addition, all the |
I thought about this. However, I do not think this will be a problem. IR is immutable, so if something was transformed, it will be entirely new Node. Therefore we'll have two cases:
So, the cache itself might have stale things, yes, but the answers will always be correct. Certainly, we might need invalidate it between passes, but I mostly concerned about nested cases where one pass applies couple of other visitors underneath. For those it would be great to inherit the cache. I put few FIXMEs in certain places and some of them are indeed require such caching, iirc –
Right. And this is actually already done in few places. However, things are written to require typeMap, uh-oh. Read-only type inference is actually not read-only as it still clones the whole program (and then just throw out everything). This type checking should be optional, right. E.g. in some "expensive checks" mode. |
But the lookups (keys) are by name (cstring), not by node. So if you put a node in the cache under a name and that node is later transformed, and then after that you look up the name again, you'll get the old node, not the transformed node. However, it does look like most of the lookups (all those that call |
Yes, right, caching is essentially unwrapping a namespace, so stuff is keyed by |
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
…e here Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
I think that has its uses? Imagine an interpreter that does not modify the program but uses many different visitors to traverse it. Precomputing the refMap once is all that is needed. |
And these visitors can make us of something like refMap, true. Here we are talking about the frontend (and midend, but I have not touched it yet) that changes things over and over again. Each Transform effectively kills refMap, so it had to be recomputed from scratch. Note that common Visitors in this PR take In general, if you're having some global objects that are expensive to recompute, but you'd need to recompute them after any IR change, then it smells like a big problem and design flaw IMO. ResolutionContext certainly is not perfect:
And see the discussion above how the internal cache of As a loud thinking: if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And see the discussion above how the internal cache of ResolutionContext could optionally be reused / shared in the nested lookup. I would say this is something that needs to be carefully designed.
I think that would be useful, yes. Anyway, since we are not removing the RefMap anyway yet this is a moot point. Approving.
|
||
public: | ||
// @param readOnly If true it will assert that it behaves like | ||
// an Inspector. | ||
TypeInference(ReferenceMap *refMap, TypeMap *typeMap, bool readOnly = false, | ||
bool checkArrays = true); | ||
explicit TypeInference(TypeMap *typeMap, bool readOnly = false, bool checkArrays = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking we should make these structs instead of a list of booleans. Can be easy to get them wrong even with the explicit
.
I really like option structs for that: https://abseil.io/tips/173
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these are very nice. I think there were some neat wrappers somewhere to add some sugar on top of that.
The goal is to get rid of reference map and switch to more fine-grained approach. Reference map is global: one recomputes it for the whole program (doing separate expensive top-down traverse). However, in many cases one only needs couple of paths being resolved from there. So, this is a huge waste of computational resources to recompute it again and again for no particular reason. This switches many places to
ResolutionContext
that allows to perform declaration resolution on-fly.One particular example it fixes is #4723