RFC: Add inference->optimize analysis forwarding mechanism #36508
Conversation
Also cc @martinholters, who was complaining about the lack of such a mechanism over in #36169.

Should the …
# This is not currently called in the regular course, but may be needed
# if we ever want to re-run inlining again later in the pass pipeline after
# additional type information was discovered.
function recompute_method_matches(atype, sv)
function recompute_method_matches(atype, sv)
function recompute_method_matches(@nospecialize(atype), sv::OptimizationState)
Saving all the method matches at each call site seems to be pretty expensive; sysimg +4%. I think we should discard those by default when the optimizer finishes. This will need to update the …
Yes, agree. I think we should discard these once inlining has finished (by default, external optimizers can override) and have a special representation for vectors where all stmtinfo is `nothing`.
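For context, the compact representation being discussed could look something like the following minimal sketch; all names here are hypothetical illustrations, not actual compiler types:

struct EmptyStmtInfo end   # sentinel: every entry reads as `nothing`

getinfo(::EmptyStmtInfo, idx::Int) = nothing
getinfo(infos::Vector{Any}, idx::Int) = infos[idx]

# After inlining, an optimizer could (by default) collapse the info column
# down to the sentinel whenever no statement retained any info:
compact_infos(infos::Vector{Any}) =
    all(i -> i === nothing, infos) ? EmptyStmtInfo() : infos

The point of the sentinel is that the common all-`nothing` case costs a single immutable object instead of a length-n Vector{Any}.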
end

# call where the function is known exactly
function abstract_call_known(interp::AbstractInterpreter, @nospecialize(f),
Allocating all these Tuple objects seems unfortunate. We've tried to work pretty hard to remove and minimize that pattern, as it seems that it can sap performance.
Should we do a Pair{Any, Any} or a specialized struct instead? Seems like that may be easier on the compiler.
Reminder to self: Whatever we do here, should also apply to e.g. typeinf_edge.
@vtjnash does Pair{Any,Any} sound reasonable to you?
I thought about it, but I think at least for the moment, there needs to be a very clear consumer for this side channel information, since it can have semantic information. If we ever get to the point where we want to store multiple independent pieces of information here, we can re-evaluate what the best format for storing it is.
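For reference, the specialized-struct alternative to Pair{Any,Any} raised above might look like this sketch; the type and field names are assumptions for illustration only, not the PR's actual code:

# Bundle the inferred return type with the side-channel info so the
# compiler sees a single concrete type instead of many Tuple types.
struct CallResultInfo
    rt      # the inferred return type of the call
    info    # side-channel analysis info, or `nothing` when none was recorded
    CallResultInfo(@nospecialize(rt), @nospecialize(info)) = new(rt, info)
end

The appeal over plain tuples is that every call site returns the same concrete type, rather than a differently parameterized Tuple per call.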
I've had some feeling that the intent of "CodeInfo" corresponds to just the information we want to convey to codegen. While "IRCode" is that information plus the additional per-statement information computed by inference (with the intent that "InstructionStream" helps to manage it as a StructOfArrays view). And "InferenceState" is the additional state required to compute that. To that perception, would it make sense to add these new columns only to IRCode, but to find a way to preserve the inference result (including extra metadata like this) using that struct as the vehicle in the "InferenceResult" part of the internal caches in the AbstractInterpreter? (possibly also renaming IRCode to "AnnotatedCodeInfo," or something like that, to better hint at that distinction)
I think that's generally sensible, though for the AD case for example, I do also want to be able to cache the stmtinfo without ever running the optimizer. That's still fine for the moment, since I'm currently managing my own caches there, but it would be unfortunate to have a design where stmtinfo can never appear in the internal caches.
Alright, updated after some discussion with @JeffBezanson and @vtjnash earlier this week. Changes: …
Additionally, I added a fast path in inlining that short-circuits any additional work if the statement's `info` is `nothing`.
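Such a fast path could look roughly like this sketch (a hypothetical helper for illustration, not the PR's actual code):

function maybe_use_info(@nospecialize(info))
    info === nothing && return nothing   # fast path: inference recorded no info
    # slow path: consult `info` to reuse inference's analysis results
    return info
end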
Force-pushed from 4a5b393 to 6ca510c
Yes, this seems better to me. I still think it may be a complicated way to specify the info-lattice (this PR simply discards the info when it's inconvenient to merge), since it can't readily handle either of the existing cases: the expression type, which calls tmerge, or the statement's edges list, which calls union. Still, it would be conservatively correct for adding handling of other metadata columns (like pure), so it seems okay.
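The conservative join being described amounts to something like this one-line sketch (a hypothetical helper): unlike the expression type (merged via tmerge) or the edge list (merged via union), infos without a dedicated merge rule are simply dropped, which is always conservatively correct.

merge_stmt_info(@nospecialize(a), @nospecialize(b)) = a === b ? a : nothing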
base/compiler/ssair/inlining.jl (Outdated)
    splits = Any[sig.atype]
else
    splits = Any[]
    for union_sig in UnionSplitSignature(sig.atypes)
        push!(splits, argtypes_to_type(union_sig))
    end
    if !isa(info, UnionSplitInfo)
        infos = fill(nothing, length(splits))
wrong object type?
You mean it should be Vector{Any}? Yeah, let me do that.
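The fix being agreed on, as a sketch: `fill(nothing, n)` produces a Vector{Nothing}, into which a real info object could never later be stored, so an explicit Vector{Any} of `nothing` entries is needed instead. The `splits` value below is a stand-in, not the real data:

splits = Any[Tuple{typeof(sin),Float64}]        # stand-in for the union splits
infos = Any[nothing for _ in 1:length(splits)]  # Vector{Any}, not Vector{Nothing}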
function recompute_method_matches(atype, sv)
    # Regular case: Retrieve matching methods from cache (or compute them)
    # World age does not need to be taken into account in the cache
    # because it is forwarded from type inference through `sv.params`
This seems confusing to state. The world age is taken into account, because that's part of what `sv.params` means, and is partly why the cache is part of the params.
That comment was there before ;)
Though I should probably update it, because it's not the regular case anymore.
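The world-age argument above can be illustrated with a small sketch (all names hypothetical): because the cache and the world live on the same state object, which is constructed for a single fixed world, entries can never mix results from different worlds, so the key itself need not include the world.

struct MatchCache
    world::UInt
    cache::IdDict{Any,Any}   # atype => method matches, valid only for `world`
end

lookup!(f, mc::MatchCache, @nospecialize(atype)) =
    get!(() -> f(atype, mc.world), mc.cache, atype)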
This change attempts to be a solution to the generalized problem encountered in #36169. In short, we do a whole bunch of analysis during inference to figure out the final type of an expression, but sometimes we may need intermediate results that were computed along the way. So far, we don't really have a great place to put those results, so we end up having to re-compute them during the optimization phase. That's what #36169 did, but that is clearly not a scalable solution.

I encountered the exact same issue while working on a new AD compiler plugin that needs to do a whole bunch of work during inference to determine what to do (e.g. call a primitive, recurse, or increase the derivative level), and optimizations need to have access to this information.

This PR adds an additional `info` field to CodeInfo and IRCode that can be used to forward this kind of information. As a proof of concept, it forwards method match info from inference to inlining (we do already cache these, so there's little performance gain from this per se - it's more to exercise the infrastructure). The plan is to do an alternative fix to #36169 on top of this as the next step, but I figured I'd open it up for discussion first.
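As a toy sketch of the forwarding pattern this PR introduces (every name below is illustrative rather than actual compiler API): inference records a per-statement info object, and a later pass consumes it instead of recomputing the analysis.

struct MethodMatchInfo          # hypothetical info recorded by inference
    matches::Vector{Any}
end

function matches_for(stmtinfo::Vector{Any}, idx::Int)
    info = stmtinfo[idx]
    info isa MethodMatchInfo && return info.matches  # reuse inference's work
    return nothing                                   # fall back to recomputing
end

stmtinfo = Any[nothing, MethodMatchInfo(Any[])]
matches_for(stmtinfo, 2)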
In #36508 we decided after some consideration not to add the `stmtinfo` to the `CodeInfo` object, since this info would never be used for codegen. However, this also means that external AbstractInterpreters that would like to cache pre-optimization results cannot simply cache the unoptimized `CodeInfo` and then later feed it into the optimizer. Instead they would need to cache the whole OptimizationState object, or maybe convert to IRCode before caching. However, at the moment we eagerly drop the `OptimizationState` wrapper as soon as we decide not to run the optimizer. This refactors things to keep the OptimizationState around for unoptimized methods, only dropping it right before caching, in a way that can be overridden by an external AbstractInterpreter.

We run into the inverse problem during constant propagation, where inference would like to peek at the results of optimization in order to decide whether constant propagation is likely to be profitable. Of course, if optimization hasn't actually run yet for this AbstractInterpreter, this doesn't work. Factor out this logic such that an external interpreter can override this heuristic. E.g. for my AD interpreter, I'm thinking just looking at the vanilla function and checking its complexity would be a good heuristic (since the AD'd version is supposed to give the same result as the vanilla function, modulo capturing some additional state for the reverse pass).
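A hedged sketch of the two override points this commit message describes; the hook names below are assumptions for illustration, not the actual API:

struct ADInterpreter end   # stand-in for a custom AbstractInterpreter

# Keep the unoptimized OptimizationState around so it can be cached:
discard_optimization_state(::ADInterpreter) = false

# Judge const-prop profitability from the vanilla function's complexity:
const_prop_profitable(::ADInterpreter, vanilla_cost::Int) = vanilla_cost <= 100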