
RFC: Add inference->optimize analysis forwarding mechanism #36508

Merged: 1 commit into master from kf/stmtinfo, Jul 15, 2020

Conversation

@Keno (Member) commented Jul 2, 2020

This change attempts to solve the generalized problem
encountered in #36169. In short, we do a whole bunch of analysis
during inference to figure out the final type of an expression,
but sometimes we also need intermediate results that were
computed along the way. So far we haven't had a good
place to put those results, so we end up having to re-compute
them during the optimization phase. That's what #36169 did,
but that is clearly not a scalable solution.

I encountered the exact same issue while working on a new AD
compiler plugin, which needs to do a whole bunch of work during
inference to determine what to do (e.g. call a primitive, recurse,
or increase the derivative level), and the optimizer needs
access to this information.

This PR adds an additional `info` field to CodeInfo and IRCode
that can be used to forward this kind of information. As a proof
of concept, it forwards method match info from inference to
inlining (we already cache these, so there's little performance
gain from this per se; it's more to exercise the infrastructure).
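
For illustration, the shape of the mechanism is a per-statement side table that inference fills in and the optimizer reads back. A minimal sketch under assumed names (this is not the PR's actual field layout):

# Hypothetical sketch of the forwarding idea: stmtinfo[i] carries whatever
# analysis metadata inference computed for statement i, or `nothing` if
# there is nothing to forward.
struct AnalyzedSource
    stmts::Vector{Any}      # the lowered statements
    stmtinfo::Vector{Any}   # parallel analysis results, one per statement
end

# The optimizer can then look results up by statement index instead of
# re-computing the analysis.
lookup_info(src::AnalyzedSource, idx::Int) = src.stmtinfo[idx]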

The plan is to do an alternative fix to #36169 on top of this
as the next step, but I figured I'd open it up for discussion first.

@Keno Keno requested review from vtjnash and JeffBezanson July 2, 2020 02:59
@Keno (Member, Author) commented Jul 2, 2020

Also cc @martinholters who was complaining about the lack of such a mechanism over in #36169.

@martinholters (Member) commented:

Should the info be a Vector (defaulting to [] instead of nothing) so that we can attach multiple different pieces of extra information?

# This is not currently called in the regular course, but may be needed
# if we ever want to re-run inlining again later in the pass pipeline after
# additional type information was discovered.
function recompute_method_matches(atype, sv)
Review comment (Member):

Suggested change
function recompute_method_matches(atype, sv)
function recompute_method_matches(@nospecialize(atype), sv::OptimizationState)
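
For context, `@nospecialize` tells the compiler not to generate a separate specialization of the method body for each concrete argument type, which keeps compile time and code size down in compiler-internal code. A tiny standalone illustration:

# With @nospecialize, calling this with an Int, a String, a Float64, etc.
# shares one compiled body that treats `x` as `Any`, instead of compiling
# a fresh specialization per argument type.
function type_name(@nospecialize(x))
    return string(typeof(x))
end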

@JeffBezanson (Member) commented:

Saving all the method matches at each call site seems to be pretty expensive; sysimg +4%. I think we should discard those by default when the optimizer finishes.

This will also require updating the deserialize method for CodeInfo. Ideally it should be able to detect older serialized versions that lack the new field and handle them correctly.

@Keno (Member, Author) commented Jul 2, 2020

Yes, agreed. I think we should discard these once inlining has finished (by default; external optimizers can override) and have a special representation for vectors where all stmtinfo is nothing. I just didn't get around to implementing that yet.
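
A minimal sketch of that special representation, under assumed names (not the PR's code): keep a single `nothing` sentinel for the all-nothing case instead of materializing a vector:

# If no statement carries info, store `nothing` for the whole array and
# only materialize a Vector{Any} once some entry becomes non-trivial.
get_stmt_info(info::Nothing, idx::Int) = nothing
get_stmt_info(info::Vector{Any}, idx::Int) = info[idx]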

end

# call where the function is known exactly
function abstract_call_known(interp::AbstractInterpreter, @nospecialize(f),
Review comment (Member):

Allocating all these Tuple objects seems unfortunate. We've worked pretty hard to remove and minimize that pattern, since it can sap performance.

Reply from @Keno (Member, Author):

Should we use a Pair{Any, Any} or a specialized struct instead? It seems like that may be easier on the compiler.

Reply from @Keno (Member, Author):

Reminder to self: whatever we do here should also apply to e.g. typeinf_edge.

Reply from @Keno (Member, Author):

@vtjnash does Pair{Any,Any} sound reasonable to you?

@Keno (Member, Author) commented Jul 2, 2020

> Should the info be a Vector (defaulting to [] instead of nothing) so that we can attach multiple different pieces of extra information?

I thought about it, but at least for the moment I think there needs to be a very clear consumer for this side-channel information, since it can carry semantic meaning. If we ever get to the point where we want to store multiple independent pieces of information here, we can re-evaluate the best format for storing it.

@vtjnash (Member) commented Jul 3, 2020

I've had the feeling that "CodeInfo" is intended to correspond to just the information we want to convey to codegen, while "IRCode" is that information plus the additional per-statement information computed by inference (with "InstructionStream" helping to manage it as a struct-of-arrays view), and "InferenceState" is the additional state required to compute all of that.

Given that view, would it make sense to add these new columns only to IRCode, but to find a way to preserve the inference result (including extra metadata like this) using that struct as the vehicle in the "InferenceResult" part of the internal caches in the AbstractInterpreter? (Possibly also renaming IRCode to "AnnotatedCodeInfo", or something like that, to better hint at that distinction.)

@Keno (Member, Author) commented Jul 3, 2020

I think that's generally sensible, though for the AD case for example, I do also want to be able to cache the stmtinfo without ever running the optimizer. That's still fine for the moment, since I'm currently managing my own caches there, but it would be unfortunate to have a design where stmtinfo can never appear in the internal caches.

@Keno (Member, Author) commented Jul 11, 2020

Alright, updated after some discussion with @JeffBezanson and @vtjnash earlier this week. Changes:

  1. Rather than using tuples for (rt, info), there's now a dedicated CallMeta type (a minimal sketch follows this list). I verified that this codegens well and, as expected, does not allocate the extra object.
  2. I also sunk the assignment to the info array into abstract_eval_statement, right after abstract_call. I like having the information in the return value for abstract_call, because that gets called recursively (e.g. from abstract_iteration), so it's nice to bundle up all that information, but there's really no need to also forward propagate it from abstract_eval_statement. Also, this nicely separates out information about a call from information about other kinds of statements, and gives a good place to add additional information about calls.
  3. stmtinfo got removed from CodeInfo and is now an inference/IRCode-only concept.
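
A minimal sketch of the CallMeta shape mentioned in point 1, assuming the struct matches its description here (field names are illustrative):

# Bundles the inferred return type of a call with the analysis info
# computed along the way, so recursive callers of abstract_call
# (e.g. abstract_iteration) receive both without allocating a Tuple.
struct CallMeta
    rt::Any     # inferred return type of the call
    info::Any   # forwarded analysis metadata (e.g. method-match info)
end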

Additionally, I added a fast path in inlining that short-circuits any additional work if the info is false (which inference sets if it determines that the method lookup was worthless or that inlining isn't warranted for other reasons). Interestingly, this change initially improved sysimg build time by about 5% over the merge base. I didn't look at it too closely, but I suppose it prevented some method table queries that inference had a fast-reject for but that inlining was still doing. However, that improvement went away after rebasing; it's now on par.
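
The fast path might look roughly like this (a sketch under assumed names; the PR's actual data layout may differ):

# Inside the inlining pass's per-statement loop: a `false` recorded by
# inference means the lookup was already rejected, so skip the site
# before touching any method tables.
stmtinfo = Any[false, nothing, false]   # hypothetical per-statement info
for idx in 1:length(stmtinfo)
    info = stmtinfo[idx]
    info === false && continue          # inference already ruled this out
    # ... regular inlining analysis for the call at statement idx ...
end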

@Keno Keno force-pushed the kf/stmtinfo branch 3 times, most recently from 4a5b393 to 6ca510c on July 13, 2020 22:36
@vtjnash (Member) left a comment:

Yes, this seems better to me. I still think it may be a complicated way to specify the info-lattice (this PR simply discards the info when it's inconvenient to merge), since it can't readily handle either of the existing cases (the expression type—which calls tmerge—or the statement's edges list—which calls union), though it would be conservatively correct for adding handling of other metadata columns (like pure), so it seems okay.
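
To make the lattice point concrete with a hypothetical helper: joining two branches' forwarded metadata needs a merge operation, and this PR's merge is effectively "discard on conflict", which is conservative but lossy compared to the joins used for existing columns:

# Existing columns have real joins: expression types are merged with tmerge,
# edge lists with union. For the forwarded info, the PR effectively does:
join_info(@nospecialize(a), @nospecialize(b)) = a === b ? a : nothing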

    splits = Any[sig.atype]
else
    splits = Any[]
    for union_sig in UnionSplitSignature(sig.atypes)
        push!(splits, argtypes_to_type(union_sig))
    end
    if !isa(info, UnionSplitInfo)
        infos = fill(nothing, length(splits))
Review comment (Member):

wrong object type?

Reply from @Keno (Member, Author):

You mean it should be Vector{Any}? Yeah, let me do that.
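
For clarity, the issue is that `fill(nothing, n)` produces a `Vector{Nothing}`, not the `Vector{Any}` the surrounding code wants, so storing any non-nothing info into it would error. A sketch of the suggested spelling (example values hypothetical):

splits = Any[Tuple{typeof(sin), Float64}]        # example signature list
infos = fill(nothing, length(splits))            # Vector{Nothing}: too narrow
infos = Any[nothing for _ in 1:length(splits)]   # Vector{Any}, as suggested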

function recompute_method_matches(atype, sv)
    # Regular case: Retrieve matching methods from cache (or compute them)
    # World age does not need to be taken into account in the cache
    # because it is forwarded from type inference through `sv.params`
Review comment (Member):

This seems confusing to state. The world age is taken into account, because that's part of what sv.params means, and is partly why the cache is part of the params.

Reply from @Keno (Member, Author):

That comment was there before ;)

Reply from @Keno (Member, Author):

Though I should probably update it, because it's not the regular case anymore.

@Keno Keno merged commit 6f62363 into master Jul 15, 2020
@Keno Keno deleted the kf/stmtinfo branch July 15, 2020 16:24
Keno added a commit that referenced this pull request Jul 17, 2020
In #36508 we decided after some consideration not to add the `stmtinfo`
to the `CodeInfo` object, since this info would never be used for
codegen. However, this also means that external AbstractInterpreters
that would like to cache pre-optimization results cannot simply cache
the unoptimized `CodeInfo` and then later feed it into the optimizer.
Instead they would need to cache the whole OptimizationState object,
or maybe convert to IRCode before caching. However, at the moment
we eagerly drop the `OptimizationState` wrapper as soon as we
decide not to run the optimizer. This refactors things to keep
the OptimizationState around for unoptimized methods, only dropping
it right before caching, in a way that can be overridden by
an external AbstractInterpreter.

We run into the inverse problem during constant propagation, where
inference would like to peek at the results of optimization in
order to decide whether constant propagation is likely to be
profitable. Of course, if optimization hasn't actually run yet
for this AbstractInterpreter, this doesn't work. Factor out
this logic such that an external interpreter can override this
heuristic. E.g. for my AD interpreter, I'm thinking that just looking
at the vanilla function and checking its complexity would be
a good heuristic (since the AD'd version is supposed to give
the same result as the vanilla function, modulo capturing
some additional state for the reverse pass).
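
A rough sketch of the overridable caching behavior this message describes (the hook name and signature are hypothetical, not the commit's actual API):

# Hypothetical sketch; names are illustrative.
using Core.Compiler: AbstractInterpreter

struct MyADInterpreter <: AbstractInterpreter end  # an external interpreter

# Default: drop the OptimizationState wrapper and cache just the source.
cacheable_result(::AbstractInterpreter, opt) = opt.src
# External override: keep the whole wrapper so the optimizer can run later.
cacheable_result(::MyADInterpreter, opt) = opt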