[Builtins] [Evaluation] Drop lookups #4914
Conversation
Running the benchmarks manually (I did check that individual runs are consistent and not shaky at all): [Results table]

-8.57% on average. I was unable to fall asleep and that gave me enough time to finally realize how to remove those damn lookups.
Wow!
I am quite scandalized that we didn't try this before??????
I've tried something along these lines multiple times (see the roadmap). It's obviously the case that not doing lookups is better than doing them; however, doing them allowed us to cache a lot of stuff in the array, and the non-obvious part was figuring out how to get the same caching (i.e. lack of inlining and lack of annoyingly persistent join-point generation) for the runtime denotations of builtins, while retaining all the inlining within the runtime denotations themselves.

Note that these issues aren't specific to our extensible builtins. If we simply wrote (pseudocode)

```haskell
computeCek (VBuiltin AddInteger [x, y]) = do
    let (cost, z) = doCostingForAndComputeAddInteger x y
    spendBudgetCek cost
    returnCek z
```

we'd still run into the problem of needing to precompute and cache the costing function inside of `doCostingForAndComputeAddInteger`. So it's just non-trivial to do caching alongside full inlining, and I'd tried quite a few things to make that happen before this.
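To illustrate the trade-off being described, here is a toy sketch (all names hypothetical, nothing like the real machine code) contrasting lookup-based dispatch through a precomputed cache with direct pattern matching:

```haskell
import Data.Array (Array, listArray, (!))

-- Toy builtin set and denotations (hypothetical, for illustration only).
data Fun = AddInteger | SubtractInteger
    deriving (Show, Enum, Bounded)

denotation :: Fun -> Integer -> Integer -> Integer
denotation AddInteger      = (+)
denotation SubtractInteger = (-)

-- Lookup-based dispatch: everything expensive (e.g. costing functions)
-- is computed once when the array is built; each builtin application
-- then pays for an array lookup.
cache :: Array Int (Integer -> Integer -> Integer)
cache = listArray (0, fromEnum (maxBound :: Fun))
            [denotation f | f <- [minBound .. maxBound]]

applyViaLookup :: Fun -> Integer -> Integer -> Integer
applyViaLookup f = cache ! fromEnum f

-- Direct dispatch: no lookup at all, but now nothing forces the
-- per-builtin setup work to be computed once and shared -- which is
-- exactly the non-obvious part described above.
applyDirect :: Fun -> Integer -> Integer -> Integer
applyDirect AddInteger      x y = x + y
applyDirect SubtractInteger x y = x - y
```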
Force-pushed from 1c973b2 to 154b984.
Force-pushed from 718e3cf to b45b50d.
I've documented everything without changing any code on the evaluation path, rebased on the fresh master, ran the benchmarks once again and got a 2-3x slowdown. A day later, after a full-scale investigation and trying out anything and everything I could've thought of, I still have no idea what's going on. I've tried looking at the Core -- it seems fine. I've tried comparing it with the Core that we were getting three months ago when I opened this PR -- I can't observe any difference. I've tried tweaking things -- to no avail. I've run the benchmarks for the old version of this PR, for the new one and for a variety of commits in between.

The reason why this small and very beneficial PR has been open for so long is that I never got it into my sprint, and changing the sprint scope is explicitly discouraged, regardless of how non-urgent the tasks in the sprint are. The lack of automated benchmarking has not been very helpful either.

I'm closing this PR. I don't feel like I should pull my hair out to overcome a broken development process. Feel free to reopen and take over.
That's too bad.
Give me some time to recover from this little breakdown, and once we have automated benchmarking, I'll start over. I've moved the Jira ticket (PLT-1066) back to the backlog. Although if somebody else wanted to take over, that would be great too.
I've identified the problem. Two problems, actually.

The first one is what I posted on Slack, quoting verbatim: for the love of god, why do our benchmarks not run using the same code path as our prod? In the benchmarks the machine parameters are constructed via

```haskell
defaultCekParameters :: MachineParameters CekMachineCosts CekValue DefaultUni DefaultFun
defaultCekParameters = mkMachineParameters def defaultCekCostModel
```

while in prod the machine params are constructed via

```haskell
mkDynEvaluationContext
    :: MonadError CostModelApplyError m
    => BuiltinVersion DefaultFun -> Plutus.CostModelParams -> m EvaluationContext
mkDynEvaluationContext ver newCMP =
    EvaluationContext <$> mkMachineParametersFor ver newCMP
```

I've been banging my head against a wall and asking myself why the hell the same patch works differently depending on the date it's applied, despite the Core being more or less identical. Overall, all of our previous performance achievements are pretty much irrelevant now, since we've been benchmarking a different function, not the one that is actually used in prod.
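If the benchmarks are to measure what prod runs, they presumably need to construct their parameters through the same entry point. A hypothetical sketch of that (the `defaultCostModelParams` value and the use of `def` for the builtin version are assumptions for illustration, not necessarily the real API):

```haskell
-- Hypothetical: build the benchmark machine parameters via the same
-- mkMachineParametersFor path that mkDynEvaluationContext uses in prod,
-- instead of going through the separate mkMachineParameters call.
defaultCekParametersViaProdPath
    :: MonadError CostModelApplyError m
    => m (MachineParameters CekMachineCosts CekValue DefaultUni DefaultFun)
defaultCekParametersViaProdPath =
    mkMachineParametersFor def defaultCostModelParams
```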
The second one is related to the `BuiltinsRuntime` newtype: does wrapping the dispatch function in a mere newtype cost us anything? The answer is yes. If we stare at the Core of `defaultCekParameters`, we see:

```
Rec {
-- RHS size: {terms: 496, types: 226, coercions: 0, joins: 0/0}
defaultCekParameters2
  :: [DefaultFun]
  -> DefaultFun -> BuiltinRuntime (CekValue DefaultUni DefaultFun)
defaultCekParameters2
  = \ (ds_iiZU :: [DefaultFun]) (eta_B0 :: DefaultFun) ->
      case ds_iiZU of {
        [] ->
          case eta_B0 of {
            AddInteger -> lvl380_r1XIK;
            <...>
          };
        : y_iiZW ys_iiZX ->
          case y_iiZW of {
            AddInteger ->
              case lvl380_r1XIK of { __DEFAULT ->
              defaultCekParameters2 ys_iiZX eta_B0
              };
            <...>
          }
      }
end Rec }

-- RHS size: {terms: 8, types: 10, coercions: 6, joins: 0/0}
defaultCekParameters1
  :: BuiltinsRuntime DefaultFun (CekValue DefaultUni DefaultFun)
defaultCekParameters1
  = case $wgo3 0# of { (# ww1_s1UI5, ww2_s1UI6 #) ->
      (defaultCekParameters2 (: ww1_s1UI5 ww2_s1UI6)) `cast` <Co:6>
    }

-- RHS size: {terms: 6, types: 9, coercions: 11, joins: 0/0}
defaultCekParameters
  :: MachineParameters CekMachineCosts CekValue DefaultUni DefaultFun
defaultCekParameters
  = case defaultCekParameters1 `cast` <Co:5> of nt_s1U3j
    { __DEFAULT ->
      MachineParameters defaultCekMachineCosts (nt_s1U3j `cast` <Co:6>)
    }
```

So the newtype gets erased (note all the casts), GHC sees a bare function underneath and happily turns

```haskell
case y of
  y' -> \x -> z
```

into

```haskell
\x -> case y of
  y' -> z
```

which is how we end up with `defaultCekParameters2` taking the `DefaultFun` argument alongside the list: the dispatch `case` lives under the lambda, so nothing is computed once and cached, it all gets redone at every builtin application.

Solution? Block GHC's backstabbery by turning the `BuiltinsRuntime` newtype into a proper `data` type:

```
-- RHS size: {terms: 111, types: 2, coercions: 0, joins: 0/0}
lvl488_r1XNu
  :: DefaultFun -> BuiltinRuntime (CekValue DefaultUni DefaultFun)
lvl488_r1XNu
  = \ (x_i1Pr1 :: DefaultFun) ->
      case x_i1Pr1 of {
        AddInteger -> lvl380_r1XLD;
        <...>
      }

$j6_r1XNv
  :: BuiltinsRuntime DefaultFun (CekValue DefaultUni DefaultFun)
$j6_r1XNv = BuiltinsRuntime lvl488_r1XNu

Rec {
-- RHS size: {terms: 332, types: 224, coercions: 0, joins: 0/0}
defaultCekParameters_go1
  :: [DefaultFun]
  -> BuiltinsRuntime DefaultFun (CekValue DefaultUni DefaultFun)
defaultCekParameters_go1
  = \ (ds_ij0S :: [DefaultFun]) ->
      case ds_ij0S of {
        [] -> $j6_r1XNv;
        : y_ij0W ys_ij0X ->
          case y_ij0W of {
            AddInteger ->
              case lvl380_r1XLD of { __DEFAULT ->
              defaultCekParameters_go1 ys_ij0X
              };
            <...>
          }
      }
end Rec }

-- RHS size: {terms: 8, types: 10, coercions: 0, joins: 0/0}
defaultCekParameters1
  :: BuiltinsRuntime DefaultFun (CekValue DefaultUni DefaultFun)
defaultCekParameters1
  = case $wgo3 0# of { (# ww1_s1DOf, ww2_s1DOg #) ->
      defaultCekParameters_go1 (: ww1_s1DOf ww2_s1DOg)
    }

-- RHS size: {terms: 6, types: 14, coercions: 0, joins: 0/0}
defaultCekParameters
  :: MachineParameters CekMachineCosts CekValue DefaultUni DefaultFun
defaultCekParameters
  = case defaultCekParameters1 of { BuiltinsRuntime dt1_i1ytI ->
      MachineParameters defaultCekMachineCosts dt1_i1ytI
    }
```

Nice and tidy: zero coercions, and the worker only takes the list of `DefaultFun`s, returning a fully built `BuiltinsRuntime` whose dispatch function is computed once and shared. This suggests that we should simply have a policy prescribing to always wrap functions that we want cached in a `data` type rather than a `newtype`.
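To make the policy concrete, here is a minimal sketch of the two shapes involved; the names are simplified stand-ins for illustration, not the actual definitions from the PR:

```haskell
module NewtypeVsData where

-- Simplified stand-in for the per-builtin runtime info.
data BuiltinRuntime val = BuiltinRuntime val

-- Erased at compile time: 'case r of RuntimeViaNewtype f -> ...' is a
-- no-op cast, so GHC is free to float the dispatch 'case' under the
-- lambda and redo the lookup work at every application.
newtype RuntimeViaNewtype fun val
    = RuntimeViaNewtype (fun -> BuiltinRuntime val)

-- A real constructor with a strict field: forcing the wrapper forces
-- the dispatch function exactly once, so whatever was computed to build
-- it stays cached and shared across applications.
data RuntimeViaData fun val
    = RuntimeViaData !(fun -> BuiltinRuntime val)
```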
Great digging!
Well, it depends; sometimes the saving from having an equivalent-representation newtype might also matter.
Were you going to make the fixes to which function we're using in the benchmarks and the newtype?
Ok, yes, unless there's a strong reason to use the newtype.
Yes, and then benchmark.
Force-pushed from b45b50d to 9c119e4.
Ready for review. Got even better results, -10.35% on average: [Results table]
Force-pushed from 9c119e4 to bd9a3dc.
Wow, that's massive.
LGTM
Force-pushed from bd9a3dc to b0b5a74.
Keep in mind that I'm running our current benchmarks, which don't necessarily reflect the performance of the production evaluator. But I'd like to get this merged already and then we can talk about fixing the benchmarks. If you're OK with this strategy, hit "merge".
Actually, it's -10.35% when I account for …
This unscrews another portion of benchmarks. We still have benchmarks that are screwed up (see [PLT-6541](https://input-output.atlassian.net/browse/PLT-6541)). See [this](#4914 (comment)) comment for an explanation of what went wrong.
Don't look here yet.