-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Builtins] Optimize costing by being lazier #5239
Conversation
/benchmark validation |
Click here to check the status of your benchmark. |
Comparing benchmark results of ' validation' on '411cf29bfb' (base) and '7fa2872e5d' (PR) Results table
|
/benchmark nofib |
/benchmark benchmark:nofib |
Eek. |
/benchmark plutus-benchmark:nofib |
Click here to check the status of your benchmark. |
Comparing benchmark results of ' nofib' on '6f3f472138' (base) and '7fa2872e5d' (PR) Results table
|
Click here to check the status of your benchmark. |
Comparing benchmark results of ' benchmark:nofib' on '6f3f472138' (base) and '7fa2872e5d' (PR) Results table
|
Click here to check the status of your benchmark. |
Comparing benchmark results of ' plutus-benchmark:nofib' on '6f3f472138' (base) and '7fa2872e5d' (PR) Results table
|
plutus-core/plutus-core/examples/PlutusCore/Examples/Builtins.hs
Outdated
Show resolved
Hide resolved
plutus-core/plutus-core/src/PlutusCore/Evaluation/Machine/CostingFun/Core.hs
Show resolved
Hide resolved
plutus-core/plutus-core/src/PlutusCore/Evaluation/Machine/ExMemory.hs
Outdated
Show resolved
Hide resolved
CostCons (coerce $ I# i0) $ | ||
case coerce cost1 of | ||
I# i1 -> | ||
flattenCostRoseGo i1 forest1 $ \i2 -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this is amortized O(1) for producing each element, but could take linear time for some elements (e.g., the last one). Is this optimal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm not seeing any linear time in there, could you explain? Linear memory in the depth of the tree -- sure, but I don't think we can do better than that and it's conservative too (it's just that previously the linearity-of-memory-consumption was in the ExMemoryUsage Data
instance rather than here).
So flattenCostRoseGo
always immediately emits an element when the forest is non-empty. Then it grows the continuation (constant-time too) and recurses into the new node. If the new node is non-empty again, then we get another element in constant time, grow the continuation in constant time and recurse again. If the next node contains an empty forest, then we don't emit an element and jump to the continuation, which in constant time reduces to a flattenCostRoseGo
call (i.e. without forcing the entirety of the so-far-accumulated continuation, just by reducing the outermost lambda), which again emits a cost immediately.
Thus, it seems to me that every cost is emitted in constant time, but maybe I'm misunderstanding something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flattenCostRose (CostRose 1 [r1]) where
r1 = CostRose 2 [r2]
r2 = CostRose 3 [r3]
r3 = CostRose 4 []
In this case it will reduce to
CostCons 1 $ CostCons 2 $ CostCons 3 $ <continuation that produces CostLast 4 in 4 steps>
The continuation is essentially
\x1 -> flattenCostRoseGo x1 [] $
\x2 -> flattenCostRoseGo x2 [] $
\x3 -> flattenCostRoseGo x3 [] $
\x4 -> CostLast x4
So it seems producing 4
takes linear time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I was indeed misunderstanding, thanks a lot for the clarification and for catching. I'll think if it's a problem (releasing a linear amount of memory is a linear operation anyway, so maybe we can wait for the final cost as well, but not sure) and what to do about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems producing 4 takes linear time.
(linear in the depth of the tree, just like memory consumption)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's amortized O(1) then it's not a problem. I think the following is O(1) worst case, but I wouldn't be surprised if it's slower:
flattenCostRose :: CostRose -> CostStream
flattenCostRose (CostRose cost []) = CostLast cost
flattenCostRose (CostRose cost (r:rs)) = CostCons cost (go r rs)
go :: CostRose -> [CostRose] -> CostStream
go (CostRose cost []) [] = CostLast cost
go (CostRose cost []) (r:rs) = CostCons cost (go r rs)
go (CostRose cost (r:rs)) rs' = CostCons cost (go r (rs ++ rs'))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your code doesn't look much different to me, but perhaps I'm misunderstanding again. E.g.
go (CostRose cost1 [CostRose cost2 [CostRose cost3 []]]) rs' ~>
CostRose cost1 $ go (CostRose cost2 [CostRose cost3 []]) ([] ++ rs') ~>
CostRose cost1 $ CostRose cost2 $ go (CostRose cost3 []) ([] ++ ([] ++ rs'))
It seems like for long chain of one-element forest we're going to end up with [] ++ ([] ++ ([] ++ <...>))
, which is effectively the same, except we also pay for all the overhead of list concatenation (it's lazy, so costs are amortized, but they're still there).
If it's amortized O(1) then it's not a problem.
Yeah, and I feel like special-casing single-element forests could help us level the cost of retrieving the next element, but at the expense of introducing an additional cost for every single node, which wouldn't pay off, I think. Still, I'm going to play with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
flattenCostRoseGo :: Int# -> CostRose -> [CostRose] -> (Int# -> CostStream) -> CostStream
flattenCostRoseGo i0 rose1 forest2 k =
CostCons (coerce $ I# i0) $
let !(CostRose cost1 forest1) = rose1
!(I# i1) = coerce cost1
in case forest1 of
[] -> case forest2 of
[] -> k i1
rose2' : forest2' -> flattenCostRoseGo i1 rose2' forest2' k
rose1' : forest1' -> case forest2 of
[] -> flattenCostRoseGo i1 rose1' forest1' k
rose2' : forest2' ->
flattenCostRoseGo i1 rose1' forest1' $ \i2 ->
flattenCostRoseGo i2 rose2' forest2' k
flattenCostRose :: CostRose -> CostStream
flattenCostRose (CostRose cost []) = CostLast cost
flattenCostRose (CostRose cost (rose : forest)) =
case coerce cost of
I# i -> flattenCostRoseGo i rose forest $ \iz -> coerce CostLast $ I# iz
{-# INLINE flattenCostRose #-}
We essentially need to prove by induction that both flattenCostRoseGo
and its continuation argument always produce an element in constant time. For flattenCostRoseGo
it's entirely obvious and for the continuation we have several cases:
- the initial case with
CostLast
-- obviously constant-time - the case when either of
forest1
orforest2
is empty (but not both at the same time) -- we pass the continuation unchanged and hence don't change any of its behavior - the case when both
forest1
andforest2
are non-empty -- we do grow the continuation, but by induction hypothesisflattenCostRoseGo
produces an element in constant time and so is the new continuation
I.e. we break the previous chains of empty forest handling by matching on forest2
before recursing in order not to grow the continuation with a fancy form of id
. And we also don't seem to do any extra work overall, since that matching is useful (now that flattenCostRoseGo
receives both a node and a remaining forest), albeit slightly premature (we may run out of budget before getting to accounting for costs from forest2
, but we don't care much about optimizing the unhappy path, since it's when the smart contracts fails anyway).
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like for long chain of one-element forest we're going to end up with [] ++ ([] ++ ([] ++ <...>)), which is effectively the same
Yeah, I think you are right.
What do you think?
Yes I think this is indeed O(1) worst case. However, I don't quite see the benefit of CPS, because in my non-CPS version, you can also match on the rs
and avoid concatenation if it is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I don't quite see the benefit of CPS, because in my non-CPS version, you can also match on the
rs
and avoid concatenation if it is empty.
Yours is different. In mine we use the matching not only to check that the list is not empty, but also to retrieve its head to use it later, so in any case the matching is useful. In your case it's only useful when the list is empty, provided I got the code right:
flattenCostRoseGo :: CostRose -> [CostRose] -> CostStream
flattenCostRoseGo (CostRose cost1 forest1) forest2 =
case forest1 of
[] -> case forest2 of
[] -> CostLast cost1
rose2' : forest2' -> CostCons cost1 $ flattenCostRoseGo rose2' forest2'
rose1' : forest1' ->
CostCons cost1 $ case forest1' of
[] -> flattenCostRoseGo rose1' forest2
_ -> flattenCostRoseGo rose1' $ forest1' ++ forest2
flattenCostRose :: CostRose -> CostStream
flattenCostRose (CostRose cost []) = CostLast cost
flattenCostRose (CostRose cost (rose : forest)) = CostCons cost $ flattenCostRoseGo rose forest
{-# INLINE flattenCostRose #-}
That is one overhead, the other one is ++
. But maybe both of these overheads are negligible compared to the overhead of passing a continuation around.
I'll run the validation benchmarks with your version. If the results are the same, I'll pick it as it's much simpler than the continuation one.
Although I also wanted to try a different definition of CostRose
.
/benchmark validation |
Click here to check the status of your benchmark. |
Comparing benchmark results of ' validation' on '6f3f472138' (base) and '65794ecffa' (PR) Results table
|
plutus-core/plutus-core/src/PlutusCore/Evaluation/Machine/CostingFun/Core.hs
Show resolved
Hide resolved
plutus-core/plutus-core/src/PlutusCore/Evaluation/Machine/ExBudget.hs
Outdated
Show resolved
Hide resolved
plutus-core/plutus-core/src/PlutusCore/Evaluation/Machine/ExMemory.hs
Outdated
Show resolved
Hide resolved
plutus-core/plutus-core/src/PlutusCore/Evaluation/Machine/ExMemory.hs
Outdated
Show resolved
Hide resolved
plutus-core/untyped-plutus-core/test/Evaluation/Machines/Budget/IdNat/3.plc.golden
Show resolved
Hide resolved
/benchmark plutus-benchmark:validation |
Click here to check the status of your benchmark. |
Comparing benchmark results of ' plutus-benchmark:validation' on 'f49230f650' (base) and '80611ef09c' (PR) Results table
|
-3.51% on average! Damn, didn't expect @zliu41's suggestion to perform that well. Gonna rerun the benchmarks to make sure it's not some kind of a glitch. |
/benchmark plutus-benchmark:validation |
Click here to check the status of your benchmark. |
Comparing benchmark results of ' plutus-benchmark:validation' on 'f49230f650' (base) and '80611ef09c' (PR) Results table
|
-2.39% on average now. That's a bit more reasonable, although still a noticeable improvement. Let's try something entirely different now. |
/benchmark plutus-benchmark:validation |
Click here to check the status of your benchmark. |
Comparing benchmark results of ' plutus-benchmark:validation' on 'f49230f650' (base) and 'f403d66351' (PR) Results table
|
Eek, forgot an |
338011c
to
8d9da85
Compare
I've addressed all the comments that we can't address later (a few not-too-important ones are left as future work) and documented everything, including the Otherwise should be ready for merging. |
/benchmark validation |
Although I'm going to take a final look today just in case. |
Manual benchmark results for this branch. Looks good.
|
-3.12% on average. Looks fine indeed. Plus I believe we can squeeze out more, given that we didn't implement a bunch of optimizations discussed above. |
This makes costing lazier to improve both performance and expressiveness.
The
Costing.hs
module with tests isn't ready for review yet, everything else is ready for review.