-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVM: Adding bmodexp #6140
base: master
Are you sure you want to change the base?
AVM: Adding bmodexp #6140
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6140 +/- ##
==========================================
- Coverage 56.28% 56.26% -0.03%
==========================================
Files 494 494
Lines 69958 69999 +41
==========================================
+ Hits 39375 39384 +9
- Misses 27912 27936 +24
- Partials 2671 2679 +8 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking quite good so far.
data/transactions/logic/eval.go
Outdated
prev := last - 1 // y | ||
pprev := last - 2 // x | ||
|
||
if len(cx.Stack[last].Bytes) > maxStringSize || len(cx.Stack[prev].Bytes) > maxStringSize || len(cx.Stack[pprev].Bytes) > maxStringSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm doing this from phone, but maxStringSize
is the AVM max of 4096? That's unusual for the bmath functions, they normally have a maximum of 128 bytes. I suppose you want more because RSA uses really large keys?
At any rate, it's impossible to supply inputs that are larger than this, so there's no need to check in the opcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Yes, the size is intended to support RSA which has really large keys. Is it okay if we allow the opcode to support this size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok with me, assuming we can get the cost function to properly account for long inputs. It seems like bmodexp
wouldn't be very interesting if it can't handle common RSA key sizes.
I'd like to support bigger inputs on the other b
opcodes too, if we can adjust cost functions appropriately. They were first done before we could have the cost depend on size. b+
, for example, would almost certainly be easy to adjust, b*
might be more complex than simple length dependence. I'm somewhat worried that bmodexp
is going to be tricky. Anyway, that should be a separate PR someday. (Do your RSA implementations require operating on RSA keys with any other operations?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it ends up being fast enough, we could pick a cost that accounts for the worst case, which I suppose would three very long inputs.
Or perhaps only scales based on one of the parameters, but accounts for the worst case with the others. (I think bmodexp
should be linear with respect to the length of the exponent, since it basically performs one operation per bit.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just read your discussion on cost from the issue more closely. The dependence on the square of the lengths is unfortunate because it implies a custom cost function. Currently, all costs are "data directed" which is nice because it means we can create specs automatically - we can generate text that describes the cost from the constants provided in the opcode table.
I suppose we can add a way to provide both a Go function and a textual description directly in the .go
source code. That is somewhat more fragile from the standpoint of modifications to the way we present the spec, but it doesn't see so bad. It's probably also necessary if we ever want to support larger inputs to b*
which I suspect is where this really coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it ends up being fast enough, we could pick a cost that accounts for the worst case, which I suppose would three very long inputs.
Or perhaps only scales based on one of the parameters, but accounts for the worst case with the others. (I think
bmodexp
should be linear with respect to the length of the exponent, since it basically performs one operation per bit.)
This is very tempting, and would mean minimal complexity. I anticipate bmodexp rarely being used, except for applications where the inputs are large such as RSA.
Or perhaps only scales based on one of the parameters, but accounts for the worst case with the others. (I think bmodexp should be linear with respect to the length of the exponent, since it basically performs one operation per bit.)
This is a good idea that might simplify the calculation to allow an existing linear cost model to be used.
Do your RSA implementations require operating on RSA keys with any other operations?
The only operators for RSA besides modexp are the less than and equal operators. These are efficiently implemented with a 512 bit digit partitioning algorithm available in Puya-Bignumber.
I will explore the suggestions to try to linearise the cost model, and figure out the max cost to see if it's cheap enough that we can use a constant. In my opinion, I think the complexity that would be introduced as highlighted with a non-linear cost model isn't worth it, if we can figure out a bounding linear model that's good enough. I'll present the results of the linear model to the discussion thread, and go from there.
d737fb1
to
7b6b6c2
Compare
I have addressed your feedback with a recent update, per my understanding. Let me know if there's anything that I may have misinterpreted. |
As a concrete example, smart contract verifiers for zk-proofs based on the plonk protocol generated by AlgoPlonk call a teal implementation of it 5 times, using 32-byte inputs for both curve BN254 and BLS12-381. Considering that a BN254 verifier consumes ~145,000 opcode budget and a BLS12-381 verifier ~185,000, |
Isn't the maximum 64 bytes? If we can have a plausible linear model for the smaller inputs currently supported by the b-operations (64 bytes?) which breaks for very large inputs, a solution we might consider for consistency is to offer |
You make some good points. However, in my opinion, we should offer a single opcode to reduce complexity. With a sufficiently accurate cost model, such as the one proposed in this GitHub comment, the estimated cost will closely reflect the actual cost within a small margin of error. For example, using the log-polynomial model, the cost for 64 byte long input is 105, which is intuitively reasonable and relatively small. To simplify the cost in that range, we could use a piecewise function where the cost is 105 for all inputs up to 64 bytes in length, and for longer inputs, it follows the advanced cost model. The log-polynomial model also accurately describes the cost for much larger inputs. Additionally, this seems to align with the long-term vision of allowing byte math opcodes, such as b+, to support up to 4096 bytes each, if I'm not mistaken based on the above discussion. The opcode should work for inputs up to 4096 bytes, with several test cases exceeding the 64-byte limit. As a sanity check I think it's worth adding another test case to verify the maximum supported length of 4096 bytes. |
Just to close the loop, is the suggestion that this: I would be on board with that, and I'll just have to write an "escape hatch" to allow an arbitrary cost function to be written. |
Based on this eval some log-poly formula gives better approximation. |
Looks reasonable to me, the # of iterations is |
I tried out that model after reading EIPS-2565, but I found it highly inaccurate (
where I think this is because the log transformation causes any exponents and multiplications to become coefficients and additions, respectively. The algorithm used for modexp is likely very advanced, making use of all the best approaches known. It's possible that my data isn't accurate, although I don't see how that could be. Perhaps it's worth trying to reproduce my results. I can provide my benchmark code upon request. |
I benchmarked I benchmarked using the same byte length for all three inputs, base, mode, and exp and I get:
If we divide the ns/op by
Looks like the cost stabilizes after inputs length of 128 bytes. This is the benchmark function I'm using:
|
@mangoplane could you commit the benchmarks into @giuliop how about re-running modexp_1024+ (like |
sure, I rerun using
Looks to me like we can use |
Thanks @giuliop for the insightful benchmarks. I replicated your results when the base length is at least that of the modulus. For smaller bases, the cost is overestimated, explaining the low And @algorandskiy, I have provided my benchmark code to reproduce the results of the notebook. Note that it isn't using a seed, so each run will produce slightly different results, but the overall trend should be the same. |
Looking back at benchmark the 4096 byte length case for all three parameters takes 9 sec to run, so it's not feasible on the AVM unfortunately. |
I'm about to modify
This would involve minimal changes to the codebase, if I'm not mistaken. The opcodes all assume a constant cost, a linear cost based on a single input, or a linear cost dependent on an immediate (enum variables that follow the opcode to describe its configuration). This addition implies that this trend holds, with special exceptions if a Interested in hearing everyone's thoughts on this approach. If people think it's a good idea, I am happy to proceed with its implementation. |
That's a problem! We aim for 10,000 txn/sec! The most you can execute in a single txn is 16x20,000 "ticks" (by pooling logicsigs). An "tick" is about 15ns. So that gives means a single program can run for about 360,000x15e-9 so something like 3.6*15e-4 = .0054 secs. So, is it worth getting the cost function "right" when that simply means that large operations will be (rightly) impossible, or should we drastically limit the lengths, and perhaps use a simpler cost. Either way, operations on really large keys seems impossible. All is not lost, RSA keys seem to be at most 4k BITS, not BYTES. Does that end up being 8^3 cheaper because of the squaring of the multiplicand length, and the factor of 8 from the exponent? 9/512 = 0.017 which starts to become almost feasible. |
I reproduced the results and got the log-poly model fits my data points perfectly
|
Indeed the benchmarking at 512 bytes length for all parameters gives
That means though that 512 bytes is still too large given that the max for a single program is This assumes we want to cap all parameters at the same size. If the assumption that we want to cap all parameters at the same size holds, I guess we have two options:
I would suggest to do option 1 now and then do the work to graduate all b-operations to 256 bytes parameters |
For the cost function I would use this formula to determine the actual opcode cost, i.e., the number of ticks:
where lenght is the # of bytes. This is how the benchmarking looks like, here I am keeping base and mod at the same byte length and varying the exp length. I am calculating the # of ticks by dividing the ns/op coming from
This is the function used for the benchmarking:
|
Thanks for your input and feedback everyone. I have what I believe is a solution that satisfies the tick limit, without ruling out many useful applications of the new opcode. Interested to hear your thoughts @jannotti @giuliop @algorandskiy : In practice, it's rare for the exponent in Therefore, I propose option 3:
Another feasible option is to set the limits low for |
I think option 3 can work, developers can always check at runtime if needed the sizes of their inputs and manage in the code the cases where the opcode budget would be exceeded |
That sounds like a plan. I'll get to work implementing the cost model, following my earlier suggestion involving adding a field called customCost to linearCost, optionally containing a customCost function, which should work and be reasonably clean code. |
Hey there, I've pushed my changes to handle non-linear opcode costs, following the plan I outlined earlier. The tests were extended to also verify that the cost is accurately calculated. I performed a sanity check for documentation generation and confirmed that the docs were generated without errors, containing the following information: TEAL_opcodes_v11.md sanity check: ## bmodexp
- Bytecode: 0xe6
- Stack: ..., A: []byte, B: []byte, C: []byte → ..., []byte
- A raised to the Bth power modulo C. A, B and C are interpreted as big-endian unsigned integers limited to 4096 bytes. Fail if C is zero.
- **Cost**: ((len(B) * max(len(A), len(C)) ^ 1.63) / 15) + 200
- Availability: v11 langspec_v11.json sanity check: {
"Opcode": 230,
"Name": "bmodexp",
"Args": [
"[]byte",
"[]byte",
"[]byte"
],
"Returns": [
"[]byte"
],
"Size": 1,
"DocCost": "((len(B) * max(len(A), len(C)) ^ 1.63) / 15) + 200",
"Doc": "A raised to the Bth power modulo C. A, B and C are interpreted as big-endian unsigned integers limited to 4096 bytes. Fail if C is zero.",
"IntroducedVersion": 11,
"Groups": [
"Byte Array Arithmetic"
]
} I thought it might be worthwhile to leave the |
218ce12
to
a72572d
Compare
…s, accounting for cost computation and doc generation
data/transactions/logic/opcodes.go
Outdated
expLength := float64(len(stack[prev].Bytes)) | ||
modLength := float64(len(stack[last].Bytes)) | ||
baseLength := float64(len(stack[pprev].Bytes)) | ||
cost := (math.Pow(math.Max(baseLength, modLength), 1.63) * expLength / 15) + 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't be better to declare those parameters (1.63
, 15
and 200
) as constants? I think it would make easier to "tune" them in one place, if necessary, instead of replacing them in all the consumers.
Maybe also some comments on how those constants have been chosen would be helpful for future readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, which I have addressed in the latest commit. It should be clearer now. Let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing this. Looks good!
We need to be able to tie cost functions to AVM versions, rather than consensus versions, so that existing code does not change behavior. That's why all the costs are set in |
Right! My bad, I forgot about the decoupled bytecode version requirement. Thanks for clarifying! |
Summary
Adds the new opcode
bmodexp
as described in issue #6139 to support modular exponentiation involving byte strings of up to 4096 bytes. Closes #6139Test Plan
assembler_test.go
,evalStateful_test.go
, &eval_test.go
TestBytesModExp
, covering panic cases, edge cases, acceptance cases and failure cases. Test vectors were generated manually or with Python.