-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test that builtin functions don't throw #4555
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well-done! Unfortunately, there was a miscommunication and the tests need to be tweaked a little, see one of the comments below (sorry, there are lots of them).
The current implementation is monomorphic wrt
uni
andfun
.
That's perfectly fine, we don't really care about sets of built-in functions other than fun
and we don't even have a non-DefaultUni
universe.
Currently, all type variables in polymorphic builtin functions are instantiated to
Integer
.
Also perfectly fine.
Three functions,
VerifySignature
,VerifyEcdsaSecp256k1Signature
andVerifySchnorrSecp256k1Signature
, are currently excluded. I need to figure out how to generate valid inputs for them.
@kwxm do you happen to know some of that? Also do the latter two have tests in the Plutus Tx compiler? We'll need to generate correct as well as incorrect input to make sure exceptions don't occur in both of these cases. It's fine to leave that for a future PR, though.
Unfortunately I had to use
unsafeCoerce
in one place (when the argument is a list)
Seems fixable, although I wouldn't worry too much about unsafeCoerce
in a test.
go sch' res as | ||
(RuntimeSchemeResult, []) -> f | ||
(RuntimeSchemeAll sch', _) -> go sch' f args | ||
_ -> error $ "Wrong number of args for builtin " <> show bn <> ": " <> show argStrings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably make this function run in GenT MakeKnownM
and generate arguments on the fly to avoid this error
case, but it's perfectly fine this way too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not immediately obvious how to do this, so I added a TODO above this line.
genConstr :: Int -> Gen Data | ||
genConstr depth = | ||
Constr <$> genInteger | ||
<*> Gen.list | ||
(Range.linear 0 5) | ||
(genData (depth - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good for the purpose of this PR, but here's a general question: @michaelpj we probably want to have some universal generator for Data
providing various interesting values etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure there is one already. Not sure where it is off the top of my head, but a quick grep should turn it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a genBoundedData
but it returns QuickCheck.Gen
rather than Hedgehog.Gen
, and it takes two additional parameters imem
and bmem
which I'd rather not having to provide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an issue to write a better generator. There is one in the cost modelling code here but it's not ideal, and it's not totally clear to me how to write a better one: we need a generator that samples the space of Data objects "uniformly", but I'm not even sure what that means in this context. See also the comments in #4480, where I had some difficulty coming up with decent samples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, the hedgehog/quickcheck issue again :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe need to think about it a bit more, but here are some comments.
) | ||
_ -> error "genArg: encountered non-Constant term" | ||
-- Descend upon `Opaque` | ||
| Just [_, SomeTypeRep tr'] <- matchTyCon @Opaque tr = genArg tr' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... is this actually right? @effectfully I always am unsure how the Opaque stuff works.
genConstr :: Int -> Gen Data | ||
genConstr depth = | ||
Constr <$> genInteger | ||
<*> Gen.list | ||
(Range.linear 0 5) | ||
(genData (depth - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure there is one already. Not sure where it is off the top of my head, but a quick grep should turn it up.
Ok, here's my updated understanding of the task itself. We want to generate random input for each built-in function to ensure that no built-in function ever crashes with an exception. Crucially, at runtime we execute Untyped Plutus Core, which means we don't have types there and the user is allowed to submit an ill-typed program and it will get evaluated and if it happens to succeed, it will be accepted (@michaelpj, right?). This means that our tests need to cover the ill-typed case just like the well-typed one. We need to make sure that both the cases are present in the generated tests. Which means we need to collect the stats on ill-typed vs well-typed applications (one of the functions from here does that, I always forget which one is for what). Currently we only generate Does this make sense to everyone? Also, I was wrong about genArg :: forall k (a :: k). (DefaultUni `Contains` a, PrettyConst a) => TypeRep a -> Gen a as per the discussion above, but is it possible to make it genArg :: (a :: *). MakeKnownIn DefaultUni Term a => TypeRep a -> Gen a and recover the current interface by lifting the resulting If that works out, we'll still need to have a generator of completely arbitrary @zliu41 sorry, I thought that would be a good first task, but it turned out to be rather bad. Too vague, touches quite a lot of pieces of the machinery and requires figuring out what actually needs to be done. You're doing great, it's just that I didn't formulate the task properly, which is why you're rewriting different bits and pieces here. |
No worries, but it seems testing ill-typed programs can be a separate ticket, since this one only asks for testing that builtin functions don't throw? If so, this would still be a good first task 😃
That sounds good, so I think where we go from here is: (1) merge this PR after I address existing comments (2) add test cases for ill-typed programs, which will probably go like this:
What do you think?
I'm not sure about this one either, unless there's a way to write things like |
Actually I think I may be wrong on this - evaluating a builtin function never throws an exception, even if the arguments are ill-typed, right? Are you suggesting that the tests should verify that, or the tests should verify that the evaluation fails with an error upon getting ill-typed arguments? Either way I feel like it can be done in a separate PR, but if it's the former, then it does belong to this ticket rather than a separate one. |
Don't throw exceptions, which is orthogonal to well-typedness. But yes, great point, testing ill-typed programs and ensuring we generate enough test cases of each kind does sound like it deserves a separate PR.
You know what, I won't let you keep all the fun for yourself, I'm gonna look at this particular piece and feel free to merge the current approach once ready. |
Yes. But we had a bug once where a builtin would throw an exception. That must never happen, hence this whole endeavor.
Yes.
No. That's not true anyway, you can have an ill-typed builtin application that succeeds, we have such tests. |
Ok, I can certainly add some tests using arbitrary Terms, which should be easy given we have I feel like the generators for well-typed Terms are still useful (since if we test with arbitrary Terms, probably 99% of the test cases are ill-typed), so I'll keep them, but let me know if you think otherwise. For well-typed Terms, if we want to generate non-Constants, it seems we need to write some new generators, which I think can be done in a separate PR. |
Added tests using arbitrary (not necessarily well-typed) arguments. I think I addressed all review comments so far. |
Each of the signature verification functions takes three bytestrings as arguments, and some of these have to be of specific sizes (you should get an error if they're the wrong size): see the comments in Crypto.hs. The functions take three arguments: a public key, a message, and a signature. They check that the signature's been obtained from the message using the private key corresponding to the public key, returning True if it has and False if it hasn't. For our purposes it shouldn't matter whether or not the signature is correct: the algorithms have to check every bit of the message in order to decide what the result is and there shouldn't be any difference in behaviour for correct and incorrect signatures. I think we can just generate random bytestrings of appropriate lengths and that should do the job (we won't need inputs where the signature really is valid, but we should check what happens when one or more of the inputs have the wrong length) We already have some generators in the code that we use for generating the cost models for these functions: see here (although the generators need to be rationalised: there's a comment that says "FIXME: this is terrible), and here. We currently only have a benchmark for I don't think there are any tests for these functions in the Plutus Tx compiler, but Koz Ross wrote some extremely comprehensive tests for the builtins in this file (which we should probably rename since it's not just about SECP256k1 any more). In fact that's probably a much better place to look for generators than the file I mentioned before. I think it even does show you how to generate correct signatures, so ignore what I said earlier about not needing to generate them: if we can, we probably should. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit shows how to make TypeRep a -> Gen a
work for constants.
I don't mind merging the PR as is, though, it looks great and it gets the job done.
Ziyang and I discussed this a bit. I think there are at least two bits of the builtin machinery that we might want to test:
I think it's most useful for this check to focus on the second. That is, we're really trying to test that the dentotation functions are well-behaved here, which means we can assume we already have correctly-typed input. Checking that the term -> typed constant machinery works properly seems like a separate issue, and one we could test perhaps more directly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Did you test that it fails if we actually have a builtin that throws?
Also I see you already added the case of providing ill-typed arguments, which is fine.
(pure . pure) | ||
whenJust mbErr $ \(e :: SomeException) -> do | ||
let msg = | ||
"Builtin function evaluation failed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can do this more simply with annotate: https://hackage.haskell.org/package/hedgehog-0.6.1/docs/Hedgehog.html#v:annotate
i.e. do something like:
annotate $ "Function: " <> display bn
annotate $ "Arguments: " <> ...
...
and then ditch the exception handler. I think what will happen then is that the exception will fail the test and Hedgehog will show you the annotated values in the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed to using annotate
, but didn't remove the exception handler because the annotations don't show up when the test is failed by an exception.
With deferred unlifting both are indistinguishable parts of the runtime denotation of a builtin. One can extract a constant from a term implicitly via the To put it differently, even with immediate unlifting if the denotation of a builtin takes an I was too sloppy with that well-typed/ill-typed distinction. We don't care about an Does this all make sense? |
Okay, I think I agree with that.
Well, ill-typed applications are one obvious class of failing applications! |
Not necessarily! An ill-typed application can run just fine, even in the typed world (i.e. the CK machine). Here's a PR with a test. So the well-typed vs ill-typed dichotomy seems completely orthogonal to what we're trying to achieve here. |
Ugh, this is hard to talk about. We don't check anything for arguments that are type variables, but we do for those that aren't. I was trying to talk about applications-that-fail-because-of-unlifting-of-constants-to-particular-types, I don't know what to call that 😅 |
Yeah, ok, I think we're on the same page. |
The idea is due to @effectfully (be2efd3)
Yeah I manually verified it. Once we make the test polymorphic in |
@effectfully I've switched to your approach in be2efd3, and now we are generating arbitrary I think you also suggested, in a separate PR, generating well-typed |
Not quite. So I think that for
Similarly for monomorphic builtins we want to generate constants of the right type (most often), constants of the wrong type (rarely) and non-constants (even more rarely). And for Hope this is not too messy of an explanation of what I have in mind. |
We already generate constants of the right type, but right now everything is instantiated to
I feel like this is covered by testing with arbitrary
Same question as above - since Opaque is supposed to accept any |
Good point, not sure what I was thinking about.
It accepts any value, but a builtin is allowed to attempt to extract a constant out of it and then it can look at the type tag of that constant etc. I.e. from the point of view of these tests there doesn't seem to be any difference between I do now realize we don't need to consider the case of constants of the wrong type separately, it'll be enough to just generate arbitrary terms and some of them will be constants making things ill-typed. Thanks for catching. |
The comment on the test is good, but given how much discussion there has been here it might be worth expanding it into a longer Note? What do you think @zliu41 ? "Comprehensively generating arguments for builtin functions" or something |
This is the initial implementation.
uni
andfun
.Integer
.Three functions,VerifySignature
,VerifyEcdsaSecp256k1Signature
andVerifySchnorrSecp256k1Signature
, are currently excluded. I need to figure out how to generate valid inputs for them.Unfortunately I had to useunsafeCoerce
in one place (when the argument is a list) 😛