
Test that builtin functions don't throw #4555

Merged
merged 12 commits into master from builtin/errors on Apr 30, 2022
Conversation

@zliu41 (Member) commented Apr 26, 2022

This is the initial implementation.

  • The current implementation is monomorphic wrt uni and fun.
  • Currently, all type variables in polymorphic builtin functions are instantiated to Integer.
  • Three functions, VerifySignature, VerifyEcdsaSecp256k1Signature and VerifySchnorrSecp256k1Signature, are currently excluded. I need to figure out how to generate valid inputs for them.
  • Unfortunately I had to use unsafeCoerce in one place (when the argument is a list) 😛

@effectfully (Contributor) left a comment

Well done! Unfortunately, there was a miscommunication and the tests need to be tweaked a little; see one of the comments below (sorry, there are lots of them).

The current implementation is monomorphic wrt uni and fun.

That's perfectly fine: we don't really care about sets of built-in functions other than fun, and we don't even have a non-DefaultUni universe.

Currently, all type variables in polymorphic builtin functions are instantiated to Integer.

Also perfectly fine.

Three functions, VerifySignature, VerifyEcdsaSecp256k1Signature and VerifySchnorrSecp256k1Signature, are currently excluded. I need to figure out how to generate valid inputs for them.

@kwxm do you happen to know some of that? Also, do the latter two have tests in the Plutus Tx compiler? We'll need to generate correct as well as incorrect input to make sure exceptions don't occur in either case. It's fine to leave that for a future PR, though.

Unfortunately I had to use unsafeCoerce in one place (when the argument is a list)

Seems fixable, although I wouldn't worry too much about unsafeCoerce in a test.

go sch' res as
(RuntimeSchemeResult, []) -> f
(RuntimeSchemeAll sch', _) -> go sch' f args
_ -> error $ "Wrong number of args for builtin " <> show bn <> ": " <> show argStrings
Contributor

You could probably make this function run in GenT MakeKnownM and generate arguments on the fly to avoid this error case, but it's perfectly fine this way too.

Member Author

Not immediately obvious how to do this, so I added a TODO above this line.

Comment on lines 234 to 239
genConstr :: Int -> Gen Data
genConstr depth =
    Constr <$> genInteger
        <*> Gen.list
            (Range.linear 0 5)
            (genData (depth - 1))
Contributor

This is good for the purposes of this PR, but here's a general question: @michaelpj, we probably want to have some universal generator for Data providing various interesting values, etc.?

Contributor

I'm pretty sure there is one already. Not sure where it is off the top of my head, but a quick grep should turn it up.

Member Author

There's a genBoundedData but it returns QuickCheck.Gen rather than Hedgehog.Gen, and it takes two additional parameters, imem and bmem, which I'd rather not have to provide.

Contributor

There's an issue to write a better generator. There is one in the cost modelling code here but it's not ideal, and it's not totally clear to me how to write a better one: we need a generator that samples the space of Data objects "uniformly", but I'm not even sure what that means in this context. See also the comments in #4480, where I had some difficulty coming up with decent samples.
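For reference, a basic size-bounded generator along these lines can be written with Hedgehog's Gen.recursive; the sketch below is only illustrative (the ranges and weights are arbitrary, and it makes no attempt at the "uniform" sampling mentioned above):

import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
import PlutusCore.Data (Data (..))

-- A size-bounded Data generator: Gen.recursive shrinks the size parameter at
-- each recursive choice, so generation always terminates.
genDataSketch :: Gen Data
genDataSketch =
    Gen.recursive Gen.choice
        -- non-recursive cases
        [ I <$> Gen.integral (Range.linear (-1000000) 1000000)
        , B <$> Gen.bytes (Range.linear 0 64)
        ]
        -- recursive cases
        [ Constr <$> Gen.integral (Range.linear 0 10)
                 <*> Gen.list (Range.linear 0 5) genDataSketch
        , List <$> Gen.list (Range.linear 0 5) genDataSketch
        , Map <$> Gen.list (Range.linear 0 5) ((,) <$> genDataSketch <*> genDataSketch)
        ]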

Contributor

Argh, the hedgehog/quickcheck issue again :/

@michaelpj (Contributor) left a comment

Maybe need to think about it a bit more, but here are some comments.

)
_ -> error "genArg: encountered non-Constant term"
-- Descend upon `Opaque`
| Just [_, SomeTypeRep tr'] <- matchTyCon @Opaque tr = genArg tr'
Contributor

... is this actually right? @effectfully, I'm always unsure how the Opaque stuff works.


@effectfully (Contributor) commented Apr 27, 2022

Ok, here's my updated understanding of the task itself.

We want to generate random input for each built-in function to ensure that no built-in function ever crashes with an exception. Crucially, at runtime we execute Untyped Plutus Core, which means we don't have types there: the user is allowed to submit an ill-typed program, it will get evaluated, and if it happens to succeed, it will be accepted (@michaelpj, right?).

This means that our tests need to cover the ill-typed case just like the well-typed one. We need to make sure that both cases are present in the generated tests, which means we need to collect stats on ill-typed vs well-typed applications (one of the functions from here does that; I always forget which one is for what).

Currently we only generate Constants; we need to generate arbitrary terms too. We don't care about them being sophisticated or well-typed or even well-scoped, so PlutusCore.Generators.AST.genTerm should do the job. A polymorphic built-in function like ifThenElse takes arbitrary terms as its second and third arguments and will succeed regardless of types and whether those arguments are constants or not. A monomorphic function will fail, but we want to test failing cases too, so that's fine.

Does this make sense to everyone?

Also, I was wrong about

genArg :: forall k (a :: k). (DefaultUni `Contains` a, PrettyConst a) => TypeRep a -> Gen a

as per the discussion above, but is it possible to make it

genArg :: forall (a :: *). MakeKnownIn DefaultUni Term a => TypeRep a -> Gen a

and recover the current interface by lifting the resulting a into Term via makeKnown and then rendering the resulting Term with pretty to recover the String part?

If that works out, we'll still need to have a generator of completely arbitrary Terms alongside the well-typed one from the above to test the failure case.

@zliu41 sorry, I thought that would be a good first task, but it turned out to be rather bad. Too vague, touches quite a lot of pieces of the machinery and requires figuring out what actually needs to be done. You're doing great, it's just that I didn't formulate the task properly, which is why you're rewriting different bits and pieces here.

@zliu41 (Member Author) commented Apr 27, 2022

the user is allowed to submit an ill-typed program and it will get evaluated and if it happens to succeed, it will be accepted

@zliu41 sorry, I thought that would be a good first task, but it turned out to be rather bad.

No worries, but it seems testing ill-typed programs can be a separate ticket, since this one only asks for testing that builtin functions don't throw? If so, this would still be a good first task 😃

a generator of completely arbitrary Terms alongside the well-typed one from the above to test the failure case.

That sounds good, so I think where we go from here is: (1) merge this PR after I address the existing comments, and (2) add test cases for ill-typed programs, which will probably go like this:

  • test builtin function evaluation with arbitrary terms
  • if evaluating a builtin function throws an exception, the test succeeds. Otherwise, verify that the arguments are indeed well-typed.

What do you think?

genArg :: forall (a :: *). MakeKnownIn DefaultUni Term a => TypeRep a -> Gen a

I'm not sure about this one either, unless there's a way to write things like Just HRefl <- eqTypeRep (typeRepOfTyCon tr) @[] so that we can refine a to [something], which I'm not aware of.

@zliu41 (Member Author) commented Apr 27, 2022

if evaluating a builtin function throws an exception, the test succeeds

Actually, I think I may be wrong on this: evaluating a builtin function never throws an exception, even if the arguments are ill-typed, right? Are you suggesting that the tests should verify that, or that the tests should verify that the evaluation fails with an error upon getting ill-typed arguments? Either way I feel like it can be done in a separate PR, but if it's the former, then it does belong in this ticket rather than a separate one.

@effectfully (Contributor)

No worries, but it seems testing ill-typed programs can be a separate ticket, since this one only asks for testing that builtin functions don't throw?

Don't throw exceptions, which is orthogonal to well-typedness.

But yes, great point, testing ill-typed programs and ensuring we generate enough test cases of each kind does sound like it deserves a separate PR.

I'm not sure about this one either, unless there's a way to write things like Just HRefl <- eqTypeRep (typeRepOfTyCon tr) @[] so that we can refine a to [something], which I'm not aware of.

You know what, I won't let you keep all the fun for yourself: I'm gonna look at this particular piece. Feel free to merge the current approach once it's ready.

@effectfully (Contributor)

if evaluating a builtin function throws an exception, the test succeeds

Actually I think I may be wrong on this - evaluating a builtin function never throws an exception, even if the arguments are ill-typed, right?

Yes. But we had a bug once where a builtin would throw an exception. That must never happen, hence this whole endeavor.

Are you suggesting that the tests should verify that

Yes.

or the tests should verify that the evaluation fails with an error upon getting ill-typed arguments?

No. That's not true anyway: you can have an ill-typed builtin application that succeeds; we have such tests.

@zliu41 (Member Author) commented Apr 27, 2022

Ok, I can certainly add some tests using arbitrary Terms, which should be easy given we have genTerm.

I feel like the generators for well-typed Terms are still useful (since if we test with arbitrary Terms, probably 99% of the test cases are ill-typed), so I'll keep them, but let me know if you think otherwise.

For well-typed Terms, if we want to generate non-Constants, it seems we need to write some new generators, which I think can be done in a separate PR.

@zliu41 (Member Author) commented Apr 28, 2022

Added tests using arbitrary (not necessarily well-typed) arguments. I think I addressed all review comments so far.

@zliu41 requested review from michaelpj and effectfully on April 28, 2022 at 01:36
@kwxm (Contributor) commented Apr 28, 2022

Three functions, VerifySignature, VerifyEcdsaSecp256k1Signature and VerifySchnorrSecp256k1Signature, are currently excluded. I need to figure out how to generate valid inputs for them.

@kwxm do you happen to know some of that? Also do the latter two have tests in the Plutus Tx compiler? We'll need to generate correct as well as incorrect input to make sure exceptions don't occur in both of these cases. It's fine to leave that for a future PR, though.

Each of the signature verification functions takes three bytestrings as arguments, and some of these have to be of specific sizes (you should get an error if they're the wrong size): see the comments in Crypto.hs. The functions take three arguments: a public key, a message, and a signature. They check that the signature has been obtained from the message using the private key corresponding to the public key, returning True if it has and False if it hasn't. For our purposes it shouldn't matter whether or not the signature is correct: the algorithms have to check every bit of the message in order to decide what the result is, and there shouldn't be any difference in behaviour for correct and incorrect signatures. I think we can just generate random bytestrings of appropriate lengths and that should do the job (we won't need inputs where the signature really is valid, but we should check what happens when one or more of the inputs have the wrong length).
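For example, generators for those inputs might look something like this (only a sketch: the 32/64-byte lengths below are illustrative, and the authoritative size requirements are in the comments in Crypto.hs):

import Data.ByteString (ByteString)
import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Random inputs for a signature verification builtin: the signature doesn't
-- have to be valid, only of a plausible length. The sizes are illustrative.
genPubKey, genSig, genMsg, genWrongLength :: Gen ByteString
genPubKey      = Gen.bytes (Range.singleton 32)  -- fixed-length public key
genSig         = Gen.bytes (Range.singleton 64)  -- fixed-length signature
genMsg         = Gen.bytes (Range.linear 0 200)  -- message of arbitrary length
genWrongLength = Gen.bytes (Range.linear 0 200)  -- deliberately unconstrained,
                                                 -- to exercise the error path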

We already have some generators in the code that we use for generating the cost models for these functions: see here (although the generators need to be rationalised: there's a comment that says "FIXME: this is terrible"), and here. We currently only have a benchmark for verifySignature, but Nikos will be adding the other signature functions in the near future.

I don't think there are any tests for these functions in the Plutus Tx compiler, but Koz Ross wrote some extremely comprehensive tests for the builtins in this file (which we should probably rename since it's not just about SECP256k1 any more). In fact that's probably a much better place to look for generators than the file I mentioned before. I think it even does show you how to generate correct signatures, so ignore what I said earlier about not needing to generate them: if we can, we probably should.

@effectfully (Contributor) left a comment

This commit shows how to make TypeRep a -> Gen a work for constants.

I don't mind merging the PR as is, though; it looks great and it gets the job done.

@michaelpj (Contributor)

This means that our tests need to cover the ill-typed case just like the well-typed one. We need to make sure that both the cases are present in the generated tests.

Ziyang and I discussed this a bit. I think there are at least two bits of the builtin machinery that we might want to test:

  • The generic handling of terms and converting them into typed values for builtins to consume.
  • The implementation of the builtin functions themselves.

I think it's most useful for this check to focus on the second. That is, we're really trying to test that the denotation functions are well-behaved here, which means we can assume we already have correctly-typed input.

Checking that the term -> typed constant machinery works properly seems like a separate issue, and one we could test perhaps more directly.

@michaelpj (Contributor) left a comment

Looks good to me! Did you test that it fails if we actually have a builtin that throws?

Also I see you already added the case of providing ill-typed arguments, which is fine.

(pure . pure)
whenJust mbErr $ \(e :: SomeException) -> do
let msg =
"Builtin function evaluation failed"
Contributor

I think you can do this more simply with annotate: https://hackage.haskell.org/package/hedgehog-0.6.1/docs/Hedgehog.html#v:annotate

i.e. do something like:

annotate $ "Function: " <> display bn
annotate $ "Arguments: " <> ...
...

and then ditch the exception handler. I think what will happen then is that the exception will fail the test and Hedgehog will show you the annotated values in the output.

Member Author

I changed to using annotate, but didn't remove the exception handler, because the annotations don't show up when the test fails due to an exception.
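Roughly, the shape is as follows (a simplified sketch, not the actual code in the PR; genArgs and applyBuiltin are stand-ins for the real generator and builtin application):

{-# LANGUAGE ScopedTypeVariables #-}
module EvaluationSketch where

import Control.Exception (SomeException, evaluate, try)
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Hypothetical stand-ins for the PR's real generator and evaluator.
genArgs :: Gen [Integer]
genArgs = Gen.list (Range.linear 0 3) (Gen.integral (Range.linear 0 100))

applyBuiltin :: [Integer] -> Integer
applyBuiltin = sum

-- Catch the exception explicitly: if it escaped, Hedgehog would abort the
-- test before printing the annotations. Note that 'evaluate' only forces to
-- WHNF; real code may need to force the result more deeply.
prop_doesNotThrow :: Property
prop_doesNotThrow = property $ do
    args <- forAll genArgs
    annotate $ "Arguments: " <> show args
    res <- evalIO . try . evaluate $ applyBuiltin args
    case res of
        Left (e :: SomeException) -> do
            annotate $ "Builtin function evaluation failed: " <> show e
            failure
        Right _ -> success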

@effectfully (Contributor) commented Apr 28, 2022

@michaelpj

  • The generic handling of terms and converting them into typed values for builtins to consume.
  • The implementation of the builtin functions themselves.

Checking that the term -> typed constant machinery works properly seems like a separate issue, and one we could test perhaps more directly.

With deferred unlifting both are indistinguishable parts of the runtime denotation of a builtin. One can extract a constant from a term implicitly via the ReadKnownIn instance or by manually calling asConstant in the denotation of the builtin and we'll get equal Core in the end. Trying to test the two indistinguishable ways of doing the same thing separately would only increase our chances of missing something.

To put it differently, even with immediate unlifting if the denotation of a builtin takes an Opaque, then it may call asConstant on it and you can't predict if that happens or not just by looking at the types, so this possibility needs to be accounted for and non-Constant terms need to be generated, at which point there's no point in testing asConstant separately, since this is handled by the tests here anyway.

I was too sloppy with that well-typed/ill-typed distinction. We don't care about an ifThenElse application being ill-typed: as long as the first argument is a boolean constant, the application succeeds and it doesn't matter whether the branches are of the same type. So the distinction should be between succeeding and failing applications and this is what should be tracked (in a subsequent PR as we've all agreed).

Does this all make sense?

@michaelpj (Contributor)

Okay, I think I agree with that.

I was too sloppy with that well-typed/ill-typed distinction.

Well, ill-typed applications are one obvious class of failing applications!

@effectfully (Contributor)

Well, ill-typed applications are one obvious class of failing applications!

Not necessarily! An ill-typed application can run just fine, even in the typed world (i.e. the CK machine). Here's a PR with a test. So the well-typed vs ill-typed dichotomy seems completely orthogonal to what we're trying to achieve here.

@michaelpj (Contributor)

Not necessarily! An ill-typed application can run just fine, even in the typed world (i.e. the CK machine).

Ugh, this is hard to talk about. We don't check anything for arguments that are type variables, but we do for those that aren't. I was trying to talk about applications-that-fail-because-of-unlifting-of-constants-to-particular-types; I don't know what to call that 😅

@effectfully (Contributor)

Yeah, ok, I think we're on the same page.

@zliu41 (Member Author) commented Apr 29, 2022

Did you test that it fails if we actually have a builtin that throws?

Yeah, I manually verified it. Once we make the test polymorphic in fun, we can add such a test case.

@zliu41 (Member Author) commented Apr 29, 2022

@effectfully I've switched to your approach in be2efd3, and now we are generating arbitrary Terms for Opaque arguments in the "success" cases, which happens to be what we wanted.

I think you also suggested, in a separate PR, generating well-typed Terms for Opaque rather than arbitrary ones (in other words, for IfThenElse we should generate Terms of the same type for both branches), right? This seems much trickier, since we don't have existing generators for Terms of a specific type, so I just want to make sure it's worth doing.

@effectfully (Contributor)

I think you also suggested, in a separate PR, generating well-typed Terms for Opaque, rather than arbitrary ones (in other words, for IfThenElse we should generate Terms of the same type for both branches), right?

Not quite.

So I think that for SomeConstant we need to generate:

  • constants of the right type. I.e. if it's a (a, b) there, we need to generate a tuple. And if a and/or b happen to be type variables (which is always the case with our builtins when a tuple appears inside of SomeConstant), then elements of the tuple need to be generated with random types (for example, a Bool and an Integer, a Text and a Text -- just two completely arbitrary constants, possibly but not necessarily of the same type)
  • constants of the wrong type. I.e. if it's a (a, b) there, we should occasionally generate an arbitrary constant (most of which are going to be non-tuples) to test the unhappy path
  • arbitrary terms (most of which are going to be non-constants) to test an even more unhappy path

Similarly for monomorphic builtins we want to generate constants of the right type (most often), constants of the wrong type (rarely) and non-constants (even more rarely).
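Roughly, the weighting I have in mind is something like this (a sketch with hypothetical generator parameters; the exact weights don't matter, and as discussed further down, the wrong-type case turns out to be redundant anyway):

import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen

-- Hypothetical helpers supplied by the caller; the real generators would
-- live in the test suite. The weights are only illustrative.
genBuiltinArg
    :: Gen term  -- constant of the right type
    -> Gen term  -- arbitrary constant (usually of the wrong type)
    -> Gen term  -- arbitrary term (usually not a constant)
    -> Gen term
genBuiltinArg rightType wrongType arbitraryTerm =
    Gen.frequency
        [ (8, rightType)      -- most often
        , (1, wrongType)      -- rarely
        , (1, arbitraryTerm)  -- even more rarely
        ]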

And for Opaque I think we can just use the same generator that will be implemented for SomeConstant. At least I can't come up with a reason to differentiate them in any way. This does mean that we'll get very few well-typed applications of ifThenElse, but as I'm explaining to myself above we don't care about them being well-typed anyway, we only care whether evaluation succeeds or not.

Hope this is not too messy of an explanation of what I have in mind.

@zliu41 (Member Author) commented Apr 30, 2022

constants of the right type.

We already generate constants of the right type, but right now everything is instantiated to Integer, and that part does need to be updated to allow instantiation to arbitrary types.

constants of the wrong type. I.e. if it's a (a, b) there, we should occasionally generate an arbitrary constant

I feel like this is covered by testing with arbitrary Terms, since an arbitrary Term already has a 10% chance to be a constant. Any particular reason that we need to test this case more?

And for Opaque I think we can just use the same generator that will be implemented for SomeConstant.

Same question as above - since Opaque is supposed to accept any Term, there seems to be even less reason to generate constants-only test cases for Opaque.

@effectfully (Contributor)

I feel like this is covered by testing with arbitrary Terms, since an arbitrary Term already has a 10% chance to be a constant. Any particular reason that we need to test this case more?

Good point, not sure what I was thinking about.

since Opaque is supposed to accept any Term, there seems to be even less reason to generate constants-only test cases for Opaque.

It accepts any value, but a builtin is allowed to attempt to extract a constant out of it and then it can look at the type tag of that constant etc. I.e. from the point of view of these tests there doesn't seem to be any difference between SomeConstant and Opaque.

I do now realize we don't need to consider the case of constants of the wrong type separately: it'll be enough to just generate arbitrary terms, and some of them will be constants, making things ill-typed. Thanks for catching that.

@zliu41 merged commit c05ab5f into master on Apr 30, 2022
@zliu41 deleted the builtin/errors branch on April 30, 2022 at 07:27
@michaelpj (Contributor)

The comment on the test is good, but given how much discussion there has been here, it might be worth expanding it into a longer Note. What do you think, @zliu41? "Comprehensively generating arguments for builtin functions" or something like that.
