
Test that builtin functions don't throw #4555

Merged
merged 12 commits into master from builtin/errors on Apr 30, 2022
Conversation

@zliu41 (Member) commented Apr 26, 2022

This is the initial implementation.

  • The current implementation is monomorphic wrt uni and fun.
  • Currently, all type variables in polymorphic builtin functions are instantiated to Integer.
  • Three functions, VerifySignature, VerifyEcdsaSecp256k1Signature and VerifySchnorrSecp256k1Signature, are currently excluded. I need to figure out how to generate valid inputs for them.
  • Unfortunately I had to use unsafeCoerce in one place (when the argument is a list) 😛

@effectfully (Contributor) left a comment

Well done! Unfortunately, there was a miscommunication and the tests need to be tweaked a little; see one of the comments below (sorry, there are lots of them).

The current implementation is monomorphic wrt uni and fun.

That's perfectly fine: we don't really care about sets of built-in functions other than fun, and we don't even have a non-DefaultUni universe.

Currently, all type variables in polymorphic builtin functions are instantiated to Integer.

Also perfectly fine.

Three functions, VerifySignature, VerifyEcdsaSecp256k1Signature and VerifySchnorrSecp256k1Signature, are currently excluded. I need to figure out how to generate valid inputs for them.

@kwxm do you happen to know some of that? Also, do the latter two have tests in the Plutus Tx compiler? We'll need to generate correct as well as incorrect input to make sure exceptions don't occur in either case. It's fine to leave that for a future PR, though.

Unfortunately I had to use unsafeCoerce in one place (when the argument is a list)

Seems fixable, although I wouldn't worry too much about unsafeCoerce in a test.

go sch' res as
(RuntimeSchemeResult, []) -> f
(RuntimeSchemeAll sch', _) -> go sch' f args
_ -> error $ "Wrong number of args for builtin " <> show bn <> ": " <> show argStrings
Contributor

You could probably make this function run in GenT MakeKnownM and generate arguments on the fly to avoid this error case, but it's perfectly fine this way too.

Member Author

Not immediately obvious how to do this, so I added a TODO above this line.

Comment on lines 234 to 239
genConstr :: Int -> Gen Data
genConstr depth =
    Constr <$> genInteger
        <*> Gen.list
            (Range.linear 0 5)
            (genData (depth - 1))
Contributor

This is good for the purposes of this PR, but here's a general question: @michaelpj, we probably want to have some universal generator for Data providing various interesting values, etc.?

Contributor

I'm pretty sure there is one already. Not sure where it is off the top of my head, but a quick grep should turn it up.

Member Author

There's a genBoundedData but it returns QuickCheck.Gen rather than Hedgehog.Gen, and it takes two additional parameters, imem and bmem, which I'd rather not have to provide.

Contributor

There's an issue to write a better generator. There is one in the cost modelling code here but it's not ideal, and it's not totally clear to me how to write a better one: we need a generator that samples the space of Data objects "uniformly", but I'm not even sure what that means in this context. See also the comments in #4480, where I had some difficulty coming up with decent samples.
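For reference, a basic size-bounded generator along these lines can be written with Hedgehog's Gen.recursive; the sketch below is only illustrative (the ranges and weights are arbitrary, and it makes no attempt at the "uniform" sampling mentioned above):

import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
import PlutusCore.Data (Data (..))

-- A size-bounded Data generator: Gen.recursive shrinks the size parameter at
-- each recursive choice, so generation always terminates.
genDataSketch :: Gen Data
genDataSketch =
    Gen.recursive Gen.choice
        -- non-recursive cases
        [ I <$> Gen.integral (Range.linear (-1000000) 1000000)
        , B <$> Gen.bytes (Range.linear 0 64)
        ]
        -- recursive cases
        [ Constr <$> Gen.integral (Range.linear 0 10)
                 <*> Gen.list (Range.linear 0 5) genDataSketch
        , List <$> Gen.list (Range.linear 0 5) genDataSketch
        , Map <$> Gen.list (Range.linear 0 5) ((,) <$> genDataSketch <*> genDataSketch)
        ]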

Contributor

Argh, the hedgehog/quickcheck issue again :/

@michaelpj (Contributor) left a comment

Maybe need to think about it a bit more, but here are some comments.

)
_ -> error "genArg: encountered non-Constant term"
-- Descend upon `Opaque`
| Just [_, SomeTypeRep tr'] <- matchTyCon @Opaque tr = genArg tr'
Contributor

... is this actually right? @effectfully, I'm always unsure how the Opaque stuff works.


@effectfully (Contributor) commented Apr 27, 2022

Ok, here's my updated understanding of the task itself.

We want to generate random input for each built-in function to ensure that no built-in function ever crashes with an exception. Crucially, at runtime we execute Untyped Plutus Core, which means we don't have types there: the user is allowed to submit an ill-typed program, it will get evaluated, and if it happens to succeed, it will be accepted (@michaelpj, right?).

This means that our tests need to cover the ill-typed case just like the well-typed one. We need to make sure that both cases are present in the generated tests, which means we need to collect stats on ill-typed vs well-typed applications (one of the functions from here does that; I always forget which one is for what).

Currently we only generate Constants; we need to generate arbitrary terms too. We don't care about them being sophisticated or well-typed or even well-scoped, so PlutusCore.Generators.AST.genTerm should do the job. A polymorphic built-in function like ifThenElse takes arbitrary terms as its second and third arguments and will succeed regardless of types and whether those arguments are constants or not. A monomorphic function will fail, but we want to test failing cases too, so that's fine.

Does this make sense to everyone?

Also, I was wrong about

genArg :: forall k (a :: k). (DefaultUni `Contains` a, PrettyConst a) => TypeRep a -> Gen a

as per the discussion above, but is it possible to make it

genArg :: forall (a :: *). MakeKnownIn DefaultUni Term a => TypeRep a -> Gen a

and recover the current interface by lifting the resulting a into Term via makeKnown and then rendering the resulting Term with pretty to recover the String part?

If that works out, we'll still need to have a generator of completely arbitrary Terms alongside the well-typed one from the above to test the failure case.

@zliu41 sorry, I thought that would be a good first task, but it turned out to be rather bad. Too vague, touches quite a lot of pieces of the machinery and requires figuring out what actually needs to be done. You're doing great, it's just that I didn't formulate the task properly, which is why you're rewriting different bits and pieces here.

@zliu41 (Member Author) commented Apr 27, 2022

the user is allowed to submit an ill-typed program and it will get evaluated and if it happens to succeed, it will be accepted

@zliu41 sorry, I thought that would be a good first task, but it turned out to be rather bad.

No worries, but it seems testing ill-typed programs can be a separate ticket, since this one only asks for testing that builtin functions don't throw? If so, this would still be a good first task 😃

a generator of completely arbitrary Terms alongside the well-typed one from the above to test the failure case.

That sounds good, so I think where we go from here is: (1) merge this PR after I address the existing comments, and (2) add test cases for ill-typed programs, which will probably go like this:

  • test builtin function evaluation with arbitrary terms
  • if evaluating a builtin function throws an exception, the test succeeds. Otherwise, verify that the arguments are indeed well-typed.

What do you think?

genArg :: forall (a :: *). MakeKnownIn DefaultUni Term a => TypeRep a -> Gen a

I'm not sure about this one either, unless there's a way to write things like Just HRefl <- eqTypeRep (typeRepOfTyCon tr) @[] so that we can refine a to [something], which I'm not aware of.

@zliu41 (Member Author) commented Apr 27, 2022

if evaluating a builtin function throws an exception, the test succeeds

Actually, I think I may be wrong on this: evaluating a builtin function never throws an exception, even if the arguments are ill-typed, right? Are you suggesting that the tests should verify that, or that the tests should verify that the evaluation fails with an error upon getting ill-typed arguments? Either way I feel like it can be done in a separate PR, but if it's the former, then it does belong in this ticket rather than a separate one.

@effectfully (Contributor)

No worries, but it seems testing ill-typed programs can be a separate ticket, since this one only asks for testing that builtin functions don't throw?

Don't throw exceptions, which is orthogonal to well-typedness.

But yes, great point, testing ill-typed programs and ensuring we generate enough test cases of each kind does sound like it deserves a separate PR.

I'm not sure about this one either, unless there's a way to write things like Just HRefl <- eqTypeRep (typeRepOfTyCon tr) @[] so that we can refine a to [something], which I'm not aware of.

You know what, I won't let you keep all the fun for yourself: I'm gonna look at this particular piece. Feel free to merge the current approach once it's ready.

@effectfully (Contributor)

if evaluating a builtin function throws an exception, the test succeeds

Actually I think I may be wrong on this - evaluating a builtin function never throws an exception, even if the arguments are ill-typed, right?

Yes. But we had a bug once where a builtin would throw an exception. That must never happen, hence this whole endeavor.

Are you suggesting that the tests should verify that

Yes.

or the tests should verify that the evaluation fails with an error upon getting ill-typed arguments?

No. That's not true anyway: you can have an ill-typed builtin application that succeeds; we have such tests.

@zliu41 (Member Author) commented Apr 27, 2022

Ok, I can certainly add some tests using arbitrary Terms, which should be easy given we have genTerm.

I feel like the generators for well-typed Terms are still useful (since if we test with arbitrary Terms, probably 99% of the test cases are ill-typed), so I'll keep them, but let me know if you think otherwise.

For well-typed Terms, if we want to generate non-Constants, it seems we need to write some new generators, which I think can be done in a separate PR.

@zliu41 (Member Author) commented Apr 28, 2022

Added tests using arbitrary (not necessarily well-typed) arguments. I think I addressed all review comments so far.

@zliu41 requested review from michaelpj and effectfully on April 28, 2022 at 01:36
@kwxm (Contributor) commented Apr 28, 2022

Three functions, VerifySignature, VerifyEcdsaSecp256k1Signature and VerifySchnorrSecp256k1Signature, are currently excluded. I need to figure out how to generate valid inputs for them.

@kwxm do you happen to know some of that? Also do the latter two have tests in the Plutus Tx compiler? We'll need to generate correct as well as incorrect input to make sure exceptions don't occur in both of these cases. It's fine to leave that for a future PR, though.

Each of the signature verification functions takes three bytestrings as arguments, and some of these have to be of specific sizes (you should get an error if they're the wrong size): see the comments in Crypto.hs. The functions take three arguments: a public key, a message, and a signature. They check that the signature has been obtained from the message using the private key corresponding to the public key, returning True if it has and False if it hasn't. For our purposes it shouldn't matter whether or not the signature is correct: the algorithms have to check every bit of the message in order to decide what the result is, and there shouldn't be any difference in behaviour for correct and incorrect signatures. I think we can just generate random bytestrings of appropriate lengths and that should do the job (we won't need inputs where the signature really is valid, but we should check what happens when one or more of the inputs have the wrong length).
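For example, generators for those inputs might look something like this (only a sketch: the 32/64-byte lengths below are illustrative, and the authoritative size requirements are in the comments in Crypto.hs):

import Data.ByteString (ByteString)
import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Random inputs for a signature verification builtin: the signature doesn't
-- have to be valid, only of a plausible length. The sizes are illustrative.
genPubKey, genSig, genMsg, genWrongLength :: Gen ByteString
genPubKey      = Gen.bytes (Range.singleton 32)  -- fixed-length public key
genSig         = Gen.bytes (Range.singleton 64)  -- fixed-length signature
genMsg         = Gen.bytes (Range.linear 0 200)  -- message of arbitrary length
genWrongLength = Gen.bytes (Range.linear 0 200)  -- deliberately unconstrained,
                                                 -- to exercise the error path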

We already have some generators in the code that we use for generating the cost models for these functions: see here (although the generators need to be rationalised: there's a comment that says "FIXME: this is terrible"), and here. We currently only have a benchmark for verifySignature, but Nikos will be adding the other signature functions in the near future.

I don't think there are any tests for these functions in the Plutus Tx compiler, but Koz Ross wrote some extremely comprehensive tests for the builtins in this file (which we should probably rename since it's not just about SECP256k1 any more). In fact that's probably a much better place to look for generators than the file I mentioned before. I think it even does show you how to generate correct signatures, so ignore what I said earlier about not needing to generate them: if we can, we probably should.

@effectfully (Contributor) left a comment

This commit shows how to make TypeRep a -> Gen a work for constants.

I don't mind merging the PR as is, though; it looks great and it gets the job done.

@michaelpj (Contributor)

This means that our tests need to cover the ill-typed case just like the well-typed one. We need to make sure that both the cases are present in the generated tests.

Ziyang and I discussed this a bit. I think there are at least two bits of the builtin machinery that we might want to test:

  • The generic handling of terms and converting them into typed values for builtins to consume.
  • The implementation of the builtin functions themselves.

I think it's most useful for this check to focus on the second. That is, we're really trying to test that the denotation functions are well-behaved here, which means we can assume we already have correctly-typed input.

Checking that the term -> typed constant machinery works properly seems like a separate issue, and one we could test perhaps more directly.

@michaelpj (Contributor) left a comment

Looks good to me! Did you test that it fails if we actually have a builtin that throws?

Also I see you already added the case of providing ill-typed arguments, which is fine.

(pure . pure)
whenJust mbErr $ \(e :: SomeException) -> do
let msg =
"Builtin function evaluation failed"
Contributor

I think you can do this more simply with annotate: https://hackage.haskell.org/package/hedgehog-0.6.1/docs/Hedgehog.html#v:annotate

i.e. do something like:

annotate $ "Function: " <> display bn
annotate $ "Arguments: " <> ...
...

and then ditch the exception handler. I think what will happen then is that the exception will fail the test and Hedgehog will show you the annotated values in the output.

Member Author

I changed to using annotate, but didn't remove the exception handler, because the annotations don't show up when the test fails due to an exception.
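Roughly, the shape is as follows (a simplified sketch, not the actual code in the PR; genArgs and applyBuiltin are stand-ins for the real generator and builtin application):

{-# LANGUAGE ScopedTypeVariables #-}
module EvaluationSketch where

import Control.Exception (SomeException, evaluate, try)
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Hypothetical stand-ins for the PR's real generator and evaluator.
genArgs :: Gen [Integer]
genArgs = Gen.list (Range.linear 0 3) (Gen.integral (Range.linear 0 100))

applyBuiltin :: [Integer] -> Integer
applyBuiltin = sum

-- Catch the exception explicitly: if it escaped, Hedgehog would abort the
-- test before printing the annotations. Note that 'evaluate' only forces to
-- WHNF; real code may need to force the result more deeply.
prop_doesNotThrow :: Property
prop_doesNotThrow = property $ do
    args <- forAll genArgs
    annotate $ "Arguments: " <> show args
    res <- evalIO . try . evaluate $ applyBuiltin args
    case res of
        Left (e :: SomeException) -> do
            annotate $ "Builtin function evaluation failed: " <> show e
            failure
        Right _ -> success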

@effectfully (Contributor) commented Apr 28, 2022

@michaelpj

  • The generic handling of terms and converting them into typed values for builtins to consume.
  • The implementation of the builtin functions themselves.

Checking that the term -> typed constant machinery works properly seems like a separate issue, and one we could test perhaps more directly.

With deferred unlifting both are indistinguishable parts of the runtime denotation of a builtin. One can extract a constant from a term implicitly via the ReadKnownIn instance or by manually calling asConstant in the denotation of the builtin and we'll get equal Core in the end. Trying to test the two indistinguishable ways of doing the same thing separately would only increase our chances of missing something.

To put it differently, even with immediate unlifting if the denotation of a builtin takes an Opaque, then it may call asConstant on it and you can't predict if that happens or not just by looking at the types, so this possibility needs to be accounted for and non-Constant terms need to be generated, at which point there's no point in testing asConstant separately, since this is handled by the tests here anyway.

I was too sloppy with that well-typed/ill-typed distinction. We don't care about an ifThenElse application being ill-typed: as long as the first argument is a boolean constant, the application succeeds and it doesn't matter whether the branches are of the same type. So the distinction should be between succeeding and failing applications and this is what should be tracked (in a subsequent PR as we've all agreed).

Does this all make sense?

@michaelpj (Contributor)

Okay, I think I agree with that.

I was too sloppy with that well-typed/ill-typed distinction.

Well, ill-typed applications are one obvious class of failing applications!

@effectfully (Contributor)

Well, ill-typed applications are one obvious class of failing applications!

Not necessarily! An ill-typed application can run just fine, even in the typed world (i.e. the CK machine). Here's a PR with a test. So the well-typed vs ill-typed dichotomy seems completely orthogonal to what we're trying to achieve here.

@michaelpj (Contributor)

Not necessarily! An ill-typed application can run just fine, even in the typed world (i.e. the CK machine).

Ugh, this is hard to talk about. We don't check anything for arguments that are type variables, but we do for those that aren't. I was trying to talk about applications-that-fail-because-of-unlifting-of-constants-to-particular-types; I don't know what to call that 😅

@effectfully (Contributor)

Yeah, ok, I think we're on the same page.

@zliu41 (Member Author) commented Apr 29, 2022

Did you test that it fails if we actually have a builtin that throws?

Yeah, I manually verified it. Once we make the test polymorphic in fun, we can add such a test case.

@zliu41 (Member Author) commented Apr 29, 2022

@effectfully I've switched to your approach in be2efd3, and now we are generating arbitrary Terms for Opaque arguments in the "success" cases, which happens to be what we wanted.

I think you also suggested, in a separate PR, generating well-typed Terms for Opaque rather than arbitrary ones (in other words, for IfThenElse we should generate Terms of the same type for both branches), right? This seems much trickier, since we don't have existing generators for Terms of a specific type, so I just want to make sure it's worth doing.

@effectfully (Contributor)

I think you also suggested, in a separate PR, generating well-typed Terms for Opaque, rather than arbitrary ones (in other words, for IfThenElse we should generate Terms of the same type for both branches), right?

Not quite.

So I think that for SomeConstant we need to generate:

  • constants of the right type. I.e. if it's a (a, b) there, we need to generate a tuple. And if a and/or b happen to be type variables (which is always the case with our builtins when a tuple appears inside of SomeConstant), then elements of the tuple need to be generated with random types (for example, a Bool and an Integer, a Text and a Text -- just two completely arbitrary constants, possibly but not necessarily of the same type)
  • constants of the wrong type. I.e. if it's a (a, b) there, we should occasionally generate an arbitrary constant (most of which are going to be non-tuples) to test the unhappy path
  • arbitrary terms (most of which are going to be non-constants) to test an even more unhappy path

Similarly for monomorphic builtins we want to generate constants of the right type (most often), constants of the wrong type (rarely) and non-constants (even more rarely).
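Roughly, the weighting I have in mind is something like this (a sketch with hypothetical generator parameters; the exact weights don't matter, and as discussed further down, the wrong-type case turns out to be redundant anyway):

import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen

-- Hypothetical helpers supplied by the caller; the real generators would
-- live in the test suite. The weights are only illustrative.
genBuiltinArg
    :: Gen term  -- constant of the right type
    -> Gen term  -- arbitrary constant (usually of the wrong type)
    -> Gen term  -- arbitrary term (usually not a constant)
    -> Gen term
genBuiltinArg rightType wrongType arbitraryTerm =
    Gen.frequency
        [ (8, rightType)      -- most often
        , (1, wrongType)      -- rarely
        , (1, arbitraryTerm)  -- even more rarely
        ]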

And for Opaque I think we can just use the same generator that will be implemented for SomeConstant. At least I can't come up with a reason to differentiate them in any way. This does mean that we'll get very few well-typed applications of ifThenElse, but as I'm explaining to myself above we don't care about them being well-typed anyway, we only care whether evaluation succeeds or not.

Hope this is not too messy of an explanation of what I have in mind.

@zliu41 (Member Author) commented Apr 30, 2022

constants of the right type.

We already generate constants of the right type, but right now everything is instantiated to Integer, and that part does need to be updated to allow instantiation to arbitrary types.

constants of the wrong type. I.e. if it's a (a, b) there, we should occasionally generate an arbitrary constant

I feel like this is covered by testing with arbitrary Terms, since an arbitrary Term already has a 10% chance to be a constant. Any particular reason that we need to test this case more?

And for Opaque I think we can just use the same generator that will be implemented for SomeConstant.

Same question as above - since Opaque is supposed to accept any Term, there seems to be even less reason to generate constants-only test cases for Opaque.

@effectfully (Contributor)

I feel like this is covered by testing with arbitrary Terms, since an arbitrary Term already has a 10% chance to be a constant. Any particular reason that we need to test this case more?

Good point, not sure what I was thinking about.

since Opaque is supposed to accept any Term, there seems to be even less reason to generate constants-only test cases for Opaque.

It accepts any value, but a builtin is allowed to attempt to extract a constant out of it and then it can look at the type tag of that constant etc. I.e. from the point of view of these tests there doesn't seem to be any difference between SomeConstant and Opaque.

I do now realize we don't need to consider the case of constants of the wrong type separately: it'll be enough to just generate arbitrary terms, and some of them will be constants, making things ill-typed. Thanks for catching that.

@zliu41 merged commit c05ab5f into master on Apr 30, 2022
@zliu41 deleted the builtin/errors branch on April 30, 2022 at 07:27
@michaelpj (Contributor)

The comment on the test is good, but given how much discussion there has been here, it might be worth expanding it into a longer Note. What do you think, @zliu41? "Comprehensively generating arguments for builtin functions" or something like that.
