Proposal for Restricted Sequences #11
I think that modelling tuples using sequences has a number of severe drawbacks. In particular (a) the components of the tuple can only be single items (not arbitrary values), and (b) there is no way to refer to the components of the tuple by name, only by position. In addition (c), sequences do not have an intrinsic operation in XQuery to replace one item of the sequence by a different item, and such an operation is often needed for tuples; and (d) sequences support many operations that are inappropriate for tuples, such as iterating over the members or selecting a subsequence. I think it is much better to represent tuples as maps, which do not have any of these disadvantages (at least to anything like the same extent).
LATER: One other disadvantage of representing complex numbers (say) as sequences is that you then can't have a sequence of complex numbers. I agree that for very simple tuples such as complex numbers, positional access to components works just as well as named access. But (a) named access is more scalable to more complex problems, and (b) if you're going to use positional access, then surely using arrays rather than sequences gives much more flexibility.
I think it is useful to have both a named tuple type based on maps, like the one you are suggesting and have implemented in Saxon, and a sequence-based tuple type. Regarding the examples I have mentioned in this proposal, some of them may be better represented as named tuples (e.g. the point and complex/rational numbers). It is up to the library implementor which design choices to make, such as the choice of sequence or array (each of which has different advantages and disadvantages). In addition to being able to better type-check existing code that makes use of tuple sequences (like the MarkLogic […] Having to specify these functions as named/map tuples would make it harder to convert them to efficient assembly instructions. Regarding the disadvantages you mention: (a) I don't see being limited to single items as a disadvantage here -- the examples I gave in this proposal do not have variable numbers of items; (b) Again, I don't see only being able to refer to the items by position instead of by name as a disadvantage here, especially when you allow for the possibility of optimising the code the query runs; (c) Again, I don't see the lack of an operation to replace items as a disadvantage -- they are an immutable value, so the modified version will be a new value returned from the expression/function. For example:
One other disadvantage of representing complex numbers (say) as sequences is that you then can't have a sequence of complex numbers. I agree that for very simple tuples such as complex numbers, positional access to components works just as well as named access. But (a) named access is more scalable to more complex problems, and (b) if you're going to use positional access, then surely using arrays rather than sequences gives much more flexibility.
These are all design decisions that a developer writing XPath/XQuery code will need to make regardless of whether or not this proposal is accepted. This proposal does not change the behaviour of sequences with regard to flattening. It is designed to provide better static checking in two cases:
In the current XPath/XQuery syntax, a user can already specify the common type of each item in a sequence and the cardinality (zero or one, zero or more, only one, one or more). This proposal is just expanding on that so an XPath/XQuery processor can provide better static checking.
I think that with the XQuery 1.0 / XPath 2.0 data model this would be a useful enhancement, but I think the use cases for it are obsoleted by maps and arrays. There's an ambiguity in the grammar that needs to be sorted out. In 3.1,
I am not opposed to this, but I wonder if we need to go so far as adding explicit types - #8 (comment)
@michaelhkay I'm open to having a different syntax that does not conflict. Maybe something like:
I'm thinking also in terms of supporting another proposal for union item types:
This would have the advantage of allowing sequence types in the […] It may also be possible to drop the initial sequence type:
Using your predicate type syntax, these would be something like:
For a union of sequence types it would be neat to reuse the SequenceTypeUnion syntax from typeswitch -- specifically
Because a SequenceType can appear in "treat as", it has to follow some syntactic constraints and this is best achieved by sticking to the "keyword(details)" convention. I think the term "union" is strongly associated with XSD union types, so I would go for the syntax "anyOf(" SequenceTypeUnion ")"
I like the idea of the
…d sequence' and better clarify the intention of the proposal.
I've updated this proposal based on feedback. The text for this version is at https://github.com/expath/xpath-ng/blob/4cac38e54646fe6486b85b2635168d681a5fba5a/restricted-sequences.md. I'll add a separate proposal for the
Thanks for this. Sorry if the comments are fairly lengthy, but it tends to be the case that the closer you get to agreement in principle, the more reviewers start noticing the edge cases. Comments on latest version. Substantive comments, where further technical work (rather than editorial work) is needed, are highlighted in bold.
Perhaps the subtype(A,B) rules can be refactored as follows.
A sequence type constrains:
(a) the permitted cardinality of the sequence, as a (possibly infinite) set of non-negative integers. For empty-sequence(), the permitted cardinality is {0}; for xs:error the permitted cardinality is {}; for any other item type it depends on the occurrence indicator: {1}, {0..infinity}, {1..infinity}, or {0,1}; for a restricted sequence type it is the number of component item types.
(b) the required item type of the Nth item in the sequence. For empty-sequence() and xs:error this is xs:error (for all N); for a restricted sequence type it is the Nth component item type; for an item type with occurrence indicator, it is the item type (for all N).
subtype(A,B) is true if and only if both of the following are true:
(a) the permitted cardinality of A is a subset of the permitted cardinality of B;
(b) let Ai be the required item type of item i in A and let Bi be the required item type of item i in B; for all i in the permitted cardinality of A, itemtype-subtype(Ai, Bi) is true.
Of course, because this involves infinite sets, this doesn't provide a computable algorithm for determining subtype(A, B), but I believe it provides an adequate mathematical specification.
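As a rough illustration of the refactored rules above, here is a minimal Python sketch. It is not part of the proposal and not any processor's implementation. The names `SeqType`, `occ`, `restricted`, `PARENT` and the tiny item-type hierarchy are all hypothetical. It assumes that every permitted cardinality mentioned above is either empty or a contiguous range, that xs:error is treated as a subtype of every item type, and that "for all i in the permitted cardinality of A" means every position that can occur in a sequence matching A. Because the required item type stops varying after the last listed component, only finitely many positions need checking.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical toy hierarchy (child -> parent); NOT the full XDM type hierarchy.
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anyAtomicType",
    "xs:string": "xs:anyAtomicType",
    "xs:anyAtomicType": "item()",
}

def itemtype_subtype(a: Optional[str], b: str) -> bool:
    """Toy itemtype-subtype: walk up the parent chain of a looking for b."""
    if a == "xs:error":          # assumption: xs:error is a subtype of everything
        return True
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False

@dataclass(frozen=True)
class SeqType:
    items: Tuple[str, ...]       # per-position item types; the last one repeats
    lo: int                      # smallest permitted length
    hi: Optional[int]            # largest permitted length, None = unbounded
    no_length: bool = False      # True only for xs:error: cardinality set is {}

    def item_at(self, n: int) -> str:
        """Required item type of the Nth item (1-based)."""
        return self.items[min(n, len(self.items)) - 1]

def occ(item: str, indicator: str = "") -> SeqType:
    """Item type with an occurrence indicator: '', '?', '*' or '+'."""
    lo, hi = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}[indicator]
    return SeqType((item,), lo, hi)

def restricted(*items: str) -> SeqType:
    """Restricted sequence type: cardinality is the number of components."""
    if not items:
        raise ValueError("this sketch requires at least one component")
    return SeqType(tuple(items), len(items), len(items))

EMPTY_SEQUENCE = SeqType(("xs:error",), 0, 0)
XS_ERROR = SeqType(("xs:error",), 0, 0, no_length=True)

def cardinality_subset(a: SeqType, b: SeqType) -> bool:
    """Rule (a): permitted cardinality of a is a subset of that of b."""
    if a.no_length:
        return True              # {} is a subset of every set
    if b.no_length or a.lo < b.lo:
        return False
    if b.hi is None:
        return True
    return a.hi is not None and a.hi <= b.hi

def subtype(a: SeqType, b: SeqType) -> bool:
    if not cardinality_subset(a, b):
        return False
    if a.no_length:
        return True              # no sequence matches a, nothing left to check
    # Rule (b): item types are constant past the last listed component,
    # so checking up to the longer component list is enough.
    limit = max(len(a.items), len(b.items))
    if a.hi is not None:
        limit = min(limit, a.hi)
    return all(itemtype_subtype(a.item_at(i), b.item_at(i))
               for i in range(1, limit + 1))
```

For example, under these assumptions `subtype(restricted("xs:integer", "xs:string"), occ("xs:anyAtomicType", "*"))` holds, while `subtype(occ("xs:integer", "*"), occ("xs:integer", "+"))` does not, because {0..infinity} is not a subset of {1..infinity}.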
Thanks for your detailed response. Regarding point 6, that was intended for things like:
here […] Would it make sense to write up something like the below as a separate proposal (so the new sequence type and subtype judgement […] I wonder if it makes sense to revise how these rules are structured to account for all the new types, maybe by adapting and extending something like https://github.com/rhdunn/xquery-intellij-plugin/blob/master/docs/XQuery%20IntelliJ%20Plugin%20Data%20Model.md#214-part-4-sequences, which I have been writing to document how I intend to approach static analysis in my plugin. This would have the advantage of removing the […] Those rules define sequences as Part 4 of the type system (section 2.7.4 of the XQuery and XPath Data Model) in terms of a lower bound, an upper bound, and an item type, where the lower and upper bounds form the cardinality. I then define 3 type operations:
This proposal could then be implemented such that the upper bound of the resulting sequence type is the number of
Yes, I understand what you're saying about inferring types, but it doesn't need to be said. It's no different from inferring a type for […] Yes, in reviewing your other proposal I also came to the conclusion that it's worth refactoring the section on subtype judgements. Looking at your IntelliJ page, you've defined the cardinality constraint in terms of a range of integers, whereas I did it as a set of integers; the advantage of using a set of integers is that it allows the empty set, which is what xs:error gives you. With "restricted sequence types" we have the new concept that the required item type becomes dependent on the position of the item in the sequence, and I think the best way to handle that is to make this dependency a general property of sequence types, and then make the dependency trivial for existing sequence types.
Note that once we establish the idea that a SequenceType determines a required item type for the Nth item in a matching sequence, we can use this concept in the function conversion rules. Actually this isn't as easy as I first thought. Suppose the required type is […] But what are the conditions under which we atomize, given that the required type could also be
@michaelhkay I have updated the proposal to address your suggestions. See https://github.com/expath/xpath-ng/blob/b9800b9882ded23d812641ef82209cc59861d6cc/restricted-sequences.md for the non-diff version of the latest changes (version 3 of the proposal). I have updated the […] I haven't looked at your last comment regarding attributes and atomization yet.
This adds the tuple sequence type concept to SequenceType.