Proposal for Sequence, Map, and Array Decomposition #8
+1 for the proposal, looks good. If tuple arrays are returned, I would be in favor of having the array syntax. This would make it easier to process sequences of arrays:
let ($array1, $array2) := ([1,2], [1,2])
return ...
I guess it shouldn’t apply to context item declarations? I’m not sure either if it makes sense for group by clauses. |
I think it's necessary to go further and define tuples as a type: using sequences and arrays to represent tuples of values has many problems and this proposal only solves one of them. Given the introduction of maps, I think that it's better to represent tuples of values as maps (as the standard function library often does in the case of the "options" parameters of functions), and this proposal doesn't allow decomposing assignment in this case.
Note that in the introduction, the terms "fixed length sequence" and "fixed length array" are confusing, because they suggest that there are sequences and arrays whose length is not fixed. This is not the case: a sequence (like an array) has a length that is an intrinsic property of the sequence and can never be changed, since sequences are immutable.
I plan on writing a separate proposal for defining the type of a tuple sequence using the formal semantics style syntax. I don't think that, just because maps are available, syntax and extensions to support sequences and arrays should not be proposed. There are existing functions that make use of fixed length sequence values, and it may be easier to write functions that accept/return sequences/arrays than maps. I've listed some examples in my proposal (sincos, muldiv, points, and complex/rational numbers).
What I meant by a "fixed length" sequence/array is one whose size does not change depending on context. For example, a 2D point will always have two items. A counterexample would be a function that doubles the values of a sequence: the length of the sequence here is variable (not fixed). I'm happy to use different terminology if the terms in this proposal are confusing.
Having said that, provided that the number of items in the sequence/array matches the number of variables being assigned, it should not matter. The behaviour of assigning more variables than there are items in the sequence/array is defined in the proposal. The proposal should also define the behaviour of assigning fewer variables than there are items in the sequence/array.
Providing a proposal for decomposing map based tuples (named tuples?) is something I would be interested in, but it should be a separate proposal. As a rough idea, using a map-like syntax analogous to declaring maps, something like:
|
I am watching this with interest. From my perspective we don't necessarily need tuple sequence types or tuple array types, rather I see the decomposition as just syntactic sugar. I do like the idea of varying syntax for sequence and array, e.g.:
let ($x, $y) := (1.1, 2.2)
let [$x, $y] := [1.1, 2.2] |
I agree with Adam’s point of view: I regard the extension of the syntax as a nice addition, but I could live without new tuple types. |
To be clear, this specific proposal is not intending to add any new types. It is just about the decomposition of sequence and array values. I will update the proposal to make this clearer, and to add a section for decomposition of map values. I'll also rename the file and pull request to reflect these changes. |
@michaelhkay how do you feel about this PR just being syntactic sugar for the time being? |
I think the Need to see the detailed semantics, e.g. for the case where the sequence/array has a different number of items from the number of variables. There are also some syntax details to sort out:
The extension to maps/tuples doesn't work for me. The proposed syntax offers no benefits over |
It's not very pretty, but the following would parse more cleanly:
and then perhaps map/tuple assignment could be
|
@michaelhkay I actually prefer your new syntax, less |
Here's a suggestion for the semantics:
Amend the existing text: If a let clause contains multiple variables, it is semantically equivalent to multiple let clauses, each containing a single variable. In particular: (a) the clause
is semantically equivalent to the following sequence of clauses:
(b) a sequence-decomposition
(but the expression
If the sequence contains more items than the number of variables being bound, excess items are ignored. If the sequence contains fewer items than the number of variables being bound, excess variables are bound to an empty sequence.
(c) an array-decomposition
(again, the expression
A type error [XPTY0004] is raised if the result of evaluating
(d) a map-decomposition
(again, the expression
In the case where the variable name is a QName q, the equivalence is
A type error [XPTY0004] is raised if the result of the expression is not a map. [[Assuming map-based tuples are introduced], a type error [XPTY0004] MAY be raised if the processor is able to establish that the static type of expr is a tuple type and that x (etc) is not one of the permitted key names for that tuple type.] In other cases, if the map does not contain an entry with the specified key, the corresponding variable is bound to an empty sequence. Unreferenced entries in the map are ignored. |
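The exact equivalences in the comment above did not survive in this thread, so the following is only a sketch of how the three decompositions might be written out by hand in plain XQuery 3.1. It assumes each form expands into ordinary let clauses that evaluate the source expression once. The names $seq, $arr, and $map and the sample values are illustrative and not taken from the proposal.

```xquery
xquery version "3.1";

(: Sequence: positional filters. A missing position yields the empty
   sequence, and items that are never referenced are simply ignored. :)
let $seq := (1.1, 2.2, 3.3)
let $x := $seq[1]
let $y := $seq[2]

(: Array: lookup by position. In XQuery 3.1 an out-of-range position,
   e.g. $arr(3) here, raises FOAY0001 rather than returning (). :)
let $arr := [1.1, 2.2]
let $a := $arr(1)
let $b := $arr(2)

(: Map: lookup by key. An absent key yields the empty sequence,
   and entries that are never referenced are ignored. :)
let $map := map { "r": 3.0, "theta": 0.5 }
let $r := $map("r")
let $phi := $map("phi")   (: no such key, so $phi is () :)

return ($x, $y, $a, $b, $r, count($phi))
```

The sequence and map cases line up with the "bound to an empty sequence" wording above. The array case shows why arrays may need separate treatment: a hand-written lookup beyond the array size is an error in XQuery 3.1.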
This looks sound and solid. Just one thing: Maybe we should not simply ignore returned values that cannot be bound but rather raise an error. Swallowed data may result in erroneous code. Thinking more about this, maybe we should indeed find different solutions for sequences, arrays and maps, as the three data structures have different semantics anyway:
|
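The stricter behaviour suggested above (raising an error instead of silently discarding items that cannot be bound) can be approximated in XQuery 3.1 with an explicit length check. This is a minimal sketch only: the helper local:pair and the error name local:arity are hypothetical names made up for illustration, not part of the proposal.

```xquery
xquery version "3.1";

(: Hypothetical helper: passes a two-item sequence through unchanged
   and raises an error for any other length. :)
declare function local:pair($seq as item()*) as item()* {
  if (count($seq) ne 2)
  then error(xs:QName('local:arity'),
             'expected exactly 2 items, got ' || count($seq))
  else $seq
};

let $seq := local:pair((1.1, 2.2))
let $x := $seq[1]
let $y := $seq[2]
return $x + $y
```

Passing (1.1, 2.2, 3.3) to local:pair would raise the error instead of dropping the third item, which is the failure mode the comment above is worried about.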
Yes, I toyed with allowing |
Maybe we can think about use cases for which ignoring returned results is a better solution than raising an error? In other words, when does a user create results and expect parts of them to be ignored?
I have some sympathy for the asymmetry between arrays and sequences, as the data structures are asymmetric one way or the other (mostly because of the decision in XQ31 that a supplied array index must be larger than 0 and must not exceed the array size). Moreover, my impression is that arrays and sequences are used quite differently in practice. As arrays are not implicitly flattened, it would possibly come as a surprise if we created something like a tail result for arrays.
However, we could also provide explicit semantics for binding the tail of a sequence or even an array to the last variable. In Python,
let $(head as xs:string, tail as xs:string*...) := ('head', 't', 'a', 'i', 'l')
return string-join($tail) |
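For comparison, the head/tail binding sketched above can already be spelled out in XQuery 3.1 with the standard fn:head and fn:tail functions. This is only an illustration of what such a decomposition might stand for. The variable $items is an assumed name, and the proposed `...` syntax itself is not used here.

```xquery
xquery version "3.1";

let $items := ('head', 't', 'a', 'i', 'l')
(: The first item is bound on its own, the remainder as a sequence. :)
let $head as xs:string := head($items)
let $tail as xs:string* := tail($items)
return string-join($tail)
```

This evaluates to the string "tail". For arrays, array:head and array:tail play the corresponding roles, although array:head raises an error on an empty array.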
…ces to the XPath/XQuery sequence and array types.
…Michael Kay and Christian Gruen.
…composition, per consensus.
I have updated the proposal to address the above feedback, use the new syntax, and add a possible grammar. The revised text is viewable at https://github.com/expath/xpath-ng/blob/bc6cb1b579d688ba0088abfe0e73b7e633f964aa/sequence-map-array-decomposition.md (this includes the TypeDeclaration parsing fix pushed below). |