Synopsis 7: Lists and Iteration
Created: 12 July 2012
Last Modified: 29 May 2013
Version: 2
Lists and arrays have always been one of Perl's fundamental data types, and Perl 6 is no different. However, lists in Perl 6 have been greatly extended to accommodate lazy lists, infinite lists, lists of mutable and immutable elements, typed lists, flattening behaviors, and so on. So where lists and arrays in Perl 5 tended to be finite sequences of scalar values, in Perl 6 we have additional dimensions of behavior that must be addressed.
The List
class is the base class for dealing with other types of lists, including Array
. To the programmer, a List
is a potentially lazy and infinite sequence of elements. A List
is mutable, in that one can manipulate the sequence via operations such as push
, pop
, shift
, unshift
, splice
, etc. A List
's elements may be either mutable or immutable.
List
objects are Positional
, meaning they can be bound to array variables and support the postfix .[]
operator.
Lists are also lazy, in that the elements of a List
may come from generator functions (called Iterator
s) that produce elements on demand.
An Array
is simply a List
in which all of the elements are held in scalar containers. This allows assignment to the elements of the array.
The comma operator (infix:<,>
) creates Parcel
objects. These should not be confused with lists; a Parcel
represents a raw syntactic sequence of elements. A Parcel
is immutable, although the elements of a Parcel
may be either mutable or immutable.
The name "Parcel" is derived from the phrase "parenthesis cell", since many Parcel
objects appear inside of parentheses. However, except for the empty parcel, it's the comma operator that creates Parcel
objects.
() # empty Parcel
(1) # an Int
(1,2) # a Parcel with two Ints
(1,) # a Parcel with one Int
A Parcel
is also Positional
, and uses flattening context for list operations such as .[]
and .elems
. See "Flattening contexts" below. For raw access to the arguments without flattening, you may use .arg($n)
instead of .[$n]
, and .args
instead of .elems
.
List
and Parcel
objects can have other container objects as elements. In some contexts we want to interpolate the values of container objects into the surrounding List
or Parcel
, while in other contexts we want any subcontainers to be preserved. Such interpolation is known as "flattening".
The Iterable
type is performed by container and generator objects that will interpolate their values in flattening contexts. Both List
and Parcel
are Iterable
. Iterable
implies support for .iterator
, which returns an object capable of producing the elements of the Iterable
.
Objects held in scalar containers are never interpolated in flattening context, even if the object is Iterable
.
my @a = 3,4,5;
for 1,2,@a { ... } # five iterations
my $s = @a;
for 1,2,$s { ... } # three iterations
Here, both $s
and @a
refer to the same underlying Array
object, but the presence of the scalar container prevents $s
from being flattened into the for
loop. The .list
or .flat
method may be used to restore the flattening behavior:
for 1,2,$s.list { ... } # five iterations
for 1,2,@($s) { ... } # five iterations
Conversely, the .item
or $()
contextualizer can be used to prevent an Iterable
from being interpolated:
my @b = 1,2,@a; # @b has five elements
my @c = 1,2,@a.item; # @c has three elements
my @c = 1,2,$(@a); # same thing
An Iterator
is an object that is able to generate or produce elements of a list (called "reification"). Because a single Iterator
may be directly or indirectly referenced by multiple container objects simultaneously, Iterator
s provide an immutable view of the sequence of elements produced, leaving it up to containers to provide any mutability.
The .reify($n)
method requests an Iterator
to return a Parcel
consisting of at least $n
reified elements, followed by any additional iterators needed for the remaining elements in the sequence. For example:
my $r = 10..50;
say $r.iterator.reify(5).perl; # (10, 11, 12, 13, 14, 15..50)
Iterators are allowed to return more or fewer elements than requested by $n
; it's up to the caller to properly handle any shortage or excess. (In general an iterator is expected to always reify at least one element, however.) If $n
is *
, then the iterator is asked to generate elements according to whatever is most natural for the thing being iterated. For a Range
this might be a substantial number (or even all) of the elements of the range; for a filehandle it might read a single record; for a List
it may return only the non-lazy portion of the list.
Once .reify
is called on an iterator, the iterator is expected to return the same results for all subsequent calls to .reify
. This is true even if the $n
argument is different from one call to the next. The general pattern is often something like:
class SomeIter is Iterator {
has $!reified;
method reify($n = 1) {
unless $!reified.defined {
# generate return Parcel and bind to $!reified
}
$!reified;
}
}
Because lists in Perl 6 can be lazy, they can also be infinite. Each Iterator
must provide an .infinite
method, which returns a value indicating the knowable finiteness of the iteration:
.infinite Meaning
----------------------------------------------
True iteration is known to be infinite
False iteration is known to be finite
Mu finiteness isn't currently known
As an example, Range
iterators can generally know finiteness simply by looking at the endpoint of the Range
. The iterator for the infix:<...>
sequence operator treats any sequence ending in *
as being "known infinite", all other ...
sequences have unknown finiteness. In the general case it's not possible for loop iterators to definitively know if the sequence will be finite or infinite. (Conjecture: There will be a syntax or mechanism for the programmer to indicate that such sequences are to be treated as known finite or known infinite.)
Laziness is one of the primary virtues of a Perl programmer; it is also one of the virtues of Perl 6. However, it's also possible to succumb to false laziness, so there are times when Perl 6 chooses to be eager instead of lazy.
Perl 6 defines four levels of laziness:
- Strictly Lazy
-
Does not evaluate anything unless explicitly required by the caller, including not traversing non-lazy objects. This behavior generally occurs only by a pragma or by explicit lazy primitives.
- Mostly Lazy
-
Try to obtain available elements without causing eager evaluation of other lazy objects. However, the implementation is allowed to do batching for efficiency. The programmer must not rely on circular side effects when the implementation is working ahead like this, or the results will be indeterminate. (However, this does not rule out pure definitional circularity; earlier array values may be used in the definition of subsequent values.)
- Mostly Eager
-
Obtain all leading items that are not known to be infinite. The remainder of the items are left for lazy evaluation. As with the "mostly lazy" case, the programmer must not depend on circular side effects in any situation where the implementation is allowed to work ahead. Note that there are no language-defined limits on the amount of conjectural evaluation allowed, up to the size of available memory; however, an implementation may allow the arbitrary limitation of workahead by use of pragmas.
In any case there should be no profound semantic differences between "mostly lazy" and "mostly eager". These levels are primarily just hints to the implementation as to whether you are likely to want to use all the values in the list. Nothing transactional should be allowed to depend on the difference.
- Strictly Eager
-
Obtain all items, failing immediately with an appropriate message if asked to evaluate any data structures known to be infinite. (Data structures that are effectively infinite but not provably so will likely exhaust memory instead.)
- List assignment; my @a = @something;
-
In order to provide p5-like behavior in list assignment, it is performed at the Mostly Eager level. This means that if you do
my @a = foo();
it will eagerly evaluate the return value from
foo()
to place elements into@a
, stopping only when encountering something that is "known infinite" or upon reaching the end of the sequence. for
,map
,grep
, etc.-
The
for
loop and looping routines such asmap
,grep
, etc. are Mostly Lazy by default, but will be eager when evaluated in sink context. (Note, however, that we forcefor
,while
,until
,repeat
andloop
loops to always be in sink context when used as a statement within a statementlist.) - Slurpy parameters
-
Slurpy parameters are Mostly Lazy.
- Feed operators
-
The feed operator is strictly lazy, meaning that no operation should be performed before the user requests any element. That's how
my @a <== grep { ... } <== map { ... } <== grep { ... } <== 1, 2, 3
is completely lazy, even if 1,2,3 is a fairly small known compact list.
Patrick R. Michaud <pmichaud@pobox.com