Synopsis 7: Lists and Iteration
Created: 07 September 2015
Last Modified: 07 September 2015
Version: 1
Perl 6 provides a wide array of list-related features, including eager, lazy, and parallel evaluation of sequences of values, and arrays offering compact and multi-dimensional storage. Laziness in particular needs careful handling, so as to provide the power advanced users desire while not creating surprises for typical language users who have the (reasonable) expectation that an assignment into an array will have immediate effect. Additionally, it is important to give the programmer control of when values will and won't be retained. Finally, all of this needs to be done in a way that provides the convenience that a Perl is expected to provide, while still having a model that can be understood through understanding a small number of rules.
In Perl 6, we use the term "sequence" to refer to something that can, when requested, produce a sequence of values. Of note, it can only be asked to produce them once. We use the term "list" to refer to things that hold (and so remember) values. There are various concrete types that represent various kinds of list and sequence with different semantics:
(1, 2, 3) # a List, the simplest kind of list
[1, 2, 3] # an Array, a list of (assignable) Scalar containers
|(1, 2) # a Slip, a list which flattens into a surrounding List
$*IN.lines # a Seq, a sequence that can be processed serially
(^1000).race # a HyperSeq, a sequence that can be processed in parallel
The @
sigil in Perl indicates "these", while $
indicates "the". This kind of plural/single distinction shows up in various places in the language, and much convenience in Perl comes from it. Flattening is the idea that an @
-like thing will, in certain contexts, have its values automatically incorporated into the surrounding list. Traditionally this has been a source of both great power and great confusion in Perl. Perl 6 has been through a number of models relating to flattening during its evolution, before settling on a straightforward one known as the "single argument rule".
The single argument rule is best understood by considering the number of iterations that a for
loop will do. The thing to iterate over is always treated as a single argument to the for
loop, thus the name of the rule.
for 1, 2, 3 { } # List of 3 things; 3 iterations
for (1, 2, 3) { } # List of 3 things; 3 iterations
for [1, 2, 3] { } # Array of 3 things (put in Scalars); 3 iterations
for @a, @b { } # List of 2 things; 2 iterations
for (@a,) { } # List of 1 thing; 1 iteration
for (@a) { } # List of @a.elems things; @a.elems iterations
for @a { } # List of @a.elems things; @a.elems iterations
The first two are equivalent because parentheses do not actually construct a list, but only group. It is the infix:<,>
operator that forms a list. The third also performs three iterations, since in Perl 6 [...]
constructs an Array
but does not wrap it into a Scalar
container. The fourth will do two iterations, since the argument is a list of two things; that they both happen to have the @
-sigil does not, alone, lead to any kind of flattening. The same goes for the fifth; infix:<,>
will happily form a list of one thing.
The single argument rule does respect Scalar
containers. Therefore:
for $(1, 2, 3) { } # List in a Scalar; 1 iteration
for $[1, 2, 3] { } # Array in a Scalar; 1 iteration
for $@a { } # Array in a Scalar; 1 iteration
The single argument rule is implemented consistently throughout the language. For example, consider the append
method:
@a.append: 1, 2, 3; # appends 3 values to @a
@a.append: [1, 2, 3]; # appends 3 values to @a
@a.append: @b; # appends @b.elems values to @a
@a.append: @b,; # same, trailing comma doesn't make > 1 argument
@a.append: $(1, 2, 3); # appends 1 value (a List) to @a
@a.append: $[1, 2, 3]; # appends 1 value (an Array) to @a
Additionally, the list constructor (the infix:<,>
operator) and the array composer (the [...]
circumfix) follow the rule:
[1, 2, 3] # Array of 3 elements
[@a, @b] # Array of 2 elements
[@a, 1..10] # Array of 2 elements
[@a] # Array with the elements of @a copied into it
[1..10] # Array with 10 elements
[$@a] # Array with 1 element (@a)
[@a,] # Array with 1 element (@a)
[[1]] # Same as [1]
[[1],] # Array with a single element that is [1]
[$[1]] # Array with a single element that is [1]
The only one of these that is likely to provide a surprise is [[1]]
, but it is deemed sufficiently rare that it does not warrant an exception to the very general single argument rule.
A List
is an immutable, potentially infinite, list of values. The simplest way to form a List is with the infix:<,>
operator:
1, 2, 3
A List
can be indexed and, provided it is finite, asked for its number of elements:
say (1, 2, 3)[1]; # 2
say (1, 2, 3).elems; # 3
As it is immutable, it is not possible to push
, pop
, shift
, unshift
, or splice
a List
. The reverse
and rotate
operations return new List
s.
While a List
itself is immutable, it may freely contain mutable things, including Scalar
containers. Thus:
my $a = 2;
my $b = 4;
($a, $b)[0]++;
($a, $b)[1] *= 2;
say $a; # 3
say $b; # 8
Trying to assign to an immutable value in a List
is an error, however.
(1, 2, 3)[0]++; # Dies: Cannot assign to an immutable value
The Slip
type is a subclass of List
. A Slip
will have its values incorporated into a surrounding List
.
(1, (2, 3), 4).elems # 3
(1, slip(2, 3), 4).elems # 4
It is possible to coerce a List
to a Slip
, so the above can also be written as:
(1, (2, 3).Slip, 4).elems # 4
This is a common way to get flattening in places it will not magically take place:
my @a = 1, 2, 3;
my @b = 4, 5;
for @a.Slip, @b.Slip { } # 5 iterations
It's also a bit verbose, which is why the prefix:<|>
operator will, anywhere other than a function call argument list, do a Slip
coercion:
my @a = 1, 2, 3;
my @b = 4, 5;
for |@a, |@b { } # 5 iterations
It can also be useful in forms such as:
my @prefixed-values = 0, |@values;
Where the single argument rule would otherwise make @prefixed-values
have two elements, the zero and @values
.
The Slip
type is also respected by map
, gather
/take
, and lazy loops. It is the way a map
can place multiple values into its result stream:
my @a = 1, 2;
say @a.map({ $_ xx 2 }).elems; # 2
say @a.map({ |($_ xx 2) }).elems; # 4
An Array
is a subclass of List
that places values assigned to it into Scalar
containers, meaning they can be mutated. It is the default type that an @
-sigil variable gets.
my @a = 1, 2, 3;
say @a.WHAT; # (Array)
@a[1]++;
say @a; # 1 3 3
In the absence of a shape specification, it will grow automatically.
my @a;
@a[5] = 42;
say @a.elems; # 6
An Array
supports push
, pop
, shift
, unshift
, and splice
.
Assignment to an array is eager by default, and creates a new set of Scalar containers:
my @a = 1, 2, 3;
my @b = @a;
@a[1]++;
say @b; # 1, 2, 3
Note that the [...]
Array
constructor is equivalent to creating and then assigning to an anonymous Array
, and so has the same semantics with regard to eagerness and fresh containers.
A Seq
is a one-shot producer of values. Most list processing operations return a Seq
, as do most synchronous sources of multiple values.
say (1, 2, 3).map(* + 1).^name; # Seq
say (1, 2 Z 'a', 'b').^name; # Seq
say (1, 1, * + * ... *).^name; # Seq
say $*IN.lines.^name; # Seq
Since a Seq
will not by default remember its values, it can only be consumed once. For example, if a Seq
is stored:
my \seq = (1, 2, 3).map(* + 1);
Then only one attempt to iterate it will work; subsequent attempts will die as the values have already been consumed:
for seq { .say } # 2\n3\n4\n
for seq { .say } # Dies: This Seq has already been iterated
This means you can be confident that a loop going over a file's lines:
for open('data').lines {
.say if /beer/;
}
Will not be retaining the lines of the file in memory. Additionally, it is easy to set up processing pipelines that also will not retain all of the lines in memory:
my \lines = open('products').lines;
my \beer = lines.grep(/beer/);
my \excited = beer.map(&uc);
.say for excited;
However, any attempt to re-use lines
, beer
, or excited
will result in an error. This program is equivalent in performance to:
.say for open('products').lines.grep(/beer/).map(&uc);
But provides a chance to name the stages. Note that it's possible to use Scalar
variables instead, but the single argument rule means that the final loop would have to be:
.say for |$excited;
Assigning a Seq
to an Array
will - so long as the sequence is not marked lazy - eagerly perform the operation and store the results into the Array
. Therefore, there are no surprises to anyone writing:
my @lines = open('products').lines;
my @beer = @lines.grep(/beer/);
my @excited = @beer.map(&uc);
.say for @excited;
Re-using any of these arrays will work out fine. Of course, the memory behavior of this program is radically different, and it will be slower due to all of the extra Scalar
containers created (resulting in extra garbage collection) and poor locality of reference (we have to talk about the same string many times over the programs lifetime).
Occasionally it can be useful to request that a Seq
cache itself. This can be done by calling the cache
method on a Seq
, which makes a lazy List
from the Seq
and returns it. Subsequent calls to cache
will return the same lazy list. Note that the first call to cache
counts as consuming the Seq
, and so it will not work out if any prior iteration has taken place, and any later attempt to iterate the Seq
after calling cache
will also fail. It is only .cache
which may be called more than once.
A Seq
does not do the Positional
role like a List
. Therefore, it can not be bound to an @
-sigil variable:
my @lines := $*IN.lines; # Dies
One consequence of this is that, naively, you could not pass a Seq
as an argument to be bound to an @
-sigil parameter:
sub process(@data) {
}
process($*IN.lines);
This would be rather too inconvenient. Therefore, the signature binder (which actually uses ::=
assignment semantics rather than :=
) will spot failure to bind the <@>-sigil parameter, and then check if the argument does the PositionalBindFailover
role. If it does, then it will call the cache
method on the argument and bind the result of that instead.
Both Seq
and List
, along with various other types in Perl 6, do the Iterable
role. The primary purpose of this role is to promise that an iterator
method is available. The average Perl 6 user will rarely need to care about the iterator
method and what it returns.
The secondary purpose of Iterable
is to mark out things that will flatten on demand, when the flat
method or function is used on them.
my @a = 1, 2, 3;
my @b = 4, 5;
for flat @a, @b { } # 5 iterations
say [flat @a, @b].elems; # 5
Another use of flat
is to flatten nested List
structure. For example, the Z
(zip) operator produces a List
of List
:
say (1, 2 Z 'a', 'b').perl; # ((1, "a"), (2, "b")).Seq
A flat
can be used to flatten these, which is useful in conjunction with a for
loop using a pointy block with multiple parameters:
for flat 1, 2 Z 'a', 'b' -> $num, $letter { }
Note that flat
respects Scalar
containers, and so:
for flat $(1, 2) { }
Will only do one iteration. Remember that an Array
stores everything in a Scalar
container, and so flat
on an Array
- short of weird tricks with binding - will always be the same as iterating over the Array
itself. In fact, the flat
method on an Array
returns identity.
Jonathan Worthington <jnthn@jnthn.net>