Skip to content

Latest commit

 

History

History
272 lines (197 loc) · 9.87 KB

S07-lists.pod

File metadata and controls

272 lines (197 loc) · 9.87 KB

TITLE

Synopsis 7: Lists and Iteration

VERSION

Created: 12 July 2012

Last Modified: 29 May 2013
Version: 2

Overview

Lists and arrays have always been one of Perl's fundamental data types, and Perl 6 is no different. However, lists in Perl 6 have been greatly extended to accommodate lazy lists, infinite lists, lists of mutable and immutable elements, typed lists, flattening behaviors, and so on. So where lists and arrays in Perl 5 tended to be finite sequences of scalar values, in Perl 6 we have additional dimensions of behavior that must be addressed.

The List type

The List class is the base class for dealing with other types of lists, including Array. To the programmer, a List is a potentially lazy and infinite sequence of elements. A List is mutable, in that one can manipulate the sequence via operations such as push, pop, shift, unshift, splice, etc. A List's elements may be either mutable or immutable.

List objects are Positional, meaning they can be bound to array variables and support the postfix .[] operator.

Lists are also lazy, in that the elements of a List may come from generator functions (called Iterators) that produce elements on demand.

The Array type

An Array is simply a List in which all of the elements are held in scalar containers. This allows assignment to the elements of the array.

The Parcel type

The comma operator (infix:<,>) creates Parcel objects. These should not be confused with lists; a Parcel represents a raw syntactic sequence of elements. A Parcel is immutable, although the elements of a Parcel may be either mutable or immutable.

The name "Parcel" is derived from the phrase "parenthesis cell", since many Parcel objects appear inside of parentheses. However, except for the empty parcel, it's the comma operator that creates Parcel objects.

()       # empty Parcel
(1)      # an Int
(1,2)    # a Parcel with two Ints
(1,)     # a Parcel with one Int

A Parcel is also Positional, and uses flattening context for list operations such as .[] and .elems. See "Flattening contexts" below. For raw access to the arguments without flattening, you may use .arg($n) instead of .[$n], and .args instead of .elems.

Flattening contexts, the Iterable type

List and Parcel objects can have other container objects as elements. In some contexts we want to interpolate the values of container objects into the surrounding List or Parcel, while in other contexts we want any subcontainers to be preserved. Such interpolation is known as "flattening".

The Iterable type is performed by container and generator objects that will interpolate their values in flattening contexts. Both List and Parcel are Iterable. Iterable implies support for .iterator, which returns an object capable of producing the elements of the Iterable.

Objects held in scalar containers are never interpolated in flattening context, even if the object is Iterable.

my @a = 3,4,5;
for 1,2,@a { ... }        # five iterations

my $s = @a;
for 1,2,$s { ... }        # three iterations

Here, both $s and @a refer to the same underlying Array object, but the presence of the scalar container prevents $s from being flattened into the for loop. The .list or .flat method may be used to restore the flattening behavior:

for 1,2,$s.list { ... }   # five iterations
for 1,2,@($s) { ... }     # five iterations

Conversely, the .item or $() contextualizer can be used to prevent an Iterable from being interpolated:

my @b = 1,2,@a;           # @b has five elements
my @c = 1,2,@a.item;      # @c has three elements
my @c = 1,2,$(@a);        # same thing

Iterators

An Iterator is an object that is able to generate or produce elements of a list (called "reification"). Because a single Iterator may be directly or indirectly referenced by multiple container objects simultaneously, Iterators provide an immutable view of the sequence of elements produced, leaving it up to containers to provide any mutability.

The .reify method

The .reify($n) method requests an Iterator to return a Parcel consisting of at least $n reified elements, followed by any additional iterators needed for the remaining elements in the sequence. For example:

my $r = 10..50;
say $r.iterator.reify(5).perl;  # (10, 11, 12, 13, 14, 15..50)

Iterators are allowed to return more or fewer elements than requested by $n; it's up to the caller to properly handle any shortage or excess. (In general an iterator is expected to always reify at least one element, however.) If $n is *, then the iterator is asked to generate elements according to whatever is most natural for the thing being iterated. For a Range this might be a substantial number (or even all) of the elements of the range; for a filehandle it might read a single record; for a List it may return only the non-lazy portion of the list.

Once .reify is called on an iterator, the iterator is expected to return the same results for all subsequent calls to .reify. This is true even if the $n argument is different from one call to the next. The general pattern is often something like:

class SomeIter is Iterator {
    has $!reified;

    method reify($n = 1) {
        unless $!reified.defined {
            # generate return Parcel and bind to $!reified
        }
        $!reified;
    }
}

The .infinite method

Because lists in Perl 6 can be lazy, they can also be infinite. Each Iterator must provide an .infinite method, which returns a value indicating the knowable finiteness of the iteration:

.infinite     Meaning
----------------------------------------------
True          iteration is known to be infinite
False         iteration is known to be finite
Mu            finiteness isn't currently known

As an example, Range iterators can generally know finiteness simply by looking at the endpoint of the Range. The iterator for the infix:<...> sequence operator treats any sequence ending in * as being "known infinite", all other ... sequences have unknown finiteness. In the general case it's not possible for loop iterators to definitively know if the sequence will be finite or infinite. (Conjecture: There will be a syntax or mechanism for the programmer to indicate that such sequences are to be treated as known finite or known infinite.)

Levels of laziness

Laziness is one of the primary virtues of a Perl programmer; it is also one of the virtues of Perl 6. However, it's also possible to succumb to false laziness, so there are times when Perl 6 chooses to be eager instead of lazy.

Perl 6 defines four levels of laziness:

Strictly Lazy

Does not evaluate anything unless explicitly required by the caller, including not traversing non-lazy objects. This behavior generally occurs only by a pragma or by explicit lazy primitives.

Mostly Lazy

Try to obtain available elements without causing eager evaluation of other lazy objects. However, the implementation is allowed to do batching for efficiency. The programmer must not rely on circular side effects when the implementation is working ahead like this, or the results will be indeterminate. (However, this does not rule out pure definitional circularity; earlier array values may be used in the definition of subsequent values.)

Mostly Eager

Obtain all leading items that are not known to be infinite. The remainder of the items are left for lazy evaluation. As with the "mostly lazy" case, the programmer must not depend on circular side effects in any situation where the implementation is allowed to work ahead. Note that there are no language-defined limits on the amount of conjectural evaluation allowed, up to the size of available memory; however, an implementation may allow the arbitrary limitation of workahead by use of pragmas.

In any case there should be no profound semantic differences between "mostly lazy" and "mostly eager". These levels are primarily just hints to the implementation as to whether you are likely to want to use all the values in the list. Nothing transactional should be allowed to depend on the difference.

Strictly Eager

Obtain all items, failing immediately with an appropriate message if asked to evaluate any data structures known to be infinite. (Data structures that are effectively infinite but not provably so will likely exhaust memory instead.)

The laziness level of some common operations

List assignment; my @a = @something;

In order to provide p5-like behavior in list assignment, it is performed at the Mostly Eager level. This means that if you do

my @a = foo();

it will eagerly evaluate the return value from foo() to place elements into @a, stopping only when encountering something that is "known infinite" or upon reaching the end of the sequence.

for, map, grep, etc.

The for loop and looping routines such as map, grep, etc. are Mostly Lazy by default, but will be eager when evaluated in sink context. (Note, however, that we force for, while, until, repeat and loop loops to always be in sink context when used as a statement within a statementlist.)

Slurpy parameters

Slurpy parameters are Mostly Lazy.

Feed operators

The feed operator is strictly lazy, meaning that no operation should be performed before the user requests any element. That's how

my @a <== grep { ... } <== map { ... } <== grep { ... } <== 1, 2, 3

is completely lazy, even if 1,2,3 is a fairly small known compact list.

Common list operations

AUTHORS

Patrick R. Michaud <pmichaud@pobox.com