
Reference Guide

.. module:: loopy
.. moduleauthor:: Andreas Kloeckner <inform@tiker.net>

This guide defines all functionality exposed by loopy. If you would like a gentler introduction, consider reading the example-based guide :ref:`guide` instead.

Inames

Loops are (by default) entered exactly once. This is necessary to preserve dependency semantics: otherwise, for example, a fetch could happen inside one loop nest while the instruction using that fetch sits inside a wholly different loop nest.

Integer Domain

Expressions

  • if
  • reductions
    • duplication of reduction inames
  • complex-valued arithmetic
  • tagging of array access and substitution rule use ("$")

Assignments and Substitution Rules

Syntax of an instruction:

label: [i,j|k,l] <float32> lhs[i,j,k] = EXPRESSION : dep_label, dep_label_2

The above example illustrates all the parts that are allowed in loo.py's instruction syntax. All of these except for lhs and EXPRESSION are optional.

  • label is a unique identifier for this instruction. It lets you refer back to the instruction unambiguously during later transformations, and it lets other instructions name it as an ordering dependency.
  • dep_label, dep_label_2 are dependencies of the current instruction. Loo.py will enforce that the instructions carrying these labels are scheduled before this instruction.
  • <float32> declares lhs as a temporary variable, with shape given by the ranges of the lhs subscripts. (Note that in this case, the lhs subscripts must be pure inames, not expressions, for now.) Instead of a concrete type, an empty set of angle brackets <> may be given to indicate that type inference should figure out the type of the temporary.
  • [i,j|k,l] specifies the inames within which this instruction is run. Independent copies of the inames k and l will be made for this instruction.
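To make the grammar concrete, here is a minimal Python sketch that splits an instruction string into the parts listed above using a regular expression. It is a hypothetical helper (`parse_instruction` and the `Instruction` dataclass are illustrative names, not loopy API). It does not parse EXPRESSION itself, and it assumes that EXPRESSION contains no colon, since a colon after the expression introduces the dependency list.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical helper, not part of loopy's API.
_INSN_RE = re.compile(
    r"""^\s*
    (?:(?P<label>\w+)\s*:\s*)?        # optional label followed by a colon
    (?:\[(?P<inames>[^\]]*)\]\s*)?    # optional iname list such as [i,j|k,l]
    (?:<(?P<dtype>[^>]*)>\s*)?        # optional temporary type, possibly empty
    (?P<lhs>[^=]+?)\s*=\s*            # left-hand side up to the first equals sign
    (?P<expr>[^:]+?)                  # EXPRESSION, assumed to contain no colon
    (?:\s*:\s*(?P<deps>[\w\s,]+))?    # optional dependency labels after a colon
    \s*$""",
    re.VERBOSE,
)


@dataclass
class Instruction:
    label: Optional[str]
    inames: List[str]              # all inames the instruction runs within
    duplicated_inames: List[str]   # inames after the bar, copied for this insn
    temp_dtype: Optional[str]      # None: not a temporary; "": infer the type
    lhs: str
    expression: str
    depends_on: List[str]


def _split_names(text: str) -> List[str]:
    return [name.strip() for name in text.split(",") if name.strip()]


def parse_instruction(source: str) -> Instruction:
    m = _INSN_RE.match(source)
    if m is None:
        raise ValueError(f"cannot parse instruction: {source!r}")
    inames: List[str] = []
    duplicated: List[str] = []
    if m.group("inames") is not None:
        shared, _, after_bar = m.group("inames").partition("|")
        duplicated = _split_names(after_bar)
        inames = _split_names(shared) + duplicated
    return Instruction(
        label=m.group("label"),
        inames=inames,
        duplicated_inames=duplicated,
        temp_dtype=m.group("dtype"),
        lhs=m.group("lhs").strip(),
        expression=m.group("expr").strip(),
        depends_on=_split_names(m.group("deps") or ""),
    )
```

Applied to the full example line, this yields the label `label`, inames `i, j, k, l` (with `k, l` marked for duplication), temporary type `float32`, and dependencies `dep_label, dep_label_2`. A bare `out[i] = a[i] + 1` parses with every optional part absent.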

Syntax of a substitution rule:

rule_name(arg1, arg2) := EXPRESSION
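A substitution rule behaves like a macro: each use `rule_name(a, b)` stands for EXPRESSION with the formal arguments replaced by the actual ones. The sketch below models that idea in plain Python with the standard `ast` module. It is only an illustration under assumptions: Python expression syntax stands in for loopy's, `parse_rule` and `expand` are hypothetical names (loopy's own tool for this is :func:`expand_subst`), recursive rules are not detected, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
import copy
import re

# Hypothetical helpers, not part of loopy's API.
_RULE_RE = re.compile(r"^\s*(\w+)\s*\(([^)]*)\)\s*:=\s*(.+?)\s*$")


def parse_rule(source):
    """Split 'name(arg1, arg2) := EXPRESSION' into (name, params, body AST)."""
    m = _RULE_RE.match(source)
    if m is None:
        raise ValueError(f"cannot parse substitution rule: {source!r}")
    name, arglist, body = m.groups()
    params = [a.strip() for a in arglist.split(",") if a.strip()]
    return name, params, ast.parse(body, mode="eval").body


class _Substitute(ast.NodeTransformer):
    """Replace bare parameter names by copies of the actual argument ASTs."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


class _ExpandCalls(ast.NodeTransformer):
    """Replace each rule use rule(x, y) by the rule body with x, y substituted."""

    def __init__(self, rules):
        self.rules = rules

    def visit_Call(self, node):
        self.generic_visit(node)  # expand rule uses inside the arguments first
        if isinstance(node.func, ast.Name) and node.func.id in self.rules:
            params, body = self.rules[node.func.id]
            if len(params) != len(node.args):
                raise TypeError(f"{node.func.id} expects {len(params)} arguments")
            mapping = dict(zip(params, node.args))
            expanded = _Substitute(mapping).visit(copy.deepcopy(body))
            return self.visit(expanded)  # the body may itself use other rules
        return node


def expand(expression, rule_sources):
    """Expand every rule use in `expression`; returns Python source text."""
    rules = {}
    for src in rule_sources:
        name, params, body = parse_rule(src)
        rules[name] = (params, body)
    tree = _ExpandCalls(rules).visit(ast.parse(expression, mode="eval"))
    return ast.unparse(tree)
```

For example, with the rule `square(x) := x*x`, the expression `square(a[i] + 1)` expands to a product of two copies of `a[i] + 1`. Working on syntax trees rather than raw strings means the argument keeps its grouping without any manual parenthesization.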

Tags

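==================== ==============================================
Tag                  Meaning
==================== ==============================================
None | "for"         Sequential loop
"l.N"                Local (intra-group) axis N
"l.auto"             Automatically chosen local (intra-group) axis
"g.N"                Group-number axis N
"unr"                Plain unrolling
"ilp" | "ilp.unr"    Unroll using instruction-level parallelism
"ilp.seq"            Realize parallel iname as innermost loop
==================== ==============================================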

(Throughout this table, N must be replaced by an actual number.)
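The tag strings in the table follow a small, regular grammar. As an illustration only (this is not how loopy represents tags internally), the following Python sketch classifies a tag string and extracts the axis number N where there is one. `classify_tag` and the kind names it returns are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of loopy's API.
_AXIS_TAG = re.compile(r"^(?P<kind>[lg])\.(?P<axis>\d+|auto)$")

# Tags without an axis number, mapped to an illustrative kind name.
_SIMPLE_TAGS = {
    None: "sequential",
    "for": "sequential",
    "unr": "unroll",
    "ilp": "ilp-unroll",
    "ilp.unr": "ilp-unroll",
    "ilp.seq": "ilp-sequential",
}


def classify_tag(tag: Optional[str]) -> Tuple[str, Optional[int]]:
    """Return (kind, axis) for a tag from the table; axis is None if not applicable."""
    if tag in _SIMPLE_TAGS:
        return _SIMPLE_TAGS[tag], None
    m = _AXIS_TAG.match(tag)
    if m is None:
        raise ValueError(f"unknown tag: {tag!r}")
    scope = "local" if m.group("kind") == "l" else "group"
    if m.group("axis") == "auto":
        # The table lists an automatic variant only for local axes.
        if scope != "local":
            raise ValueError(f"no automatic variant listed for {tag!r}")
        return "local-auto", None
    return scope, int(m.group("axis"))
```

Note that `"ilp"` and `"ilp.unr"` map to the same kind, mirroring the table, where they are listed as synonyms.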

"ILP" does three things:

  • Restricts loops to be innermost
  • Duplicates reduction storage for any reductions nested around ILP usage
  • Causes a loop (unrolled or not) to be opened/generated for each involved instruction
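To see why the reduction storage must be duplicated, consider `out[i] = sum over k of a[i, k]` with `i` tagged "ilp". Since the ILP loop must be innermost, the reduction loop over `k` ends up nested around it, so a single scalar accumulator would be shared by every value of `i`. The plain-Python sketch below mimics that loop structure with one accumulator per ILP index. It illustrates the idea only. It is not code loopy generates, and the function names are hypothetical.

```python
# Hypothetical illustration of the loop structure, not loopy output.


def row_sums_ilp(a, n_ilp):
    """Sum the first n_ilp rows of `a` with the ILP index innermost."""
    n_k = len(a[0])
    # One accumulator per ILP index: the reduction storage is duplicated.
    acc = [0] * n_ilp
    for k in range(n_k):              # reduction loop, nested around the ILP loop
        for i_ilp in range(n_ilp):    # ILP iname, forced to be innermost
            acc[i_ilp] += a[i_ilp][k]
    return acc


def row_sums_reference(a):
    """Reference version: reduction loop innermost, one scalar per row."""
    out = []
    for row in a:
        acc = 0
        for value in row:
            acc += value
        out.append(acc)
    return out
```

Both functions compute the same row sums. The ILP form trades one accumulator for `n_ilp` of them so that the independent additions in the inner loop can overlap.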

Automatic Axis Assignment

Automatic local axes are chosen as follows:

  1. For each instruction containing "l.auto" inames:
    1. Find the lowest-numbered unused axis. If none exists, use sequential unrolling instead.
    2. Find the iname that has the smallest stride in any global array access occurring in the instruction.
    3. Assign the low-stride iname to the available axis, splitting the iname if it is too long for the available axis size.

If you need different behavior, use :func:`tag_dimensions` and :func:`split_dimension` to change the assignment of "l.auto" axes manually.
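The assignment procedure for "l.auto" inames can be sketched in Python on a toy model of a single instruction: a mapping from each "l.auto" iname to its loop length, and the strides with which each iname appears in the instruction's global array accesses. Everything here is hypothetical: the function name, the input dictionaries, the default of three local axes, the `max_axis_size` limit, the `_inner`/`_outer` naming of split parts, and the choice to return the outer part of a split iname to the "l.auto" pool. Loopy itself works on its own kernel data structures and may differ in detail.

```python
import math
from typing import Dict, List, Set

# Hypothetical sketch of the procedure, not loopy's implementation.


def assign_auto_axes(auto_inames: Dict[str, int],
                     strides: Dict[str, List[int]],
                     used_axes: Set[int],
                     n_axes: int = 3,
                     max_axis_size: int = 256) -> Dict[str, str]:
    """Assign the 'l.auto' inames of ONE instruction to local axes.

    auto_inames maps iname -> loop length; strides maps iname -> the strides
    of that iname in the instruction's global array accesses. Returns a
    mapping from (possibly split) iname to its new tag.
    """
    pending = dict(auto_inames)
    used = set(used_axes)
    tags: Dict[str, str] = {}
    while pending:
        free = [ax for ax in range(n_axes) if ax not in used]
        if not free:
            # Step 1 fallback: no axis left, so unroll sequentially instead.
            for name in pending:
                tags[name] = "unr"
            break
        axis = free[0]  # Step 1: lowest-numbered unused axis.
        # Step 2: iname with the smallest stride in any global access;
        # inames without any recorded access are considered last.
        name = min(pending,
                   key=lambda n: min(strides.get(n, []), default=math.inf))
        length = pending.pop(name)
        used.add(axis)
        if length <= max_axis_size:
            tags[name] = f"l.{axis}"  # Step 3: the iname fits on the axis.
        else:
            # Step 3 with splitting: the inner part fills the axis, and the
            # outer part returns to the pool of 'l.auto' inames (an assumption).
            tags[name + "_inner"] = f"l.{axis}"
            pending[name + "_outer"] = math.ceil(length / max_axis_size)
    return tags
```

For instance, with `i` accessed at stride 1 and `j` at stride 64, `i` receives `l.0` and `j` receives `l.1`, which places the unit-stride iname on the lowest-numbered local axis.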

Creating Kernels

Arguments

.. autoclass:: ScalarArg
    :members:
    :undoc-members:

.. autoclass:: GlobalArg
    :members:
    :undoc-members:

.. autoclass:: ConstantArg
    :members:
    :undoc-members:

.. autoclass:: ImageArg
    :members:
    :undoc-members:

String Syntax

  • Substitution rules
  • Instructions

Kernels

.. autoclass:: LoopKernel

Do not create :class:`LoopKernel` objects directly. Instead, use the following function, which takes the same arguments but performs some extra post-processing.

.. autofunction:: make_kernel

Wrangling dimensions

.. autofunction:: split_dimension

.. autofunction:: join_dimensions

.. autofunction:: tag_dimensions

Dealing with Substitution Rules

.. autofunction:: extract_subst

.. autofunction:: expand_subst

Precomputation and Prefetching

.. autofunction:: precompute

.. autofunction:: add_prefetch

    Uses :func:`extract_subst` and :func:`precompute`.

Manipulating Reductions

.. autofunction:: realize_reduction

Finishing up

.. autofunction:: generate_loop_schedules

.. autofunction:: check_kernels

.. autofunction:: generate_code

Automatic Testing

.. autofunction:: auto_test_vs_ref

Troubleshooting

Printing :class:`LoopKernel` objects

If you're confused about things loopy is referring to in an error message or about the current state of the :class:`LoopKernel` you are transforming, the following always works:

print(kernel)

(This yields a human-readable, albeit terse, representation of the kernel.)

.. autofunction:: preprocess_kernel

.. autofunction:: get_dot_dependency_graph