-
Notifications
You must be signed in to change notification settings - Fork 19
SpatialConditions
Table of contents:
- Introduction
- Uses for spatial conditions
- Some details
- Evaluation
-
Functions
-
Available functions
- sp_annulus
- sp_block
- sp_circle
- sp_circle_cell
- sp_ellipse
- sp_get_spatial_output_list_value
- sp_group_not_empty
- sp_in_label_range
- sp_in_line_with
- sp_is_left_of
- sp_is_right_of
- sp_match_regex
- sp_match_text
- sp_point_in_cluster
- sp_point_in_poly
- sp_point_in_poly_shape
- sp_points_in_same_cluster
- sp_points_in_same_poly_shape
- sp_rectangle
- sp_redundancy_greater_than
- sp_richness_greater_than
- sp_select_all
- sp_select_block
- sp_select_element
- sp_select_sequence
- sp_self_only
- sp_spatial_output_passed_defq
- sp_square
- sp_square_cell
-
Available functions
- Variables
- Declaring variables and using more complex functions
Spatial conditions are core to the Biodiverse system. They are used to specify both neighbourhoods used in the analyses, and also the definition queries used to restrict the calculations to a subset of groups.
Spatial conditions are used both to define the neighbourhoods of the spatial analyses and the definition queries used to constrain the set of groups used in the analyses.
Neighbourhoods are essential for any spatial analysis, as it is through these that one can define the set of groups to be considered in an analysis. In the moving window analyses these determine which groups are compared with which other groups. In the cluster analyses they determine which groups are considered candidates to be clustered together. It is also possible to define neighbourhoods for spatially constrained randomisations (see Laffan and Crisp, 2003, J Biogeog), although this is yet to be implemented (see issue #76).
Before we describe the process, some definitions are needed. The processing group is the group being considered in the analysis at some iteration, and to which the results for that iteration are assigned. A group is a member of the processing group's set of neighbours (is a neighbouring group) if the spatial condition evaluates to true.
A spatial analysis progressively iterates over each group that passes the definition query, assessing every other group for membership in neighbour set 1 or 2. The selected indices are then calculated using the groups that occur in neighbour sets 1 and 2 (and their labels and other properties as required by the calculations).
These differ from neighbourhoods as they assess only the processing group to determine if calculations should be run for it or not. They use the same syntax as for neighbourhoods, but the system will commonly complain if a condition requiring a neighbouring group is used.
Note that groups that fail the definition query are still considered for membership of neighbour sets of those that pass. Use a definition query in conjunction with an appropriate neighbourhood definition if you want to exclude some groups from neighbour sets as well. For example, you could use sp_match_text (axis => 2, text => 'fred', type => 'proc')
for the definition query and sp_match_text (axis => 2, text => 'fred', type => 'nbr')
for the neighbourhood. This will restrict calculations to those groups with a third axis of 'fred', and also exclude any group without fred in the third axis from the neighbour sets of those groups processed.
As with any system, there must be compromises between ease of use and system flexibility. In this case we have opted for system flexibility by direct use of Perl syntax. This means you can use arbitrarily complex functions to define neighbourhoods, including loops and other multiple variable conditions. This may be horrifying to non-perlers, as one of the main complaints about perl is its complex grammar and syntax. To alleviate this we are encapsulating many of the common conditions in subroutines that can be called by name with a set of arguments. We have also provided examples below to assist.
The neighbourhood and definition query interfaces have a syntax verification button to check that the syntax is valid. This does not, however, guarantee your parameters will work, only that it is valid Perl code. (The reality here is that we will just evaluate the parameter statement with some default values and warn you if the system raises some sort of error or exception).
If you are using a locale where the radix character (the decimal place marker) is a comma then you need to remember to use a dot instead. For example, this code sp_circle (radius => 5,3)
should be sp_circle (radius => 5.3)
or you will receive warnings about uneven arguments used to define a hash.
This is a brief description of the evaluation process used to determine the set of neighbours for a group.
Currently the system operates on boolean membership of the set of neighbours, so a group either is or is not a neighbour of the processing group. If no spatial index is used then every group's membership of the processing cell's neighbour set is considered in turn. If a spatial index is used then only a subset of neighbours is considered (those within the relevant spatial index blocks). This is why processing times are usually shorter when using an index (see KeyConcepts#Using_the_spatial_index).
Spatial conditions need not return symmetric sets. In this way group i can be in group j's neighbour set, but j need not be in i's neighbour set. This is not an issue for moving window analyses, but can cause asymmetric dissimilarity matrices if used to run a spatially constrained cluster analysis. This is why it is generally a good idea in these cases to set the second neighbourhood to be sp_select_all()
or 1
(which is the same thing).
In the calculations, groups in neighbour set 1 are excluded from neighbour set 2 so there are no overlaps that would violate the comparison calculations.
The conditions are specified using some combination of pre-defined functions, pre-specified variables, and/or user defined variables and functions. These are now described.
Functions are the easiest way to specify conditions as one does not need to wrestle with variables. Functions also set metadata to tell the system how to use the spatial index. The spatial index saves considerable processing time for large data sets as the system does not need to test many pairs of index blocks to determine which to use (see KeyConcepts#Using_the_spatial_index). If you use a function for which an index will produce erroneous results then the system sets a flag to ignore it.
The available functions in version 4.0 are: sp_annulus, sp_block, sp_circle, sp_circle_cell, sp_ellipse, sp_get_spatial_output_list_value, sp_group_not_empty, sp_in_label_range, sp_in_line_with, sp_is_left_of, sp_is_right_of, sp_match_regex, sp_match_text, sp_point_in_cluster, sp_point_in_poly, sp_point_in_poly_shape, sp_points_in_same_cluster, sp_points_in_same_poly_shape, sp_rectangle, sp_redundancy_greater_than, sp_richness_greater_than, sp_select_all, sp_select_block, sp_select_element, sp_select_sequence, sp_self_only, sp_spatial_output_passed_defq, sp_square, sp_square_cell,
An annulus. Assessed against all dimensions by default but use the optional "axes => []" arg to specify a subset. Uses group (map) distances.
Required args: inner_radius, outer_radius
Optional args: axes
Example:
# an annulus assessed against all axes
sp_annulus (inner_radius => 2000000, outer_radius => 4000000)
# an annulus assessed against axes 0 and 1
sp_annulus (inner_radius => 2000000, outer_radius => 4000000, axes => [0,1])
A non-overlapping block. Set an axis to undef to ignore it.
Required args: size
Optional args: origin
Example:
sp_block (size => 3)
sp_block (size => [3,undef,5]) # rectangular block, ignores second axis
A circle. Assessed against all dimensions by default (more properly called a hypersphere) but use the optional "axes => []" arg to specify a subset. Uses group (map) distances.
Required args: radius
Optional args: axes
Example:
# A circle of radius 1000 across all axes
sp_circle (radius => 1000)
# use only axes 0 and 3
sp_circle (radius => 1000, axes => [0, 3])
A circle. Assessed against all dimensions by default (more properly called a hypersphere) but use the optional "axes => []" arg to specify a subset. Uses cell distances.
Required args: radius
Optional args: none
Example:
# A circle of radius 3 cells across all axes
sp_circle (radius => 3)
# use only axes 0 and 3
sp_circle_cell (radius => 3, axes => [0, 3])
A two dimensional ellipse. Use the 'axes' argument to control which are used (default is [0,1]). The default rotate_angle is 0, such that the major axis is east-west.
Required args: major_radius, minor_radius
Optional args: axes, rotate_angle, rotate_angle_deg
Example:
# North-south aligned ellipse
sp_ellipse (
major_radius => 300000,
minor_radius => 100000,
axes => [0,1],
rotate_angle => 1.5714,
)
Obtain a value from a list in a previously calculated spatial output.
Required args: index, output
Optional args: element, list, no_error_if_no_index
Example:
# Get the spatial results value for the current neighbour group
# (or processing group if used as a def query)
sp_get_spatial_output_list_value (
output => 'sp1', # using spatial output called sp1
list => 'SPATIAL_RESULTS', # from the SPATIAL_RESULTS list
index => 'PE_WE_P', # get index value for PE_WE_P
)
# Get the spatial results value for group 128:254
# Note that the SPATIAL_OUTPUTS list is assumed if
# no 'list' arg is passed.
sp_get_spatial_output_list_value (
output => 'sp1',
element => '128:254',
index => 'PE_WE_P',
)
Is a basedata group non-empty? (i.e. contains one or more labels)
Required args: none
Optional args: element
Example:
# Restrict calculations to those non-empty groups.
# Will use the processing group if a def query,
# the neighbour group otherwise.
sp_group_not_empty ()
# The same as above, but being specific about which group (element) to test.
# This is probably best used in cases where the element
# to check is varied spatially.}
sp_group_not_empty (element => '5467:9876')
Is a group within a label's range?
Required args: label
Optional args: type
Example:
# Are we in the range of label called Genus:Sp1?
sp_in_label_range(label => 'Genus:Sp1')
# The type argument determines if the
processing or neighbour group is assessed
Are we in line with a vector radiating out from the processing cell? Use the 'axes' argument to control which are used (default is [0,1])
Required args: none
Optional args: axes, vector_angle, vector_angle_deg
Example:
sp_in_line_with (vector_angle => Math::Trig::pip2) # pi/2 = 90 degree angle
Are we to the left of a vector radiating out from the processing cell? Use the 'axes' argument to control which are used (default is [0,1])
Required args: none
Optional args: axes, vector_angle, vector_angle_deg
Example:
sp_is_left_of (vector_angle => 1.5714)
Are we to the right of a vector radiating out from the processing cell? Use the 'axes' argument to control which are used (default is [0,1])
Required args: none
Optional args: axes, vector_angle, vector_angle_deg
Example:
sp_is_right_of (vector_angle => 1.5714)
Select all neighbours with an axis matching a regular expresion
Required args: re
Optional args: axis, type
Example:
# use any neighbour where the first axis includes the text "type1"
sp_match_regex (re => qr'type1', axis => 0, type => 'nbr')
# match only when the third neighbour axis starts with
# the processing group's second axis
sp_match_regex (re => qr/^$coord[2]/, axis => 2, type => 'nbr')
# match the whole coordinate ID (element name)
# where Biome can be 1 or 2 and the rest of the name contains "dry"
sp_match_regex (re => qr/^Biome[12]:.+dry/)
# Set a definition query to only use groups where the
# third axis ends in 'park' (case insensitive)
sp_match_regex (text => qr{park$}i, axis => 2, type => 'proc')
Select all neighbours matching a text string
Required args: text
Optional args: axis, type
Example:
# use any neighbour where the first axis has value of "type1"
sp_match_text (text => 'type1', axis => 0, type => 'nbr')
# match only when the third neighbour axis is the same
# as the processing group's second axis
sp_match_text (text => $coord[2], axis => 2, type => 'nbr')
# match where the whole coordinate ID (element name)
# is 'Biome1:savannah forest'
sp_match_text (text => 'Biome1:savannah forest')
# Set a definition query to only use groups with 'NK' in the third axis
sp_match_text (text => 'NK', axis => 2, type => 'proc')
Returns true when the group is in a cluster or region grower output cluster.
Required args: output
Optional args: element, from_node
Example:
# Use any element that is a terminal in the cluster output.
# This is useful if the cluster analysis was run under
# a definition query and you want the same set of groups.
sp_point_in_cluster (
output => "some_cluster_output",
)
# Now specify a cluster within the output
sp_point_in_cluster (
output => "some_cluster_output",
from_node => '118___', # use the node's name
)
# Specify an element to check instead of the current
# processing element.
sp_point_in_cluster (
output => "some_cluster_output",
from_node => '118___', # use the node's name
element => '123:456', # specify an element to check
)
Select groups that occur within a user-defined polygon (see sp_point_in_poly_shape for an alternative)
Required args: polygon
Optional args: point
Example:
# Is the neighbour coord in a square polygon?
sp_point_in_poly (
polygon => [[0,0],[0,1],[1,1],[1,0],[0,0]],
point => \@nbrcoord,
)
Select groups that occur within a polygon or polygons extracted from a shapefile
Required args: file
Optional args: axes, field_name, field_val, no_cache, point
Example:
# Is the neighbour coord in a shapefile?
sp_point_in_poly_shape (
file => 'c:\biodiverse\data\coastline_lamberts',
point => \@nbrcoord,
);
# Is the neighbour coord in a shapefile's second polygon (counting from 1)?
sp_point_in_poly_shape (
file => 'c:\biodiverse\data\coastline_lamberts',
field_val => 2,
point => \@nbrcoord,
);
# Is the neighbour coord in a polygon with value 2 in the OBJECT_ID field?
sp_point_in_poly_shape (
file => 'c:\biodiverse\data\coastline_lamberts',
field_name => 'OBJECT_ID',
field_val => 2,
point => \@nbrcoord,
);
Returns true when two points are within the same cluster or region grower group, or if neither point is in the selected clusters/groups.
Required args: output
Optional args: from_node, group_by_depth, num_clusters, target_distance
Example:
# Try to use the highest four clusters from the root.
# Note that the next highest number will be used
# if four is not possible, e.g. there are five
# siblings below the root. Fewer will be returned
# if the tree has insufficient tips.
sp_points_in_same_cluster (
output => "some_cluster_output",
num_clusters => 4,
)
# Cut the tree at a distance of 0.25 from the tips
sp_points_in_same_cluster (
output => "some_cluster_output",
target_distance => 0.25,
)
# Cut the tree at a depth of 3.
# The root is depth 1.
sp_points_in_same_cluster (
output => "some_cluster_output",
target_distance => 3,
group_by_depth => 1,
)
# work from an arbitrary node
sp_points_in_same_cluster (
output => "some_cluster_output",
num_clusters => 4,
from_node => '118___', # use the node's name
)
# target_distance is ignored if num_clusters is set
sp_points_in_same_cluster (
output => "some_cluster_output",
num_clusters => 4,
target_distance => 0.25,
)
Returns true when two points are within the same shapefile polygon
Required args: file
Optional args: axes, no_cache, point1, point2
Example:
# define neighbour sets using a shapefile
sp_points_in_same_poly_shape (file => 'path/to/a/shapefile')
# return true when the neighbour coord is in the same
# polygon as an arbitrary point
sp_points_in_same_poly_shape (
file => 'path/to/a/shapefile',
point1 => [10,20],
)
# reverse the axes
sp_points_in_same_poly_shape (
file => 'path/to/a/shapefile',
axes => [1,0],
)
# compare against the second and third axes of your data
# e.g. maybe you have time as the first basedata axis
sp_points_in_same_poly_shape (
file => 'path/to/a/shapefile',
axes => [1,2],
)
A rectangle. Assessed against all dimensions by default (more properly called a hyperbox) but use the optional "axes => []" arg to specify a subset. Uses group (map) distances.
Required args: sizes
Optional args: axes
Example:
# A rectangle of equal size on the first two axes,
# and 100 on the third.
sp_rectangle (sizes => [100000, 100000, 100])
# The same, but with the axes reordered
# (an example of using the axes argument)
sp_rectangle (
sizes => [100000, 100, 100000],
axes => [0, 2, 1],
)
# Use only the first an third axes
sp_rectangle (sizes => [100000, 100000], axes => [0,2])
Return true if the sample redundancy for an element is greater than the threshold.
Required args: threshold
Optional args: element
Example:
# Uses the processing group for definition queries,
# and the neighbour group for spatial conditions.
# In this example, # any group with a redundncy
# score of 0.5 or fewerless will return false
sp_redundancy_greater_than (
threshold => 0.5,
)
sp_redundancy_greater_than (
element => '128:254', # an arbitrary element
threshold => 0.2, # with a threshold of 0.2
)
Return true if the richness for an element is greater than the threshold.
Required args: threshold
Optional args: element
Example:
# Uses the processing group for definition queries,
# and the neigbour group for spatial conditions.
sp_richness_greater_than (
threshold => 3, # any group with 3 or fewer labels will return false
)
sp_richness_greater_than (
element => '128:254', # an arbitrary element
threshold => 4, # with a threshold of 4
)
Select all elements as neighbours
Required args: none
Optional args: none
Example:
sp_select_all() # select every group
Select a subset of all available neighbours based on a block sample sequence
Required args: size
Optional args: count, prng_seed, random, reverse_order, use_cache
Example:
# Select up to two groups per block with each block being 5 groups
on a side where the group size is 100
sp_select_block (size => 500, count => 2)
# Now do it non-randomly and start from the lower right
sp_select_block (size => 500, count => 10, random => 0, reverse => 1)
# Rectangular block with user specified PRNG starting seed
sp_select_block (size => [300, 500], count => 1, prng_seed => 454678)
# Lower memory footprint (but longer running times for neighbour searches)
sp_select_block (size => 500, count => 2, clear_cache => 1)
Select a specific element. Basically the same as sp_match_text, but with optimisations enabled
Required args: element
Optional args: type
Example:
# match where the whole coordinate ID (element name)
# is 'Biome1:savannah forest'
sp_select_element (element => 'Biome1:savannah forest')
Select a subset of all available neighbours based on a sample sequence (note that groups are sorted south-west to north-east)
Required args: frequency
Optional args: cycle_offset, first_offset, reverse_order, use_cache
Example:
# Select every tenth group (groups are sorted alphabetically)
sp_select_sequence (frequency => 10)
# Select every tenth group, starting from the third
sp_select_sequence (frequency => 10, first_offset => 2)
# Select every tenth group, starting from the third last
# and working backwards
sp_select_sequence (
frequency => 10,
first_offset => 2,
reverse_order => 1,
)
Select only the processing group
Required args: none
Optional args: none
Example:
sp_self_only() # only use the proceessing cell
Returns 1 if an element passed the definition query for a previously calculated spatial output
Required args: none
Optional args: element, output
Example:
# Used for spatial or cluster type analyses:
# The simplest case is where the current
# analysis includes a def query and you
# want to use it in a spatial condition.
sp_spatial_output_passed_defq();
# Using another output in this basedata
# In this case the output is called 'analysis1'
sp_spatial_output_passed_defq(
output => 'analysis1',
);
# Return true if a specific element passed the def query
sp_spatial_output_passed_defq(
element => '153.5:-32.5',
);
An overlapping square assessed against all dimensions (more properly called a hypercube). Uses group (map) distances.
Required args: size
Optional args: none
Example:
# An overlapping square, cube or hypercube
# depending on the number of axes
# Note - you cannot yet specify which axes to use
# so it will be square on all sides
sp_square (size => 300000)
A square assessed against all dimensions (more properly called a hypercube). Uses 'cell' distances.
Required args: size
Optional args: none
Example:
sp_square_cell (size => 3)
There are several different sets of variables implemented that the system recognises. Any undeclared variable you use that does not occur in this list will be treated as a zero or as undefined (depending on where it is used), which means it will probably not behave as you expect. An example declaring variables is given below.
As a general rule, uppercase letters denote absolute values, lower case letters denote signed values (positive or negative). Positive values are north, east, above, or to the right. Negative values are south, west, below or to the left.
$D
is the absolute euclidean distance from the processing group to a candidate neighbour group across all dimensions.
$D[0]
, $D[1]
are the absolute euclidean distances in dimension 0 and 1. In most cases $D[0]
will be the X (longitude) dimension, $D[1]
will be the Y (latitude) dimension. The library functions can actually handle more dimensions than this (eg $D[2]
for altitude or depth), but the GUI is not set up to display them (it will plot the data using the first two axes, so only the first of any overlapping groups will be visible).
$d[0]
, $d[1]
and so forth are the signed euclidean distance in dimension 0, 1 etc. This allows us to extract all groups within some distance in some direction. As with standard Cartesian plots, negative values are to the left or below (west or south), positive values to the right or above (east or north). As with $D[0]
, $d[0]
will normally be the X dimension, $d[1]
will be the Y dimension.
Note that using abs($d[1])
is the same as using $D[1]
.
$C
, $C[0]
, $C[1]
, $c[0]
, $c[1]
are the same as the euclidean distance variables ($D
etc) but operate directly in group (cell) units. If your groups were imported using a cellsize of 100,000, then $D[1] < 100000
is the same as $C[1] < 1
. Note, however, that if you used a different resolution in each dimension, then the map and cell distances are not directly comparable. For example, if cell sizes of 100 and 200 were used for axes 0 and 1 then $C<1
is the same as sqrt($C[0]**2 + $C[1]**2) < 1
which is sqrt(($D[0]/100)**2 + ($D[1]/200)**2) < 1
, and not $D<100
.
$coord_id1
is the name of the processing coord, $coord_id2
is the name of the neighbour coord. (Available from version 0.16).
$coord[0]
, $coord[1]
are the coordinate values of the processing group in the first and second dimensions. As per the above, think of these as X and Y, except that $coord[5]
will also work if your groups have six or more dimensions. Note that the $coord[]
variables do not necessarily work properly with the spatial index, so you might need to turn the index off when using them.
$nbrcoord[0]
etc are analogous to $coord[0]
etc, except that they are the coordinates for the current neighbour group.
Non-programmers need to note that the array index starts from zero, so $coord[1]
is the second coordinate axis and not the first. This differs from systems like R and AWK, but is consistent with many other programming languages like C and Python.
- Set the neighbours to be those groups where the absolute distance from the processing group is less than 100,000.
$D <= 100000
- Select all groups to the west of the processing group.
$d[0] < 0
- Select all groups to the north-east of the processing group.
$d[0] > 0 && $d[1] > 0
- The absolute distance in the first (eg x) dimension is less than 100,000 AND the signed distance is greater than 100,000. This will result in a neighbourhood that is a column of groups 200,000 map units east-west, and including all groups 100,000 map units north of the processing group. Not that you would normally want a neighbourhood like this...
$D[0] <= 100000 && $d[1] >= 100000
- Select everything north of 6000000 (e.g. if using UTM coordinates as axes 0 and 1). This is an example that could be used as a definition query, and will not work well as a neighbourhood (use
$nbr_y
instead of$y
for that).
$y > 6000000
- Select everything within a rectangle. This is another useful definition query.
$y > 6000000 && $y <= 6100000 && $x > 580000 && $x <= 600000
- Select a specific processing coord (
495:595
), useful as a definition query to use only one group. Note the use of theeq
operator - this matches text. (Not available before version 0.16)
$coord_id1 eq '495:595'
Variable declaration is done as per Perl syntax. For example:
my $some_var = 10;
return ($D / $some_var) <= 100;
This trivial example evaluates to true if the absolute distance divided by 10 (the value in variable $some_var
) is less than 100. The semicolon denotes a separation of statements to be processed in sequence, such that this example could be written on one line. The result of the last statement is what is returned to the analysis to determine if the group is part of the neighbourhood or not. It is evaluated as true or false. The word return
is not actually needed in this case, but does make things clearer when there are multiple lines of code.
A more complex function might involve an ellipse (although you could just use sp_ellipse (major_radius => 300000, minor_radius => 100000, rotate_angle => 1.5714)
)
my $major_radius = 300000; # longest axis
my $minor_radius = 100000; # shortest axis
# set the offset in radians, anticlockwise (1.5714 = PI/2 = north)
my $rotate_angle = 1.5714;
# now calc the bearing to rotate the coords by
my $bearing = atan2 ($d[0], $d[1]) + $rotate_angle;
# and rotate them
my $r_x = cos ($bearing) * $D; # rotated x coord
my $r_y = sin ($bearing) * $D; # rotated y coord
# get the scaled distances in each direction
my $a_dist = ($r_y ** 2) / ($major_radius ** 2);
my $b_dist = ($r_x ** 2) / ($minor_radius ** 2);
# this last line evaluates to 1 (true) if the candidate
# neighbour is within or on the edge of the ellipse,
# and 0 (false) otherwise
return ($a_dist + $b_dist) <= 1;
Note the use of the word my
. This is required to declare your own variables in the correct scope. If it is not used then the variables will not work properly. Do not declare any of the pre-calculated Biodiverse variables with this ($D
etc) - they already exist and redeclaring will overprint them, causing unpredictable results.
If you wish to know if your function works with the spatial index then run a moving window analysis twice, once with and once without the spatial index, using the list indices generated by the Element Counts calculations to get the lists of neighbours. Export the results to CSV and use a difference tool to compare the results.
Functions available by default are those in the Math::Trig library, plus POSIX::fmod().
To access environment variables you have set, just use $ENV{variable_name}
, eg $ENV{my_default_radius}
.