-
Notifications
You must be signed in to change notification settings - Fork 162
Operation "Fix Method Dispatch On Lists"
Problem: Method dispatch on lists is slow. This will attempt to discuss why, and point towards how to fix it.
Warning, this page is long, boring and very technical. Unless you want to dive inside method dispatch, I’d avoid it.
A long-standing problem in GAP is that method dispatch is slow with long lists. This attempts to summarise my understanding of the problem, and how we could fix it.
The main cost is in (I believe) KTNumHomPList
and similar methods in plist.c
, around line 520, where we have to search a list to find out if it is:
- Dense (no blanks)
- Homogeneous (everything has the same type) – and when homogeneous special case FFE or cyclic elements.
- A table (a matrix of matrices)
- Rectangular (a matrix of matrices with equal lengths of all elements)
Great efforts are made to cache these things when possible, but finding them is still expensive. Dense requires scanning the list, the others require checking the types of any recursive types.
We could try to fix this by just removing list's ability to know if they are dense, homogeneous, table and rectangular. 'table' and 'rectangular' are basically not cacheable if a list has mutable members (changing a list member of a list breaks it). We could say that mutable lists can't be table, or rectangular, but this is how we do matrices.
However, this is probably going to break a lot of stuff. I don't know if it's possible to cache dense and homogeneous better -- they are the two main ones we need, as a list must be dense and homogeneous to be a collection.
I think the main possible improvement, while maintaining full backwards compatibility, is not calculate these things when we don’t actually want them.
This will require a two-pronged attack.
-
Change method dispatch so it caches which methods actually care about these pieces of functionality, and only calculate them if we need them.
-
Remove these requirements where possible from methods.
I will now discuss these points in depth. 2 first as it is simpler.
(2) Some methods check for these things when they don’t really care. The most obvious way in which these checks arise unexpectedly is ‘IsCollection’, which only works for lists which are homogenous and dense. Having a (very quick!) look around, some uses of IsCollection could be IsCollectionOrList, which is much faster.
(1) This requires diving into InstallMethod
. it's a good idea to pop and have a look to remind yourself. Also, we can look at METHOD_1ARGS
and friends in lib/methsel1.g
to see how method dispatch executes. We need to know if a method really cares is a list is (for example) a table. I’ve been exploring this (I attach some horrible code to the bottom to explore). We have two choices:
a) Provide an explicit way to say “I don’t care about certain properties”, or "only tell me about property if it is cheap / free to find out (we already have other cases like this).
b) Check calls to InstallMethod to see if they care.
I’ve been looking at (b). let's consider the following concrete problem: Given two types T1 and T2 and a Operation Op
, would M every dispatch differently based on T1 and T2?
Things to consider:
- We have to check every implementation of
Op
, not just a first matching one, because we don't know about other arguments andTryNextMethod
exists.
So, for a single method m
which was installed for Op
, we can check if any filter of m
matches T1 and not T2 (or vice versa). Also there is an argument called famp
which is given the families. If T1 and T2 have different families, then we have to assume a difference unless famp
is ReturnTrue
, or T1 and T2 matched no filter of m
.
ForAllOp
Let's try a concrete example, and try comparing TYPE_LIST_DENSE_NHOM_MUTABLE
and TYPE_LIST_NDENSE_MUTABLE
.
These matter, because of the following implementations:
InstallMethod( ForAllOp,
"for a list, and a function",
[ IsList and IsFinite, IsFunction ],
function ( C, func )
local elm;
for elm in [1..Length(C)] do
if IsBound(C[elm]) then
if not func( C[elm] ) then
return false;
fi;
fi;
od;
return true;
end );
InstallMethod( ForAllOp,
"for a dense list, and a function",
[ IsDenseList and IsFinite, IsFunction ],
function ( C, func )
local elm;
for elm in [1..Length(C)] do
if not func( C[elm] ) then
return false;
fi;
od;
return true;
end );
This would be an ideal place for a way to say "If you know the list is dense I'll use that, but don't bother going to figure it out just for me". Also, ForAllOp
doesn't care about the difference between TYPE_LIST_DENSE_NHOM_MUTABLE
and TYPE_LIST_GF2MAT_IMM
, so that's gaining homogeneous (GF2), immutable and matrix. Just checking for dense is actually cheap (just scan list for 0s), so even if we only detected that, that would be very beneficial.
EQ
EQ fails because of:
InstallMethod( EQ,
"for two small lists",
IsIdenticalObj,
[ IsList and IsSmallList, IsList and IsSmallList ],
EQ_LIST_LIST_DEFAULT );
The problem is IsIdenticalObj
. This is a sensible optimisation, which tries to check if the two objects come from the same family before we both checking them for equality. However, actually this is a bad idea for several reasons.
- If we just headed into EQ_LIST_LIST_DEFAULT, we could check families there, return
False
, and stop method dispatch. - Checking
IsIdenticalObj
for the families requires (possibly) scanning the whole of both lists!)
We could add a CheapCheckIdenticalFamily
(bad name) which would try to reject if the families are equal, but still possibly call the method (so it would have to be written to support that, which EQ_LIST_LIST_DEFAULT
is).
The following is some code to allow exploring which methods work with which types.
# Get a list of all operations
allops := OPERATIONS{[1,3 .. Length(OPERATIONS)-1]};;
# Based on pulling apart METHOD_1ARGS and friends in methsel1.g
OperationDiffersOnTypesFixedArity := function(operation, arity, typelist)
local methods, i, allEqual, x, typelistMatch, allFail;
# Helper function
allEqual := l -> (Size(l) < 2 or ForAll([2..Length(l)], x -> (l[1] = l[x])));
# Get all implementations of this operation
methods := METHODS_OPERATION( operation, arity );
# this is how we loop over them
for i in [1, 5+arity .. LEN_LIST(methods)-3-arity] do
# Check if we can feed lists to any argument to this filter
allFail := true;
for x in [1..arity] do
# Match typelist against the filter
typelistMatch := List(typelist, y -> IS_SUBSET_FLAGS(y![2], methods[i+x]));
# Check if everything failed
allFail := allFail and not(ForAny(typelistMatch, x -> x));
# Check if everything was equal
if not allEqual(typelistMatch) then
Print(operation, " Flag variant: arity:", arity, " for ", methods[i+arity+3], " types: ", typelistMatch, " arg: ", i,":",x, "\n");
return true;
fi;
od;
# This is a function we can't read, so if it's anything other than TRUE we must give up
# unless everything failed every filter
if methods[i] <> RETURN_TRUE and not allFail then
Print(operation, " arity:",arity, " for ", methods[i+arity+3], " has a 'famp' which is not 'ReturnTrue':", i, "\n");
return true;
fi;
od;
return false;
end;
# Check all arities of an operation
OperationDiffersOnTypes := function(operation, typelist)
return ForAny([1..6], arity -> OperationDiffersOnTypesFixedArity(operation, arity, typelist));
end;
## Incomplete list of list types
# These are cheap to test
emptyListType := [
TYPE_LIST_EMPTY_IMMUTABLE
, TYPE_LIST_EMPTY_MUTABLE
];;
mutableListTypes := [
TYPE_LIST_DENSE_NHOM_MUTABLE
, TYPE_LIST_DENSE_NHOM_NSORT_MUTABLE
, TYPE_LIST_DENSE_NHOM_SSORT_MUTABLE
, TYPE_LIST_NDENSE_MUTABLE
, TYPE_LIST_GF2MAT
, TYPE_LIST_GF2VEC
];;
immutableListTypes := [
TYPE_LIST_DENSE_NHOM_IMMUTABLE
, TYPE_LIST_DENSE_NHOM_NSORT_IMMUTABLE
, TYPE_LIST_DENSE_NHOM_SSORT_IMMUTABLE
, TYPE_LIST_GF2MAT_IMM
, TYPE_LIST_GF2VEC_IMM
, TYPE_LIST_NDENSE_IMMUTABLE
, TYPE_LIST_GF2VEC_IMM_LOCKED
, TYPE_LIST_GF2VEC_LOCKED
];;
# Get all methods which care about lists in general
mutl := Filtered(allops, x -> OperationDiffersOnTypes(x, mutableListTypes));;
immutl := Filtered(allops, x -> OperationDiffersOnTypes(x, immutableListTypes));;