k mers hashes rescue
Téo Lemane edited this page Oct 22, 2021 · 4 revisions
In kmtricks, k-mer filtering is achieved by leveraging k-mer abundances across samples. The following parameters modulate this procedure.
- `--hard-min INT`: All k-mers with an abundance lower than this parameter are discarded.
- `--soft-min INT/STR/FLOAT`: All k-mers with an abundance between `hard-min` and `soft-min` are considered rescue-able. You can provide the path to a file containing one threshold per line, in the same order as in the input fof. You can also provide a float. In this case, a specific threshold T is computed for each sample such that the number of k-mers occurring T times is smaller than VALUE x nb_kmers.
- `--share-min INT`: A rescue-able k-mer is conserved if it is solid (with an abundance greater than `soft-min`) in at least `share-min` other sample(s).
- `--recurrence-min INT`: All k-mers that do not occur in at least `recurrence-min` sample(s) are discarded.
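
The float form of `soft-min` can be illustrated with a small Python sketch. This is not kmtricks code: the function name `soft_min_from_ratio` is hypothetical, `nb_kmers` is assumed to be the number of distinct k-mers in the sample, and T is assumed to be the smallest abundance that satisfies the condition.

```python
from collections import Counter
from typing import Iterable


def soft_min_from_ratio(abundances: Iterable[int], ratio: float) -> int:
    """Return the smallest T such that the number of k-mers occurring
    exactly T times is smaller than ratio * nb_kmers.

    Assumptions (not taken from kmtricks): `abundances` holds one count
    per distinct k-mer of a single sample, and the smallest valid T is
    the one that is wanted.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")

    hist = Counter(abundances)          # abundance -> number of k-mers
    nb_kmers = sum(hist.values())       # number of distinct k-mers
    limit = ratio * nb_kmers

    t = 1
    # A Counter returns 0 for missing keys, so the loop always stops
    # once t exceeds the largest observed abundance.
    while hist[t] >= limit:
        t += 1
    return t


# Toy sample: 50 k-mers seen once, 30 seen twice, 15 seen 3 times, 5 seen 4 times.
toy = [1] * 50 + [2] * 30 + [3] * 15 + [4] * 5
threshold = soft_min_from_ratio(toy, 0.2)
```

With this toy histogram and a ratio of 0.2, the limit is 20 k-mers, so abundances 1 and 2 are rejected and the sketch settles on T = 3.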
The figure below shows an example of the rescue procedure using sample-specific `soft-min` and the following parameters: `hard-min 1`, `share-min 3` and `recurrence-min 2`.
- H1 has an abundance lower than 3 in D0, but it is solid in at least `share-min` samples (D2, D3, D4). It is therefore conserved in D0 (right part of the figure).
- H2 is non-solid in D1, D3 and D4 and is solid in only 2 samples. H2 is therefore discarded in D1, D3 and D4.
- H3 is solid in only one sample. Hence, as `recurrence-min` cannot be satisfied, the whole row is discarded (dash signs in the figure, which correspond to the null bit-vector in hash mode).
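
The three cases above can be reproduced with a minimal Python sketch of the rescue logic for one row (one k-mer or hash across all samples). It is an illustration under stated assumptions, not the kmtricks implementation: the function name `rescue_row` is hypothetical, a k-mer is treated as solid when its abundance is at least `soft-min`, recurrence is counted on the samples that remain after rescue, and the abundance values are invented rather than taken from the figure.

```python
from typing import List, Sequence


def rescue_row(
    counts: Sequence[int],
    soft_min: Sequence[int],
    hard_min: int = 1,
    share_min: int = 1,
    recurrence_min: int = 1,
) -> List[int]:
    """Filter the abundances of one k-mer across samples.

    counts[i] is the abundance in sample i and soft_min[i] is the
    sample-specific soft threshold. A discarded entry is set to 0.
    """
    # hard-min: abundances below the hard threshold are dropped outright.
    kept = [c if c >= hard_min else 0 for c in counts]

    # Assumption: "solid" means abundance >= the sample's soft-min.
    solid = [c > 0 and c >= t for c, t in zip(kept, soft_min)]
    n_solid = sum(solid)

    # share-min: a non-solid (rescue-able) entry survives only if the
    # k-mer is solid in at least share_min other samples.
    out = []
    for c, is_solid in zip(kept, solid):
        if is_solid or (c > 0 and n_solid >= share_min):
            out.append(c)
        else:
            out.append(0)

    # recurrence-min: the k-mer must remain in enough samples,
    # otherwise the whole row is discarded.
    if sum(1 for c in out if c > 0) < recurrence_min:
        return [0] * len(counts)
    return out


# Hypothetical abundances for samples D0..D4, with soft-min 3 everywhere.
soft = [3, 3, 3, 3, 3]
params = dict(hard_min=1, share_min=3, recurrence_min=2)

h1 = rescue_row([2, 0, 5, 4, 6], soft, **params)  # rescued in D0
h2 = rescue_row([4, 1, 5, 2, 1], soft, **params)  # solid in 2 samples only
h3 = rescue_row([0, 2, 7, 0, 1], soft, **params)  # solid in 1 sample only
```

In this toy run, `h1` keeps its low abundance in D0 because the k-mer is solid in three other samples, `h2` keeps only its two solid entries, and `h3` becomes an all-zero row, matching the three behaviours described for H1, H2 and H3.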