Given a set
A minimal (
>original_header 20 6.13
TGGATAAAAAGGCTGACGAAAGGTCTAGCTAAAATTGTCAGGTGCTCTCAGATAAAGCAGTAAGCGAGTTGGTGTTCGCTGAGCGTCGACTAGGCAACGTTAAAGCTATTTTAGGC...
In this case 20 kmers are shared with the indexed kmers. This represents 6.13% of the kmers in the sequence.
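The header shown above can be read back programmatically. Here is a minimal sketch in Python, assuming that the number of shared kmers and the percentage are always the last two whitespace-separated fields of the header, as in the example. `parse_b2s_header` is a hypothetical helper written for illustration; it is not part of back_to_sequences.

```python
def parse_b2s_header(line):
    """Split an output header into (original header, shared kmers, percent).

    Assumption: the last two whitespace-separated fields are the shared-kmer
    count and the percentage, as in the example header above.
    """
    # rsplit keeps any spaces that belong to the original header itself.
    fields = line.lstrip(">").rstrip().rsplit(maxsplit=2)
    if len(fields) != 3:
        raise ValueError(f"unexpected header format: {line!r}")
    name, count, percent = fields
    return name, int(count), float(percent)


name, shared, percent = parse_b2s_header(">original_header 20 6.13")
# The rounded percentage only gives an approximate total number of kmers.
approx_total = round(shared / (percent / 100))
print(name, shared, percent, approx_total)
```

Dividing the count by the ratio gives a rough estimate of how many kmers the sequence contains. Because the percentage is rounded, this estimate is approximate.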
For installation instructions, please see https://b2s-doc.readthedocs.io/en/latest/usage.html#installation
back_to_sequences --in-kmers kmers.fasta --in-sequences reads.fasta --out-sequences filtered_reads.fasta --out-kmers counted_kmers.txt
The `filtered_reads.fasta` file contains the original sequences (here, reads) from `reads.fasta` that contain at least one of the kmers from `kmers.fasta`. The header of each read is the same as in `reads.fasta`, followed by the number of shared kmers and the estimated ratio of shared kmers.
Because the `--out-kmers` option is used, the file `counted_kmers.txt` contains, for each kmer in `kmers.fasta`, the number of times it was found in `filtered_reads.fasta`.
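To make the two outputs concrete, here is a naive Python sketch of the idea: index the kmers in a set, slide a window of length k over each read, and record both the per-read statistics and the per-kmer counts. This is an illustration under simplifying assumptions (exact string matching only, no handling of reverse complements, everything held in memory); it is not the algorithm implemented by back_to_sequences, and `scan_read` is a hypothetical name.

```python
from collections import Counter


def scan_read(read, indexed, k, counts):
    """Return (shared kmers, percent of the read's kmers that are shared).

    `indexed` is a set of kmers of length k; `counts` is updated in place
    with the number of times each indexed kmer is seen.
    """
    total = len(read) - k + 1  # number of kmers in the read
    if total <= 0:
        return 0, 0.0
    shared = 0
    for i in range(total):
        kmer = read[i:i + k]
        if kmer in indexed:
            shared += 1
            counts[kmer] += 1
    return shared, 100.0 * shared / total


indexed = {"ACG", "CGT"}   # toy stand-in for the kmers of kmers.fasta
counts = Counter()         # toy stand-in for the per-kmer counts
result = scan_read("ACGTACG", indexed, 3, counts)
print(result, dict(counts))
```

A read would be written to the filtered output only when its shared count is at least one, which corresponds to the "at least one of the kmers" condition described above.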
Example results obtained on
- the GenOuest platform on a node with 32 threads Xeon 2.2 GHz, denoted by "genouest" in the table below.
- a MacBook, Apple M2 pro, 16 GB RAM, with 10 threads, denoted by "mac" in the table below.
- AMD Ryzen 7 4.2 GHz 5800X 64 GB RAM, with 16 threads, denoted by "AMD" in the table below.
Indexed: one million kmers, each of length 31. Queried: from 10,000 to 200 million reads, each of length 100.
Number of reads | Time genouest | Time mac | Time AMD | max RAM |
---|---|---|---|---|
10,000 | 0.7s | 0.54s | 0.4s | 0.13 GB |
100,000 | 0.8s | 0.8s | 1.2s | 0.13 GB |
1,000,000 | 2.0s | 3.5s | 7.1s | 0.13 GB |
10,000,000 | 7.1s | 11s | 16s | 0.13 GB |
100,000,000 | 47s | 58s | 48s | 0.13 GB |
200,000,000 | 1m32s | 1m52s | 1m44s | 0.13 GB |
See this page for details
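The timings in the table can be turned into throughput figures for easier comparison. Here is a small Python sketch, assuming times are written as in the table (seconds such as `47s`, or minutes and seconds such as `1m32s`); `to_seconds` is a hypothetical helper.

```python
def to_seconds(text):
    """Convert a duration such as '0.7s', '47s' or '1m32s' to seconds."""
    text = text.strip()
    minutes = 0.0
    if "m" in text:
        head, text = text.split("m", 1)
        minutes = float(head)
    seconds = float(text.rstrip("s") or 0)
    return 60.0 * minutes + seconds


# Largest run on the "genouest" node: 200 million reads in 1m32s.
reads = 200_000_000
elapsed = to_seconds("1m32s")
reads_per_second = reads / elapsed
print(f"{reads_per_second:,.0f} reads/s")
```

For the smallest inputs the elapsed time likely includes a fixed cost for indexing the kmers, so throughput computed from those rows is not representative of the scanning speed.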
Please refer to the specific documentation for
Please check out How to contribute
Baire et al., (2024). Back to sequences: Find the origin of k-mers. Journal of Open Source Software, 9(101), 7066, https://doi.org/10.21105/joss.07066
Full documentation is available at https://b2s-doc.readthedocs.io/en/latest/