krust is a k-mer counter, a bioinformatics 101 tool for counting the frequency of substrings of length k within strings of DNA data. It's written in Rust and run from the command line. It takes a FASTA file of DNA sequences and outputs all canonical k-mers (the double helix means each k-mer has a reverse complement) and their frequency across all records in the given FASTA file.
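To make the idea of a canonical k-mer concrete, here is a minimal standard-library sketch. It is not krust's actual implementation. It assumes the common convention that the canonical form is the lexicographically smaller of a k-mer and its reverse complement, and it assumes that windows containing anything other than uppercase A, C, G, or T are skipped. The function names are hypothetical and chosen for illustration.

```rust
use std::collections::HashMap;

/// Reverse complement of a DNA sequence (A<->T, C<->G).
/// Bytes other than A, C, G, T are passed through unchanged.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            other => other,
        })
        .collect()
}

/// Canonical form (assumed convention): the lexicographically smaller
/// of the k-mer and its reverse complement.
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc = reverse_complement(kmer);
    if rc.as_slice() < kmer {
        rc
    } else {
        kmer.to_vec()
    }
}

/// Count canonical k-mers in a single sequence.
/// Assumption: windows with any byte outside uppercase ACGT are skipped.
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        // `windows(0)` would panic, so treat k = 0 as "no k-mers".
        return counts;
    }
    for window in seq.windows(k) {
        if window.iter().all(|b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
            *counts.entry(canonical(window)).or_insert(0) += 1;
        }
    }
    counts
}
```

Under these assumptions, a k-mer and its reverse complement are tallied under the same key, so a sequence and its opposite strand produce identical counts.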
Run krust on the test data* in the krust GitHub repo, searching for k-mers of length 5, like this:
cargo run --release 5 your/local/path/to/cerevisae.pan.fa > output.tsv
or, searching for k-mers of length 21:
cargo run --release 21 your/local/path/to/cerevisae.pan.fa > output.tsv
krust prints to stdout, writing, on alternate lines:
>{frequency}
{canonical k-mer}
>{frequency}
{canonical k-mer}
...
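For downstream use, output in this layout can be read back with a few lines of Rust. The sketch below is an illustration only and is not part of krust. The function name `parse_krust_output` is hypothetical, and the sketch assumes each frequency is a non-negative integer that immediately follows the `>` character.

```rust
/// Parse text in the alternating `>{frequency}` / `{canonical k-mer}` layout
/// into (k-mer, frequency) pairs. Blank lines are ignored.
fn parse_krust_output(text: &str) -> Result<Vec<(String, u64)>, String> {
    let mut records = Vec::new();
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());

    while let Some(header) = lines.next() {
        // The first line of each pair must look like `>{frequency}`.
        let count = header
            .trim()
            .strip_prefix('>')
            .ok_or_else(|| format!("expected '>' header, got {header:?}"))?
            .parse::<u64>()
            .map_err(|e| format!("bad frequency in {header:?}: {e}"))?;

        // The second line of each pair is the k-mer itself.
        let kmer = lines
            .next()
            .ok_or_else(|| format!("missing k-mer after {header:?}"))?;

        records.push((kmer.trim().to_string(), count));
    }
    Ok(records)
}
```

Returning a `Result` rather than panicking lets the caller decide how to handle a truncated or malformed file, for example one cut off between a frequency line and its k-mer.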
krust uses the rust-bio, rayon, and dashmap Rust libraries.
*Unusual, yes, to provide this data in the repo, but it's helped me spread word about what I'm doing.