Bioinformatics 101 tool for counting unique k-length substrings in DNA

suchapalaver/krust

krust

krust is a k-mer counter, a bioinformatics 101 tool for counting the frequency of substrings of length k within strings of DNA data. It's written in Rust and run from the command line. It takes a FASTA file of DNA sequences and outputs every canonical k-mer together with its frequency across all records in that file. DNA is double-stranded, so each k-mer has a reverse complement on the opposite strand; counting canonical k-mers means a k-mer and its reverse complement are tallied as one. By convention the canonical form is the lexicographically smaller of the two.
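
As a rough illustration of what canonical k-mer counting involves, here is a minimal sketch using only the standard library. It is not krust's actual implementation: the function names are hypothetical, it assumes upper-case A/C/G/T input, and it takes the canonical form to be the lexicographically smaller of a k-mer and its reverse complement.

```rust
use std::collections::HashMap;

/// Reverse complement of a DNA k-mer (A<->T, C<->G).
/// Bytes other than A, C, G, T are passed through unchanged (an assumption
/// made for this sketch; a real tool may skip or reject them).
fn reverse_complement(kmer: &[u8]) -> Vec<u8> {
    kmer.iter()
        .rev()
        .map(|&base| match base {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            other => other,
        })
        .collect()
}

/// Canonical form: whichever of the k-mer and its reverse complement
/// sorts first lexicographically.
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc = reverse_complement(kmer);
    if rc.as_slice() < kmer {
        rc
    } else {
        kmer.to_vec()
    }
}

/// Count canonical k-mers across all sequences.
fn count_canonical_kmers(seqs: &[&str], k: usize) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        // `windows(0)` would panic, so treat k = 0 as "nothing to count".
        return counts;
    }
    for seq in seqs {
        // Slide a window of length k over the sequence, one base at a time.
        for window in seq.as_bytes().windows(k) {
            *counts.entry(canonical(window)).or_insert(0) += 1;
        }
    }
    counts
}
```

With the sequence ACGTA and k = 2, the windows AC and GT are reverse complements of each other, so both are counted under AC.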

krust reads FASTA files with rust-bio by default. Passing any additional command line argument switches the reader to needletail.

Run krust with rust-bio's FASTA reader to count 5-mers like this:

cargo run --release 5 your/local/path/to/fasta_data.fa > output.tsv

Or count 21-mers with needletail as the FASTA reader like this (the trailing "." is the additional argument that selects needletail):

cargo run --release 21 your/local/path/to/fasta_data.fa . > output.tsv
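
The two invocations above differ only in whether a third argument is present. A minimal sketch of that dispatch might look like the following; the `Reader` and `Config` types and the `parse_args` function are hypothetical names for illustration, and krust's own argument handling may be organised differently.

```rust
/// Which FASTA reader to use (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Reader {
    RustBio,
    Needletail,
}

/// Parsed command line settings (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Config {
    k: usize,
    path: String,
    reader: Reader,
}

/// Parse the arguments that follow the program name: `<k> <path> [anything]`.
fn parse_args(args: &[String]) -> Result<Config, String> {
    let k = args
        .first()
        .ok_or("missing k")?
        .parse::<usize>()
        .map_err(|e| format!("invalid k: {e}"))?;
    if k == 0 {
        return Err("k must be at least 1".to_string());
    }
    let path = args.get(1).ok_or("missing FASTA path")?.clone();
    // Any third argument, whatever its value, selects needletail.
    let reader = if args.len() > 2 {
        Reader::Needletail
    } else {
        Reader::RustBio
    };
    Ok(Config { k, path, reader })
}
```

In a binary, this would typically be fed from `std::env::args().skip(1).collect::<Vec<String>>()`.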

krust prints to stdout, writing each result as two lines, the frequency prefixed with > and then the canonical k-mer:

>{frequency}  
{canonical k-mer}
>{frequency}  
{canonical k-mer}  
...
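
For illustration, the two-line layout above could be produced by something like the sketch below. `write_counts` is a hypothetical helper, not a function from krust, and it assumes the counts have already been collected as (k-mer, frequency) pairs.

```rust
use std::io::{self, Write};

/// Write each (k-mer, frequency) pair as two lines:
/// `>{frequency}` followed by `{canonical k-mer}`.
fn write_counts<W: Write>(mut out: W, counts: &[(String, u64)]) -> io::Result<()> {
    for (kmer, count) in counts {
        writeln!(out, ">{count}")?;
        writeln!(out, "{kmer}")?;
    }
    Ok(())
}
```

Taking any `Write` implementor keeps the helper testable with an in-memory buffer, while a command line tool would pass a locked, buffered handle to stdout.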