
Input data

Rayan Chikhi edited this page Sep 22, 2022 · 4 revisions

Input: a set of read sets in FASTA or FASTQ format, gzipped or not.

File of files (fof) format: one sample per line, with a sample ID, a list of files, and an optional solidity threshold (minimum k-mer abundance).

  • <Sample ID> : <1.fastq.gz> ; ... ; <N.fastq.gz> ! <Abundance min threshold>

Example:

A1 : /path/to/fastq_A1_1 ! 4
B1 : /path/to/fastq_B1_1 ; /with/multiple/fasta_B1_2 ! 2
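To make the line structure concrete, here is a minimal POSIX shell/awk sketch that splits each fof line into its sample ID, number of files, and threshold. `parse_fof` is a hypothetical helper for illustration only, not part of kmtricks; it assumes paths contain no spaces and prints `-` when a line has no `! <threshold>` part.

```shell
#!/bin/sh
# Hypothetical helper (not part of kmtricks): print "ID<TAB>#files<TAB>threshold"
# for each fof line read from the given files, or from stdin if none are given.
parse_fof() {
  awk '
    NF == 0 { next }                       # skip blank lines
    {
      line = $0
      thr = "-"                            # "-" means: no threshold given
      bang = index(line, "!")
      if (bang > 0) {                      # split off the "! <threshold>" part
        thr = substr(line, bang + 1)
        line = substr(line, 1, bang - 1)
        gsub(/[ \t]/, "", thr)
      }
      colon = index(line, ":")             # sample ID is everything before the first ":"
      id = substr(line, 1, colon - 1)
      files = substr(line, colon + 1)
      gsub(/[ \t]/, "", id)
      gsub(/[ \t]/, "", files)             # assumes paths contain no spaces
      n = split(files, f, ";")             # files are separated by ";"
      printf "%s\t%d\t%s\n", id, n, thr
    }
  ' "$@"
}
```

For example, `parse_fof list_files.txt` run on the two example lines above prints `A1`, `1`, `4` and `B1`, `2`, `2` (tab-separated).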

If the minimum abundance threshold is not specified, the value of --hard-min is used (see kmtricks pipeline or kmtricks count).
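The fallback rule can be sketched as follows: each sample keeps its own `! <threshold>` if present, and otherwise gets the --hard-min value. `effective_thresholds` is a hypothetical helper, not a kmtricks command; its first argument merely stands in for whatever value you would pass to --hard-min, and it only mirrors the rule described above rather than kmtricks' actual code.

```shell
#!/bin/sh
# Hypothetical helper (not part of kmtricks): print "ID threshold" per sample,
# using $1 as the stand-in --hard-min default when a line has no "! <threshold>".
effective_thresholds() {
  hard_min=$1
  shift
  awk -v hard_min="$hard_min" '
    NF == 0 { next }                        # skip blank lines
    {
      id = $0
      sub(/[ \t]*:.*/, "", id)              # sample ID = text before the first ":"
      sub(/^[ \t]+/, "", id)
      thr = hard_min                        # default: the --hard-min value
      bang = index($0, "!")
      if (bang > 0) {                       # per-sample threshold overrides it
        thr = substr($0, bang + 1)
        gsub(/[ \t]/, "", thr)
      }
      print id, thr
    }
  ' "$@"
}
```

For example, `effective_thresholds 2 list_files.txt` shows which threshold each sample would end up with if --hard-min were 2.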

For example, a fof with one file per sample, with samples numbered 1, 2, 3, ..., can be built from a folder containing many input files with:

ls -1 folder/* | sort -n -t/ -k 2 | awk '{print ++count" : "$1}' > list_files.txt
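A variant of the one-liner above uses each file's name as its sample ID instead of a counter. This is a minimal sketch: `make_fof` is a hypothetical helper, it assumes file names contain no spaces, `:`, `;` or `!`, writes one file per sample, and sets no per-sample threshold (so --hard-min applies).

```shell
#!/bin/sh
# Hypothetical helper (not part of kmtricks): write one fof line per regular
# file in folder $1, using the file name minus its extension(s) as the sample ID.
make_fof() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue                 # skip subfolders (and an unmatched glob)
    id=$(basename "$f")
    id=${id%.gz}                            # drop .gz if present
    id=${id%.*}                             # drop the format extension (.fastq, .fa, ...)
    printf '%s : %s\n' "$id" "$f"
  done
}
```

Usage: `make_fof folder > list_files.txt`. Sample IDs must stay unique, so paired files such as `x_1.fastq.gz` and `x_2.fastq.gz` become two separate samples, `x_1` and `x_2`.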