Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Nov 28;20(1):257.
doi: 10.1186/s13059-019-1891-0.

Improved metagenomic analysis with Kraken 2

Affiliations

Improved metagenomic analysis with Kraken 2

Derrick E Wood et al. Genome Biol. .

Abstract

Although Kraken's k-mer-based approach provides a fast taxonomic classification of metagenomic sequence data, its large memory requirements can be limiting for some applications. Kraken 2 improves upon Kraken 1 by reducing memory usage by 85%, allowing greater amounts of reference genomic data to be used, while maintaining high accuracy and increasing speed fivefold. Kraken 2 also introduces a translated search mode, providing increased sensitivity in viral metagenomics analysis.

Keywords: Alignment-free methods; Metagenomics; Metagenomics classification; Microbiome; Minimizers; Probabilistic data structures.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
Differences in operation between the two versions of Kraken. a Both versions of Kraken begin classifying a k-mer by computing its bp minimizer (highlighted in magenta). The default values of k and for each version are shown in the figure. b Kraken 2 applies a spaced seed mask of s spaces to the minimizer and calculates a compact hash code, which is then used as a search query in its compact hash table; the lowest common ancestor (LCA) taxon associated with the compact hash code is then assigned to the k-mer (see the “Methods” section for full details). In Kraken 1, the minimizer is used to accelerate the search for the k-mer, through the use of an offset index and a limited-range binary search; the association between k-mer and LCA is directly stored in the sorted list. c Kraken 2 also achieves lower memory usage than Kraken 1 by using fewer bits to store the LCA and storing a compact hash code of the minimizer rather than the full k-mer. d Impact on speed, memory usage, and prokaryotic genus F1-measure in Kraken 2 when changing k with respect to ( = 31, s = 7 for all three graphs). e Impact on prokaryotic genus sensitivity and positive predictive value (PPV) when changing the number of minimizer spaces s (k = 35,  = 31 for both graphs). In d and e, the data are from our parameter sweep results in Additional file 1: Table S2, and the default values of the independent variables for Kraken 2 are marked with a circle.
Fig. 2
Fig. 2
Comparison between Kraken 2 and other sequence classification tools. a Processing speed (in millions of reads per minute) and memory usage (measured by maximum resident set size, in gigabytes) are shown for each classifier, as evaluated on 50 million paired-end simulated reads with 16 threads. Accuracy results are shown for b 40 prokaryotic genomes and c 10 viral genomes. The results here are shown for sensitivity, positive predictive value (PPV), and F1-measure as evaluated on a per-fragment basis at the genus rank, with 1000 reads simulated from each genome. The strains from which reads were simulated were excluded from the reference libraries for each classification tool. “Kraken 2X” is Kraken 2 using translated search against a protein database. Full results for these strain-exclusion experiments are available in Additional file 1: Table S1

Similar articles

Cited by

References

    1. Kim D, Song L, Breitwieser FP, Salzberg SL. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 2016;26:1721–1729. doi: 10.1101/gr.210641.116. - DOI - PMC - PubMed
    1. Ounit R, Wanamaker S, Close TJ, Lonardi S. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics. 2015;16:236. doi: 10.1186/s12864-015-1419-2. - DOI - PMC - PubMed
    1. Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. 2016;7:11257. doi: 10.1038/ncomms11257. - DOI - PMC - PubMed
    1. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15:R46. doi: 10.1186/gb-2014-15-3-r46. - DOI - PMC - PubMed
    1. Breitwieser FP, Baker DN, Salzberg SL. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 2018;19:198. doi: 10.1186/s13059-018-1568-0. - DOI - PMC - PubMed

Publication types

LinkOut - more resources