Teo Lemane edited this page Jul 29, 2021 · 1 revision

Kmer API

kmtricks exposes a 2-bit representation of k-mers using the encoding A=0, C=1, T=2, G=3. The Kmer class supports any k-mer size and has specializations for short k-mers:

  • Kmer<32> uses uint64_t
  • Kmer<64> uses __uint128_t (if available)
  • Kmer<K> uses uint64_t[(K+31)/32] otherwise
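
To make the encoding and the storage sizes above concrete, here is a minimal standalone sketch. It does not use kmtricks: the helpers `encode_nt`, `decode_nt`, `pack_kmer`, `unpack_kmer` and `words_for` are hypothetical names introduced for illustration, and the bit order (first nucleotide in the most significant used bits) is an assumption, not necessarily the layout kmtricks uses internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of kmtricks): 2-bit code of a nucleotide
// with A=0, C=1, T=2, G=3. For ASCII 'A', 'C', 'T', 'G' (upper or lower case),
// bits 1 and 2 of the character already hold this code.
constexpr std::uint8_t encode_nt(char c)
{
  return static_cast<std::uint8_t>((c >> 1) & 3);
}

// Hypothetical helper: inverse mapping, 0->A, 1->C, 2->T, 3->G.
constexpr char decode_nt(std::uint8_t v)
{
  return "ACTG"[v & 3];
}

// Pack up to 32 nucleotides into one 64-bit word.
// Assumption: the first nucleotide ends up in the highest used bits.
std::uint64_t pack_kmer(const std::string& seq)
{
  assert(seq.size() <= 32);
  std::uint64_t word = 0;
  for (char c : seq)
    word = (word << 2) | encode_nt(c);
  return word;
}

// Decode the k nucleotides stored in the low 2*k bits of a word.
std::string unpack_kmer(std::uint64_t word, std::size_t k)
{
  assert(k <= 32);
  std::string seq(k, 'A');
  for (std::size_t i = 0; i < k; ++i)
  {
    seq[k - 1 - i] = decode_nt(static_cast<std::uint8_t>(word & 3));
    word >>= 2;
  }
  return seq;
}

// Number of 64-bit words needed for a k-mer of size k, i.e. (K+31)/32.
constexpr std::size_t words_for(std::size_t k)
{
  return (k + 31) / 32;
}
```

Under these assumptions, `pack_kmer("ACGT")` yields the bit pattern 00 01 11 10, and `words_for(33)` is 2 because 33 nucleotides need 66 bits.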

kmtricks also provides hash utilities and a runtime implementation selector mechanism for working with these k-mers.

Warning: k-mers of different sizes that share the same specialization cannot be used at the same time, because the k-mer size is stored in a static variable.
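
The warning above can be illustrated with a toy class. This is only a sketch of the general pitfall of keeping a size in a static member: `ToyKmer` is a hypothetical stand-in, not the kmtricks `Kmer` class, and its constructor and `size()` method are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a k-mer specialization (NOT the kmtricks class).
// The size is a static member, so it is shared by every object
// created from the same template instantiation.
template <std::size_t MAX_K>
class ToyKmer
{
public:
  explicit ToyKmer(std::size_t k)
  {
    assert(k > 0 && k <= MAX_K);
    s_size = k; // overwrites the size seen by all ToyKmer<MAX_K> objects
  }

  std::size_t size() const { return s_size; }

private:
  static std::size_t s_size;
};

template <std::size_t MAX_K>
std::size_t ToyKmer<MAX_K>::s_size = 0;
```

With this toy class, creating `ToyKmer<32> a(12)` and then `ToyKmer<32> b(20)` makes `a.size()` return 20 as well, whereas a `ToyKmer<64>` object keeps its own size because it is a different instantiation.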

Usage example

#include <cstdint>
#include <fstream>
#include <iostream>
#include <kmtricks/public.hpp>
using namespace km;

int main(int argc, char* argv[])
{
  Kmer<32> kmer("ACGTACGTACGT");
  Kmer<32> kmer2("TACTACTACTAC");

  // k-mer operations
  Kmer<32> rev = kmer.rev_comp();
  Kmer<32> cano = kmer.canonical();

  // comparisons
  bool a = kmer == kmer2;
  bool b = kmer != kmer2;
  bool c = kmer < kmer2;
  bool d = kmer > kmer2;

  // string representation
  std::cout << kmer.to_string() << std::endl;
  std::cout << kmer.to_bit_string() << std::endl;

  // io
  {
    std::ofstream out("kmer_file", std::ios::out | std::ios::binary);
    kmer.dump(out);
  }
  {
    std::ifstream in("kmer_file", std::ios::in | std::ios::binary);
    Kmer<32> loaded(12); // kmer_size
    loaded.load(in);
  }

  // access
  const uint64_t* kmer_data = kmer.get_data64();
  const uint8_t* kmer_data8 = kmer.get_data8();
  
  uint8_t value = kmer.at2bit(0); // 0 (A)
  char nt = kmer.at(0); // 'A'

  // Hash
  using HType = KmerHashers<0>::Hasher<32>; 
  HType hasher;
  uint64_t hash = hasher(kmer);
  // KmerHashers<0> uses the folly hash and KmerHashers<1> uses xxHash.
  // To use xxHash, add #define WITH_XXHASH before including kmtricks and link against xxHash.
}
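
For readers curious about what operations such as rev_comp() and canonical() amount to on a 2-bit encoding, the following standalone sketch may help. It is not the kmtricks implementation: `rev_comp_2bit` and `canonical_2bit` are hypothetical names, the k-mer is assumed to sit in the low 2*k bits of a single uint64_t with the first nucleotide in the highest used bits, and "canonical" is assumed to mean the smaller of the k-mer and its reverse complement under integer comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the kmtricks code): reverse complement of a
// k-mer packed with A=0, C=1, T=2, G=3. With this encoding the complement
// of a nucleotide is its code XOR 2 (A<->T is 0<->2, C<->G is 1<->3).
std::uint64_t rev_comp_2bit(std::uint64_t word, std::size_t k)
{
  assert(k >= 1 && k <= 32);
  std::uint64_t rc = 0;
  for (std::size_t i = 0; i < k; ++i)
  {
    // Take the last remaining nucleotide, complement it, append it to rc.
    rc = (rc << 2) | ((word & 3) ^ 2);
    word >>= 2;
  }
  return rc;
}

// Hypothetical helper: canonical form, assumed here to be the smaller of
// the k-mer and its reverse complement when compared as integers.
std::uint64_t canonical_2bit(std::uint64_t word, std::size_t k)
{
  const std::uint64_t rc = rev_comp_2bit(word, k);
  return word < rc ? word : rc;
}
```

For example, under these assumptions AAAC (00 00 00 01) has reverse complement GTTT (11 10 10 10), and both share the canonical form AAAC. A loop is used for clarity; bit-parallel versions are possible but not shown.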

Kmer also supports many comparison, arithmetic, bitwise and assignment operators; see kmer.hpp for the full list. A standalone implementation is also provided in kmercpp.