Skip to content
simongog edited this page Aug 29, 2012 · 23 revisions

SDSL: Succinct Data Structure Library


SDSL provides a variety of well engineered succinct data structure libraries in a consistent C++ framework. The library includes extensive documentation and is well tested. The library contains implementations for the following data structures:

  • Compressed bit vector representations
  • Rank and Select data structures for bit vectors
  • Uncompressed and compressed representation for general sequences
  • Rank and Select data structures for general sequences (e.g. wavelet trees)
  • Support data structure for balanced parentheses sequences
  • Succinct range minimum query (RMQ) data structures
  • Compressed suffix arrays (CSA)
  • Compressed longest common prefix (LCP) array representations
  • Compressed suffix trees (CST), FM-Indexes

We have successfully applied the library to handle inputs in the gigabyte range. We have for instance build FM-Indexes and compressed suffix trees for inputs up to 64 GB. Our FM-Index implementations use space close to the best state-of-the-art compressors and outperform older implementations. Here are the benchmark results.

Getting Started

The following steps describe how to download install libsdsl.

  1. Clone the current stable version of libsdsl using git:

     git clone https://github.com/simongog/sdsl.git
    
  2. Change into the sdsl directory and run the install script

     cd sdsl
     ./install.sh [EXISTING_DIR_FOR_HEADER_AND_LIBFILES_AND_NOT_THE_CURRENT_DIR]
    
  3. Run all tests to make sure the library was installed correctly: cd test make && make test

  4. You find a Makefile to compile your programs in the examples directory.

  5. Compile some of the Examples and Tutorials available in this wiki.

Documentation

This wiki contains a wide range of Examples and Tutorials.

The code itself is documented and the auto-generated documentation can be accessed at http://simongog.github.com/sdsl/

Limitations

  • The library is tested under Linux and Mac OS. There are only a few operation depended pieces in the code (like the place of temporary files). So it should be easy also run it on Microsoft Windows with the mingw environment.
  • The number of elements in all data structure is limited by (2^{61}).
  • The indexing part of the library can process text with an alphabet size of 255 (the reason for that: the 0-byte is not allowed to occur in the text is appended to the text as special character.)