slow5lib

slow5lib is a software library for reading & writing SLOW5 files. slow5lib is designed to facilitate use of data in SLOW5 format by third-party software packages. Existing packages that read/write data in FAST5 format can be easily modified to support SLOW5.

About SLOW5 format:
SLOW5 is a new file format for storing signal data from Oxford Nanopore Technologies (ONT) devices. SLOW5 was developed to overcome inherent limitations in the standard FAST5 signal data format that prevent efficient, scalable analysis and cause many headaches for developers. SLOW5 can be encoded in human-readable ASCII format, or a more compact and efficient binary format (BLOW5) - this is analogous to the seminal SAM/BAM format for storing DNA sequence alignments. The BLOW5 binary format supports zlib (DEFLATE) compression, or other compression methods (see notes), thereby minimising the data storage footprint while still permitting efficient parallel access. Detailed benchmarking experiments have shown that SLOW5 format is an order of magnitude faster and significantly smaller than FAST5.

Full documentation: https://hasindu2008.github.io/slow5lib
Pre-print: https://www.biorxiv.org/content/10.1101/2021.06.29.450255v1
SLOW5 specification: https://hasindu2008.github.io/slow5specs

Building

To build the C/C++ library:

sudo apt-get install zlib1g-dev   #install zlib development libraries
git clone https://github.com/hasindu2008/slow5lib
cd slow5lib
make

This will generate lib/libslow5.a for static linking and libslow5.so for dynamic linking.

The commands to install the zlib development libraries on some popular distributions:

On Debian/Ubuntu : sudo apt-get install zlib1g-dev
On Fedora/CentOS : sudo dnf install zlib-devel (or sudo yum install zlib-devel)
On OS X : brew install zlib

You can optionally enable zstd compression support when building slow5lib by invoking make zstd=1. This requires the zstd development libraries, version 1.3 or higher, to be installed on your system (libzstd1-dev package for apt, libzstd-devel for yum/dnf and zstd for homebrew). SLOW5 files compressed with zstd offer a slightly smaller file size and better performance than those compressed with the default zlib. However, the zlib runtime library is available by default on almost all distributions, unlike zstd, so files compressed with zlib are more 'portable'.

slow5lib from version 0.3.0 onwards uses code from StreamVByte and by default requires vector instructions (SSSE3 or higher for Intel/AMD and NEON for ARM). If your processor is old enough to lack such vector instructions, build with make no_simd=1.

Usage

Simply include <slow5/slow5.h> in your C program and call the API functions. To compile your program and statically link against slow5lib:

gcc [OPTIONS] -I path/to/slow5lib/include your_program.c path/to/slow5lib/lib/libslow5.a -lm -lz

path/to/slow5lib/ is the absolute or relative path to the slow5lib repository cloned above. To dynamically link:

gcc [OPTIONS] -I path/to/slow5lib/include your_program.c -L path/to/slow5lib/lib/ -lslow5 -lm -lz

If you compiled slow5lib with zstd support enabled, make sure you append -lzstd to the above two commands.

Documentation for the C API and the Python API is available at https://hasindu2008.github.io/slow5lib.

Examples

Examples are provided under the examples directory.

  • sequential_read.c demonstrates how to read a slow5/blow5 file, sequentially from start to end.

  • random_read.c demonstrates how to fetch a given read ID from a slow5/blow5 file.

  • header_attribute.c demonstrates how to fetch a header data attribute from a slow5/blow5 file.

  • auxiliary_field.c demonstrates how to fetch an auxiliary field from a slow5/blow5 file.

  • random_read_pthreads.c demonstrates how to fetch given read IDs in parallel from a slow5/blow5 file using pthreads.

  • random_read_openmp.c demonstrates how to fetch given read IDs in parallel from a slow5/blow5 file using OpenMP.

You can invoke examples/build.sh to compile the example programmes. Have a look at the script to see the commands used for compiling and linking. If you compiled slow5lib with zstd support enabled, make sure you append -lzstd to the compilation commands.

pyslow5

pyslow5, the Python wrapper for slow5lib, can be installed using conda as conda install pyslow5 -c bioconda -c conda-forge or from PyPI as pip install pyslow5. Instructions for building and using pyslow5 are provided in the pyslow5 documentation.

Notes

slow5lib from version 0.3.0 onwards has built-in StreamVByte compression support to enable even smaller file sizes, which is applied to the raw signal by default when producing BLOW5 files. zlib compression is then applied by default to each SLOW5 record. If zstd is used instead of zlib on top of StreamVByte, the result is similar to ONT's latest vbz compression. BLOW5 files with zstd+StreamVByte are still about 25% smaller than vbz-compressed FAST5 files.
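
To give a rough feel for why StreamVByte shrinks integer data such as raw signal samples, the following is a minimal scalar sketch of the StreamVByte byte layout: a stream of 2-bit length codes (four per control byte) followed by a stream of data bytes holding only the significant bytes of each value. This is not slow5lib's code and not the vectorised StreamVByte implementation it bundles. The function names (svb_byte_len, svb_max_size, svb_encode, svb_decode) are hypothetical and chosen for illustration only, and any preprocessing slow5lib may apply to the signal before encoding is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative helper (hypothetical name): bytes needed to hold v, 1 to 4. */
static unsigned svb_byte_len(uint32_t v) {
    if (v < (1u << 8))  return 1;
    if (v < (1u << 16)) return 2;
    if (v < (1u << 24)) return 3;
    return 4;
}

/* Worst-case encoded size: control bytes plus 4 data bytes per value. */
size_t svb_max_size(size_t n) {
    return (n + 3) / 4 + 4 * n;
}

/* Encode n values into out (at least svb_max_size(n) bytes).
 * Layout: [control bytes][data bytes]. Returns bytes written. */
size_t svb_encode(const uint32_t *in, size_t n, uint8_t *out) {
    assert(in != NULL && out != NULL);
    size_t n_ctrl = (n + 3) / 4;          /* one control byte per 4 values */
    uint8_t *ctrl = out;
    uint8_t *data = out + n_ctrl;
    memset(ctrl, 0, n_ctrl);
    for (size_t i = 0; i < n; i++) {
        unsigned len = svb_byte_len(in[i]);
        /* store (len - 1) in the 2-bit slot belonging to value i */
        ctrl[i / 4] |= (uint8_t)((len - 1) << (2 * (i % 4)));
        /* emit only the significant bytes, least significant first */
        for (unsigned b = 0; b < len; b++) {
            *data++ = (uint8_t)(in[i] >> (8 * b));
        }
    }
    return (size_t)(data - out);
}

/* Decode n values from in; the caller must know n. Returns bytes consumed. */
size_t svb_decode(const uint8_t *in, size_t n, uint32_t *out) {
    assert(in != NULL && out != NULL);
    const uint8_t *ctrl = in;
    const uint8_t *data = in + (n + 3) / 4;
    for (size_t i = 0; i < n; i++) {
        unsigned len = ((ctrl[i / 4] >> (2 * (i % 4))) & 3u) + 1;
        uint32_t v = 0;
        for (unsigned b = 0; b < len; b++) {
            v |= (uint32_t)(*data++) << (8 * b);
        }
        out[i] = v;
    }
    return (size_t)(data - in);
}
```

In this layout a value below 256 costs one data byte plus a quarter of a control byte, instead of the four bytes of a plain uint32_t. A general-purpose compressor such as zlib or zstd can then be applied to the resulting bytes, which mirrors the layering described above.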

Acknowledgement

slow5lib uses klib and StreamVByte. Some code snippets have been taken from Minimap2 and Samtools.