
Commit

Merge pull request #6 from moshi4/develop
Bump to v0.3.0
moshi4 authored Mar 8, 2024
2 parents 20d069c + 16ef6b4 commit aa26e90
Showing 13 changed files with 345,906 additions and 44,014 deletions.
13 changes: 9 additions & 4 deletions README.md
@@ -20,6 +20,11 @@ pybarrnap is a python implementation of [barrnap](https://github.com/tseemann/ba
It provides a CLI compatible with barrnap and also provides a python API for running rRNA prediction and retrieving predicted rRNA.
pybarrnap depends only on the python library and not on the external command-line tools nhmmer and bedtools.

+> [!NOTE]
+> Barrnap v0.9 uses the HMM profile database created from older releases of Rfam(11.0) and SILVA(128).
+> On the other hand, pybarrnap uses the HMM profile database created from the Rfam(14.10) latest release as of 2024/03.
+> Therefore, there will be some differences in results between Barrnap v0.9 and pybarrnap.
## Installation

`Python 3.8 or later` is required for installation.
@@ -121,9 +126,9 @@ for rec in result.get_rrna_seq_records():

## LICENSE

-pybarrnap uses Barrnap v0.9 HMM profile database created from Rfam and SILVA.
+pybarrnap was reimplemented in python based on the perl implementation of Barrnap v0.9.
+HMM profile database for pybarrnap was created from Rfam(14.10) using a modified version of the database build script provided in Barrnap v0.9.

 - pybarrnap: [GPLv3](https://github.com/moshi4/pybarrnap/blob/main/LICENSE)
-- Barrnap: [GPLv3](https://github.com/moshi4/pybarrnap/blob/main/src/pybarrnap/db/LICENSE.Barrnap)
-- Rfam: [CC0](https://github.com/moshi4/pybarrnap/blob/main/src/pybarrnap/db/LICENSE.Rfam)
-- SILVA: [Free for academic use](https://github.com/moshi4/pybarrnap/blob/main/src/pybarrnap/db/LICENSE.SILVA)
+- Barrnap([v0.9](https://github.com/tseemann/barrnap/tree/0.9)): [GPLv3](https://github.com/moshi4/pybarrnap/blob/main/src/pybarrnap/db/LICENSE.Barrnap)
+- Rfam([14.10](https://ftp.ebi.ac.uk/pub/databases/Rfam/14.10/)): [CC0](https://github.com/moshi4/pybarrnap/blob/main/src/pybarrnap/db/LICENSE.Rfam)
112,142 changes: 112,142 additions & 0 deletions dbbuild/12S.mito.aln

Large diffs are not rendered by default.

189,290 changes: 189,290 additions & 0 deletions dbbuild/16S.mito.aln

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions dbbuild/README.md
@@ -0,0 +1,35 @@
`build_HMMs.sh` script automatically builds the HMM profile database from the following data resources.
This is a modified version of the database build [script](https://github.com/tseemann/barrnap/blob/0.9/build/build_HMMs.sh) provided in Barrnap v0.9.
pybarrnap uses only [Rfam](https://rfam.org/) to create the HMM profile database and does not use [SILVA](https://www.arb-silva.de/), which is used in Barrnap v0.9.

```diff
Bacteria (70S)
LSU 50S
5S RF00001
- 23S SILVA-LSU-Bac (Barrnap v0.9)
+ 23S RF02541 (pybarrnap)
SSU 30S
16S RF00177

Archaea (70S)
LSU 50S
5S RF00001
5.8S RF00002
- 23S SILVA-LSU-Arc (Barrnap v0.9)
+ 23S RF02540 (pybarrnap)
SSU 30S
16S RF01959

Eukaryote (80S)
LSU 60S
5S RF00001
5.8S RF00002
- 28S SILVA-LSU-Euk (Barrnap v0.9)
+ 28S RF02543 (pybarrnap)
SSU 40S
18S RF01960

Metazoan Mito
12S RefSeq (MT-RNR1, s-rRNA, rns)
16S RefSeq (MT-RNR2, l-rRNA, rnl)
```
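
The family assignments listed above can also be written down as a small lookup table. The following is a minimal Python sketch, not code from pybarrnap: the names `RFAM_FAMILIES` and `alignment_targets` are hypothetical, the accessions are transcribed from the listing above, and the `<type>.<kingdom>.aln` file naming follows `build_HMMs.sh`. The metazoan mitochondrial alignments are left out because they come from the bundled RefSeq-based `.aln` files rather than from Rfam.

```python
# Rfam family accessions per kingdom, transcribed from the listing above.
# RFAM_FAMILIES and alignment_targets are illustrative names only; they are
# not part of the pybarrnap package.
RFAM_FAMILIES = {
    "bac": {"5S": "RF00001", "16S": "RF00177", "23S": "RF02541"},
    "arc": {"5S": "RF00001", "5_8S": "RF00002", "16S": "RF01959", "23S": "RF02540"},
    "euk": {"5S": "RF00001", "5_8S": "RF00002", "18S": "RF01960", "28S": "RF02543"},
}


def alignment_targets(kingdom):
    """Return (Rfam accession, alignment filename) pairs for one kingdom."""
    return [
        (accession, f"{rrna_type}.{kingdom}.aln")
        for rrna_type, accession in RFAM_FAMILIES[kingdom].items()
    ]


if __name__ == "__main__":
    for kingdom in RFAM_FAMILIES:
        for accession, filename in alignment_targets(kingdom):
            print(f"{kingdom}: {accession} -> {filename}")
```

Keeping the mapping in one table makes it easy to see that the 5S family (RF00001) is shared by all three kingdoms, while the SSU and LSU families are kingdom-specific.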
74 changes: 74 additions & 0 deletions dbbuild/build_HMMs.sh
@@ -0,0 +1,74 @@
#!/bin/bash

CPUS=$(grep -c bogomips /proc/cpuinfo)
DB_OUTDIR="./db"

# Download RFAM annotated seed alignments
RFAM="Rfam.seed"
RFAMURL="ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/${RFAM}.gz"
if [ ! -r "$RFAM" ]; then
echo Downloading: $RFAM
wget $RFAMURL -q --show-progress
gunzip ${RFAM}.gz
else
echo Using existing file: $RFAM
fi

# Retrieve target accession rRNA MSA
echo Retrieve target accession MSA from MSA database...

echo Indexing $RFAM
esl-afetch --index $RFAM

echo "Bacteria (5S=RF00001, 16S=RF00177, 23S=RF02541)"
esl-afetch $RFAM RF00001 >5S.bac.aln
esl-afetch $RFAM RF00177 >16S.bac.aln
esl-afetch $RFAM RF02541 >23S.bac.aln

echo "Archaea (5S=RF00001, 5.8S=RF00002, 16S=RF01959, 23S=RF02540)"
esl-afetch $RFAM RF00001 >5S.arc.aln
esl-afetch $RFAM RF00002 >5_8S.arc.aln
esl-afetch $RFAM RF01959 >16S.arc.aln
esl-afetch $RFAM RF02540 >23S.arc.aln

echo "Eukaryote (5S=RF00001, 5.8S=RF00002, 18S=RF01960, 28S=RF02543)"
esl-afetch $RFAM RF00001 >5S.euk.aln
esl-afetch $RFAM RF00002 >5_8S.euk.aln
esl-afetch $RFAM RF01960 >18S.euk.aln
esl-afetch $RFAM RF02543 >28S.euk.aln

# Check metazoan mitochondria alignment file exists
FILE="12S.mito.aln"
if [ ! -r "$FILE" ]; then
echo "Missing included $FILE file."
exit 1
fi
FILE="16S.mito.aln"
if [ ! -r "$FILE" ]; then
echo "Missing included $FILE file."
exit 1
fi

# Build HMM profile
mkdir -p $DB_OUTDIR
for KINGDOM in arc bac euk mito; do
for TYPE in 5S 5_8S 12S 16S 23S 18S 28S; do
ID="${TYPE}.${KINGDOM}"
if [ -s "${ID}.aln" ]; then
echo "*** $ID ***"
hmmbuild --cpu $CPUS --rna -n ${TYPE}_rRNA ${ID}.hmm ${ID}.aln
fi
done
cat *.${KINGDOM}.hmm >${DB_OUTDIR}/${KINGDOM}.hmm
done

# Remove unnecessary files
rm -f ${RFAM}.ssi
for KINGDOM in arc bac euk; do
rm -f *.${KINGDOM}.aln *.${KINGDOM}.hmm
done
rm -f *.mito.hmm

# Show HMM profile files
echo -e "\nFinished building HMM profiles:"
ls -1 ${DB_OUTDIR}/*.hmm
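
To make the build loop in the script above easier to follow, here is a minimal Python sketch of the same iteration. It is an illustration under stated assumptions, not part of the commit: the function name `hmmbuild_commands` and its parameters are hypothetical, and the function only assembles the `hmmbuild` argument lists (using the flags that appear in the script) without running them. One deliberate difference: the script counts CPUs by grepping `/proc/cpuinfo`, which is Linux-specific, while the sketch falls back to `os.cpu_count()`.

```python
import os

KINGDOMS = ("arc", "bac", "euk", "mito")
RRNA_TYPES = ("5S", "5_8S", "12S", "16S", "23S", "18S", "28S")


def hmmbuild_commands(aln_dir=".", cpus=None):
    """Assemble one hmmbuild argument list per non-empty alignment file.

    Hypothetical helper: it mirrors the nested loop of build_HMMs.sh but
    does not execute anything.
    """
    cpus = cpus or os.cpu_count() or 1
    commands = []
    for kingdom in KINGDOMS:
        for rrna_type in RRNA_TYPES:
            profile_id = f"{rrna_type}.{kingdom}"
            aln = os.path.join(aln_dir, f"{profile_id}.aln")
            # Roughly equivalent to the shell test `[ -s file ]`:
            # the file must exist and be non-empty.
            if os.path.isfile(aln) and os.path.getsize(aln) > 0:
                commands.append([
                    "hmmbuild", "--cpu", str(cpus), "--rna",
                    "-n", f"{rrna_type}_rRNA",
                    os.path.join(aln_dir, f"{profile_id}.hmm"),
                    aln,
                ])
    return commands


if __name__ == "__main__":
    for command in hmmbuild_commands():
        print(" ".join(command))
```

Each returned list could be passed to `subprocess.run(command, check=True)` if HMMER is installed; concatenating the per-kingdom `.hmm` outputs, as the script does with `cat`, is left out of the sketch.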
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "pybarrnap"
-version = "0.2.0"
+version = "0.3.0"
description = "Python implementation of barrnap (Bacterial ribosomal RNA predictor)"
authors = ["moshi4"]
license = "GPL-3.0-only"
2 changes: 1 addition & 1 deletion src/pybarrnap/__init__.py
@@ -1,6 +1,6 @@
from pybarrnap.barrnap import Barrnap

-__version__ = "0.2.0"
+__version__ = "0.3.0"

__all__ = [
"Barrnap",
53 changes: 47 additions & 6 deletions src/pybarrnap/db/LICENSE.Rfam
@@ -1,10 +1,51 @@
-Rfam version 11.0 was produced at the Wellcome Trust Sanger
-Institute. Rfam is based on a sequence database called Rfamseq, which
-is derived from WGS and STD data classes from EMBL release 110.
+Rfam version 14.8 was produced at the European Bioinformatics Institute.
+Rfam is based on a sequence database called Rfamseq. The genome-centric
+version of Rfam is built from a collection of reference genomes,
+which are provided by Uniprot proteomes.

 Rfam is freely available under the Creative Commons Zero ("CC0")
 licence. (http://creativecommons.org/publicdomain/zero/1.0/)

-Rfam is powered by the Infernal package written by Eric Nawrockie at
-the Howard Hughes Janelia Farm Research Campus
-(http://www.hhmi.org/janelia/labs.html).
+Rfam is powered by the Infernal package (http://eddylab.org/infernal/).
+The current lead developer of Infernal is Eric Nawrocki,
+at the National Center for Biotechnology Information (NCBI,
+nawrocke@ncbi.nlm.nih.gov).
+
+
+If you make use of Rfam in your work we ask that you cite the
+following publications:
+
+I. Kalvari, J. Argasinska, N. Quinones-Olvera, E.P. Nawrocki, E. Rivas,
+S.R. Eddy, A. Bateman, R.D. Finn and A.I. Petrov
+Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families
+Nucleic Acids Res. 2017 Sep;
+
+E.P. Nawrocki, S.W. Burge, A. Bateman, J. Daub, R.Y. Eberhardt, S.R. Eddy,
+E.W. Floden, P.P. Gardner, T.A. Jones, J.T. and R.D. Finn
+Rfam 12.0: updates to the RNA families database
+Nucleic Acids Res. 2014 Nov;
+
+Burge SW, Daub J, Eberhardt R, Tate JG, Barquist L, Nawrocki E, Eddy S, Gardner
+PP, Bateman A
+Rfam 11.0: 10 years of RNA families.
+Nucleic Acids Res. 2012 Nov;
+
+Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S,
+Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A
+Rfam: updates to the RNA families database.
+Nucleic Acids Res. 2008 Oct;
+
+Daub J, Gardner PP, Tate J, Ramsköld D, Manske M, Scott WG,
+Weinberg Z, Griffiths-Jones S, Bateman A
+The RNA WikiProject: Community annotation of RNA families.
+RNA. 2008 Dec; 14:(12)2462-2464
+
+Rfam: annotating non-coding RNAs in complete genomes
+Sam Griffiths-Jones, Simon Moxon, Mhairi Marshall, Ajay Khanna,
+Sean R. Eddy and Alex Bateman
+Nucleic Acids Res. 2005 33:D121-D124
+
+Rfam: an RNA family database.
+Sam Griffiths-Jones, Alex Bateman, Mhairi Marshall, Ajay Khanna
+and Sean R. Eddy.
+Nucleic Acids Res. 2003 31:439-441
37 changes: 0 additions & 37 deletions src/pybarrnap/db/LICENSE.SILVA

This file was deleted.
