Merge pull request #6 from moshi4/develop
Bump to v0.3.0
13 changed files with 345,906 additions and 44,014 deletions.
@@ -0,0 +1,35 @@
The `build_HMMs.sh` script automatically builds the HMM profile database from the data resources listed below.
It is a modified version of the database build [script](https://github.com/tseemann/barrnap/blob/0.9/build/build_HMMs.sh) provided in Barrnap v0.9.
For the bacterial, archaeal, and eukaryotic profiles, pybarrnap uses only [Rfam](https://rfam.org/). It does not use [SILVA](https://www.arb-silva.de/), which Barrnap v0.9 uses for the 23S/28S (LSU) profiles. The metazoan mitochondrial profiles are still built from the included RefSeq-based alignments.

```diff
Bacteria (70S)
    LSU 50S
        5S      RF00001
-       23S     SILVA-LSU-Bac (Barrnap v0.9)
+       23S     RF02541 (pybarrnap)
    SSU 30S
        16S     RF00177

Archaea (70S)
    LSU 50S
        5S      RF00001
        5.8S    RF00002
-       23S     SILVA-LSU-Arc (Barrnap v0.9)
+       23S     RF02540 (pybarrnap)
    SSU 30S
        16S     RF01959

Eukaryote (80S)
    LSU 60S
        5S      RF00001
        5.8S    RF00002
-       28S     SILVA-LSU-Euk (Barrnap v0.9)
+       28S     RF02543 (pybarrnap)
    SSU 40S
        18S     RF01960

Metazoan Mito
    12S     RefSeq (MT-RNR1, s-rRNA, rns)
    16S     RefSeq (MT-RNR2, l-rRNA, rnl)
```
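The kingdom-to-accession layout above can be expressed as a lookup table. The following is a minimal Python sketch, not part of pybarrnap. It mirrors how `build_HMMs.sh` pairs each Rfam accession with an `esl-afetch` call and a `<type>.<kingdom>.aln` output file. The names `RFAM_ACCESSIONS` and `afetch_commands` are hypothetical, and the function only builds command lines; it does not invoke Easel.

```python
"""Hypothetical sketch: map each kingdom's rRNA types to Rfam accessions
and derive the esl-afetch invocations that build_HMMs.sh runs."""

from typing import Dict, List, Tuple

# Kingdom code -> {rRNA type: Rfam accession}, in the script's order.
# The metazoan mito alignments (12S/16S) are included files rather than
# Rfam fetches, so they are deliberately absent here.
RFAM_ACCESSIONS: Dict[str, Dict[str, str]] = {
    "bac": {"5S": "RF00001", "16S": "RF00177", "23S": "RF02541"},
    "arc": {"5S": "RF00001", "5_8S": "RF00002", "16S": "RF01959", "23S": "RF02540"},
    "euk": {"5S": "RF00001", "5_8S": "RF00002", "18S": "RF01960", "28S": "RF02543"},
}


def afetch_commands(seed_file: str = "Rfam.seed") -> List[Tuple[List[str], str]]:
    """Return (argv, output_path) pairs, one per kingdom/rRNA type.

    Each argv corresponds to `esl-afetch <seed_file> <accession>`, whose
    stdout the shell script redirects to `<type>.<kingdom>.aln`.
    """
    commands: List[Tuple[List[str], str]] = []
    for kingdom, types in RFAM_ACCESSIONS.items():
        for rrna_type, accession in types.items():
            argv = ["esl-afetch", seed_file, accession]
            commands.append((argv, f"{rrna_type}.{kingdom}.aln"))
    return commands


if __name__ == "__main__":
    for argv, out_path in afetch_commands():
        print(" ".join(argv), ">", out_path)
```

To actually fetch the alignments, each argv could be passed to `subprocess.run(argv, stdout=fh, check=True)` with `fh` opened on the output path. This assumes `esl-afetch` is on `PATH` and that `Rfam.seed` has already been indexed with `esl-afetch --index`.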
@@ -0,0 +1,74 @@
#!/bin/bash

CPUS=$(grep -c bogomips /proc/cpuinfo)
DB_OUTDIR="./db"

# Download RFAM annotated seed alignments
RFAM="Rfam.seed"
RFAMURL="ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/${RFAM}.gz"
if [ ! -r "$RFAM" ]; then
    echo Downloading: $RFAM
    wget $RFAMURL -q --show-progress
    gunzip ${RFAM}.gz
else
    echo Using existing file: $RFAM
fi

# Retrieve target accession rRNA MSA
echo Retrieve target accession MSA from MSA database...

echo Indexing $RFAM
esl-afetch --index $RFAM

echo "Bacteria (5S=RF00001, 16S=RF00177, 23S=RF02541)"
esl-afetch $RFAM RF00001 >5S.bac.aln
esl-afetch $RFAM RF00177 >16S.bac.aln
esl-afetch $RFAM RF02541 >23S.bac.aln

echo "Archaea (5S=RF00001, 5.8S=RF00002, 16S=RF01959, 23S=RF02540)"
esl-afetch $RFAM RF00001 >5S.arc.aln
esl-afetch $RFAM RF00002 >5_8S.arc.aln
esl-afetch $RFAM RF01959 >16S.arc.aln
esl-afetch $RFAM RF02540 >23S.arc.aln

echo "Eukaryote (5S=RF00001, 5.8S=RF00002, 18S=RF01960, 28S=RF02543)"
esl-afetch $RFAM RF00001 >5S.euk.aln
esl-afetch $RFAM RF00002 >5_8S.euk.aln
esl-afetch $RFAM RF01960 >18S.euk.aln
esl-afetch $RFAM RF02543 >28S.euk.aln

# Check metazoan mitochondria alignment file exists
FILE="12S.mito.aln"
if [ ! -r "$FILE" ]; then
    echo "Missing included $FILE file."
    exit 1
fi
FILE="16S.mito.aln"
if [ ! -r "$FILE" ]; then
    echo "Missing included $FILE file."
    exit 1
fi

# Build HMM profile
mkdir -p $DB_OUTDIR
for KINGDOM in arc bac euk mito; do
    for TYPE in 5S 5_8S 12S 16S 23S 18S 28S; do
        ID="${TYPE}.${KINGDOM}"
        if [ -s "${ID}.aln" ]; then
            echo "*** $ID ***"
            hmmbuild --cpu $CPUS --rna -n ${TYPE}_rRNA ${ID}.hmm ${ID}.aln
        fi
    done
    cat *.${KINGDOM}.hmm >${DB_OUTDIR}/${KINGDOM}.hmm
done

# Remove unnecessary files
rm -f ${RFAM}.ssi
for KINGDOM in arc bac euk; do
    rm -f *.${KINGDOM}.aln *.${KINGDOM}.hmm
done
rm -f *.mito.hmm

# Show HMM profile files
echo -e "\nFinished building HMM profiles:"
ls -1 ${DB_OUTDIR}/*.hmm
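The build loop at the end of the script can be sketched in Python as a dry run. This is an illustrative sketch under stated assumptions, not pybarrnap code:

- `plan_hmmbuild` and `concat_kingdom_db` are hypothetical names.
- `os.cpu_count()` is swapped in as a portable alternative to counting `bogomips` lines in `/proc/cpuinfo`, which only exists on Linux.
- `sorted()` is assumed to approximate the shell's glob order for `cat *.<kingdom>.hmm`.

Like the script, it skips any `<type>.<kingdom>.aln` that is missing or empty.

```python
"""Hypothetical dry-run sketch of the HMM build loop in build_HMMs.sh.

Nothing here runs hmmbuild: plan_hmmbuild() only returns the argv lists the
script would execute, and concat_kingdom_db() mimics `cat *.<k>.hmm > db/<k>.hmm`.
"""

import os
import tempfile
from pathlib import Path
from typing import Dict, List

KINGDOMS = ["arc", "bac", "euk", "mito"]
TYPES = ["5S", "5_8S", "12S", "16S", "23S", "18S", "28S"]


def plan_hmmbuild(workdir: Path, cpus: int = 0) -> Dict[str, List[List[str]]]:
    """Map each kingdom to the hmmbuild commands for its alignments.

    cpus=0 means "auto" (os.cpu_count()) in this sketch; that is an
    assumption here, not hmmbuild's meaning of `--cpu 0`.
    """
    n_cpus = cpus or os.cpu_count() or 1
    plan: Dict[str, List[List[str]]] = {}
    for kingdom in KINGDOMS:
        cmds: List[List[str]] = []
        for rrna_type in TYPES:
            stem = f"{rrna_type}.{kingdom}"
            aln = workdir / f"{stem}.aln"
            # Approximates `[ -s file ]`: a regular file with size > 0.
            if aln.is_file() and aln.stat().st_size > 0:
                cmds.append([
                    "hmmbuild", "--cpu", str(n_cpus), "--rna",
                    "-n", f"{rrna_type}_rRNA",
                    str(workdir / f"{stem}.hmm"), str(aln),
                ])
        plan[kingdom] = cmds
    return plan


def concat_kingdom_db(workdir: Path, db_dir: Path, kingdom: str) -> Path:
    """Concatenate `*.<kingdom>.hmm` from workdir into `<db_dir>/<kingdom>.hmm`."""
    db_dir.mkdir(parents=True, exist_ok=True)
    out_path = db_dir / f"{kingdom}.hmm"
    with out_path.open("w") as out:
        # sorted() stands in for the shell's alphabetical glob expansion.
        for hmm in sorted(workdir.glob(f"*.{kingdom}.hmm")):
            out.write(hmm.read_text())
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / "5S.bac.aln").write_text("# STOCKHOLM 1.0\n//\n")
        (work / "16S.bac.aln").write_text("")  # empty: skipped, like `[ -s ]`
        for argv in plan_hmmbuild(work, cpus=2)["bac"]:
            print(" ".join(argv))
```

Returning a plan instead of calling `subprocess.run` directly keeps the logic testable without HMMER installed. Each argv could later be executed as-is.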
@@ -1,6 +1,6 @@
 from pybarrnap.barrnap import Barrnap
 
-__version__ = "0.2.0"
+__version__ = "0.3.0"
 
 __all__ = [
     "Barrnap",
@@ -1,10 +1,51 @@
-Rfam version 11.0 was produced at the Wellcome Trust Sanger
-Institute. Rfam is based on a sequence database called Rfamseq, which
-is derived from WGS and STD data classes from EMBL release 110.
+Rfam version 14.8 was produced at the European Bioinformatics Institute.
+Rfam is based on a sequence database called Rfamseq. The genome-centric
+version of Rfam is built from a collection of reference genomes,
+which are provided by Uniprot proteomes.
 
 Rfam is freely available under the Creative Commons Zero ("CC0")
 licence. (http://creativecommons.org/publicdomain/zero/1.0/)
 
-Rfam is powered by the Infernal package written by Eric Nawrockie at
-the Howard Hughes Janelia Farm Research Campus
-(http://www.hhmi.org/janelia/labs.html).
+Rfam is powered by the Infernal package (http://eddylab.org/infernal/).
+The current lead developer of Infernal is Eric Nawrocki,
+at the National Center for Biotechnology Information (NCBI,
+nawrocke@ncbi.nlm.nih.gov).
+
+
+If you make use of Rfam in your work we ask that you cite the
+following publications:
+
+I. Kalvari, J. Argasinska, N. Quinones-Olvera, E.P. Nawrocki, E. Rivas,
+S.R. Eddy, A. Bateman, R.D. Finn and A.I. Petrov
+Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families
+Nucleic Acids Res. 2017 Sep;
+
+E.P. Nawrocki, S.W. Burge, A. Bateman, J. Daub, R.Y. Eberhardt, S.R. Eddy,
+E.W. Floden, P.P. Gardner, T.A. Jones, J.T. and R.D. Finn
+Rfam 12.0: updates to the RNA families database
+Nucleic Acids Res. 2014 Nov;
+
+Burge SW, Daub J, Eberhardt R, Tate JG, Barquist L, Nawrocki E, Eddy S, Gardner
+PP, Bateman A
+Rfam 11.0: 10 years of RNA families.
+Nucleic Acids Res. 2012 Nov;
+
+Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S,
+Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A
+Rfam: updates to the RNA families database.
+Nucleic Acids Res. 2008 Oct;
+
+Daub J, Gardner PP, Tate J, Ramsköld D, Manske M, Scott WG,
+Weinberg Z, Griffiths-Jones S, Bateman A
+The RNA WikiProject: Community annotation of RNA families.
+RNA. 2008 Dec; 14:(12)2462-2464
+
+Rfam: annotating non-coding RNAs in complete genomes
+Sam Griffiths-Jones, Simon Moxon, Mhairi Marshall, Ajay Khanna,
+Sean R. Eddy and Alex Bateman
+Nucleic Acids Res. 2005 33:D121-D124
+
+Rfam: an RNA family database.
+Sam Griffiths-Jones, Alex Bateman, Mhairi Marshall, Ajay Khanna
+and Sean R. Eddy.
+Nucleic Acids Res. 2003 31:439-441
This file was deleted.