ReCycled is a tool to check the circularity of bacterial genome assemblies and circularise them according to the location of the replication initiation protein dnaA.

In brief, ReCycled detects the replication initiation gene dnaA, which it uses as a marker for the origin of replication. It also checks the circularity of each contig in two ways: by looking for overlap at the contig edges, and by mapping the raw reads to identify reads that overlap the contig ends. Based on this information, it circularises the bacterial chromosomes and restarts them upstream of the dnaA gene.
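"Restarting" a circular sequence means rotating it so that it begins at a new position without changing its content. The sketch below illustrates only that idea on a toy sequence; it is not ReCycled's implementation, and the function name `rotate_seq` and the example coordinates are hypothetical.

```shell
# Rotate a circular sequence so that it starts at a 0-based offset.
# Illustration only: rotate_seq is a hypothetical helper, not part of ReCycled.
rotate_seq() {
    seq=$1      # sequence as a single line, no FASTA header
    offset=$2   # number of leading bases to move to the end
    printf '%s\n' "$seq" |
        awk -v p="$offset" '{ print substr($0, p + 1) substr($0, 1, p) }'
}

# Toy example: restart an 8 bp circular sequence at offset 3.
rotate_seq ACGTTGCA 3   # prints TTGCAACG
```

The rotated sequence has the same length and base composition as the input; only the start coordinate moves.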

ReCycled was designed and implemented by Vincent Somerville, Michael Schmid and Philipp Engel and is freely available under the GPLv3 license.

Requirements

ReCycled does not currently support macOS or Windows.

Installation

git clone https://github.com/Freevini/ReCycled.git
cd ReCycled

./ReCycled.sh -h

In addition, you have to install Minimap2 and bedtools. Make sure that bash can find them, either by adding their locations to the PATH variable or by installing them into a system folder.
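A quick way to confirm that both dependencies are visible to the shell is to look them up with `command -v`. This is a minimal sketch: the helper name `missing_tools` is hypothetical, and it assumes the executables are installed under their usual names, `minimap2` and `bedtools`.

```shell
# Print the name of every tool passed as an argument that is not found on PATH.
# missing_tools is a hypothetical helper, not part of ReCycled.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools minimap2 bedtools)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names are joined by spaces.
    echo "Not found on PATH:" $missing >&2
else
    echo "minimap2 and bedtools found"
fi
```

If a tool is reported as missing, add its directory to PATH (for example in `~/.bashrc`) and open a new shell before running ReCycled.sh.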

Usage examples

Minimally parameterized run of ReCycled:
ReCycled.sh -i input_assembly.fasta -l input_long_reads

To run with multiple threads, use the -t flag:
ReCycled.sh -i input_assembly.fasta -l input_long_reads -t 8

To overwrite the results of a previous run, use the -F flag:
ReCycled.sh -i input_assembly.fasta -l input_long_reads -F

To keep all temporary files, use the -x flag:
ReCycled.sh -i input_assembly.fasta -l input_long_reads -x

To set a specific output file name, use -o; to set the output directory, use -d:
ReCycled.sh -i input_assembly.fasta -l input_long_reads -o results -d results/out/directory/
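When several assemblies need processing, the flags above can be combined in a loop. The following is a dry-run sketch that only prints the commands. It assumes a hypothetical layout in which each `assemblies/sample.fasta` has its long reads in `assemblies/sample.fq.gz`; the helper name `recycled_cmd` and the `results/` path are also illustrative. Whether ReCycled creates a missing output directory is not stated here, so check that before running the commands for real.

```shell
# Build the ReCycled command line for one assembly.
# Assumed (hypothetical) naming scheme: sample.fasta pairs with sample.fq.gz.
recycled_cmd() {
    assembly=$1
    reads="${assembly%.fasta}.fq.gz"
    sample=$(basename "$assembly" .fasta)
    printf 'ReCycled.sh -i %s -l %s -o %s -d results/%s/\n' \
        "$assembly" "$reads" "$sample" "$sample"
}

# Dry run: print one command per assembly instead of executing it.
for assembly in assemblies/*.fasta; do
    [ -e "$assembly" ] || continue   # skip if the glob matched nothing
    recycled_cmd "$assembly"
done
```

Pipe the printed lines to `sh` once they look right, or replace the `printf` with a direct call to ReCycled.sh.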

Output of a ReCycled run

ReCycled outputs two files:

  • basename.fasta: Contains the newly restarted sequences in fasta format. All contigs that were restarted are labelled with _restart in the fasta header.
  • basename_analysis_circularity_extended.log: Contains statistics for all contigs and states whether each one was restarted.
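Because restarted contigs carry _restart in their fasta header, they can be listed with `grep`. This is a minimal sketch: the helper names are hypothetical, and the demo headers (`contig_1_restart`, `contig_2`) are invented for illustration, since the exact header layout beyond the _restart label is not described here.

```shell
# Print the headers of all contigs labelled as restarted.
restarted_contigs() {
    grep '^>' "$1" | grep '_restart'
}

# Count restarted contigs; grep -c prints 0 when nothing matches.
count_restarted() {
    grep -c '^>.*_restart' "$1"
}

# Demo on a tiny made-up fasta file (headers are illustrative only).
demo=$(mktemp)
printf '>contig_1_restart\nACGT\n>contig_2\nTTGA\n' > "$demo"
restarted_contigs "$demo"
count_restarted "$demo"
rm -f "$demo"
```

On real output, call `restarted_contigs basename.fasta` with the file name produced by your run.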

Full usage

usage: ReCycled.sh [-h] -i INPUT_GENOME -l LONG_READ_FILE [-f SHORT_READ_FORWARD]
                     [-r SHORT_READ_REVERSE] [-d  OUTPUT_DIRECTORY] [-O OUTPUT_FILE]
                     [-p ReCycled_SCRIPT_DIRECTORY] [-t THREADS] [-x] [-F] [-v] [-V]

ReCycled: ReCycled: checks the circularity of contigs and restarts them at replication initiation protein

minimal syntax: ReCycled.sh -i <genome_input.fasta> -l <raw_long_read.fastq.gz>
                 options:

                 INPUT
                    -i     input genome name (in fasta format) (MANDATORY)
                    -l     long read file (fq or fq.gz) (MANDATORY)
                    -f     short read forward read (read 1) (fq or fq.gz)
                    -r     short read reverse read (read 2) (fq or fq.gz)
                    -a     Additional custom initiation protein database (add a nucleotide fasta file)

                 OUTPUT
                    -d     output directory [.]
                    -o     output file name

                 RUNNING OPTIONS
                    -t     number of threads to use [4]
                    -x     keep all tmp files created [N]
                    -F     Force intermediate file to run again [N]

                 INFOS
                    -h     help option
                    -V     print Version [N]

How ReCycled works

Test ReCycled with the two provided test data sets:

  1. two circular contigs

ReCycled.sh -i testData/oneCircularContigs_SRR3880379.fasta -l testData/oneCircularContigs_SRR3880379.fq.gz

  2. one circular and eight non-circular contigs

ReCycled.sh -i testData/oneCircular_eigthNonCircular_SRR15376163.fasta -l testData/oneCircular_eigthNonCircular_SRR15376163.fq.gz

Known limitations

  • ReCycled does not polish contigs after changing their start position. This might be addressed later if needed.
  • ReCycled does not circularise non-bacterial contigs. It reports them but does not change their start position. This might be addressed later if needed.
  • ReCycled does not restart or arrange incomplete bacterial assemblies. This might be addressed later if needed.

License

GNU General Public License, version 3
