ReCycled is a tool to check the circularity of bacterial genome assemblies and circularise them according to the location of the replication initiation protein dnaA.
In brief, it detects the presence of the origin of replication (i.e. dnaA). It also checks the circularity of contigs by looking for overlap at the contig edges and by mapping the raw reads to identify reads that span the contig ends. Based on this information, it circularises the bacterial chromosomes and restarts them upstream of the dnaA gene.
ReCycled was designed and implemented by Vincent Somerville, Michael Schmid and Philipp Engel as freely available software under the GPLv3 license.
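Restarting a circular contig amounts to rotating its sequence so that a chosen position becomes the new first base. The following is a minimal sketch of that idea only, assuming a 0-based start coordinate and a sequence passed as a single string. The helper name rotate_seq is hypothetical and not part of ReCycled, whose actual implementation may differ.

```shell
# Hypothetical helper (not part of ReCycled): rotate a circular sequence
# so that the base at 0-based position $2 becomes the first base.
rotate_seq() {
  # $1: sequence as a single string, $2: 0-based new start position
  awk -v s="$1" -v p="$2" 'BEGIN {
    # Everything from the new start to the end, followed by the old prefix.
    print substr(s, p + 1) substr(s, 1, p)
  }'
}

# Example: restart a toy 12 bp contig at position 3.
rotate_seq AAACCCGGGTTT 3
```

Because the contig is circular, the rotated sequence has the same length and content as the original; only the start position changes.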
- Requirements
- Installation
- Usage examples
- Output of a ReCycled run
- Full usage
- How ReCycled works
- Known limitations
- License
ReCycled currently does not support macOS or Windows.
git clone https://github.com/Freevini/ReCycled.git
cd ReCycled
./ReCycled.sh -h
In addition, you need to install Minimap2 and bedtools. Make sure that bash can find them, either by adding them to your PATH variable or by installing them into a system folder.
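A quick way to confirm that both dependencies are visible to the shell is sketched below. It assumes the executables are named minimap2 and bedtools, which are their usual names. The function missing_tools is a hypothetical helper and not part of ReCycled.

```shell
# Hypothetical helper (not part of ReCycled): print every given tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools minimap2 bedtools)
if [ -n "$missing" ]; then
  # Unquoted on purpose so that several names are printed on one line.
  echo "Not found on PATH:" $missing >&2
else
  echo "All dependencies found."
fi
```

If a tool is reported as missing, add its installation directory to PATH (for example in ~/.bashrc) and open a new shell before running ReCycled.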
Minimally parameterized run of ReCycled:
ReCycled.sh -i input_assembly.fasta -l input_long_reads
To run in multithreaded mode, use the -t flag:
ReCycled.sh -i input_assembly.fasta -l input_long_reads -t 8
To overwrite the results of a previous run, use the -F flag:
ReCycled.sh -i input_assembly.fasta -l input_long_reads -F
To keep all temporary files, use the -x flag:
ReCycled.sh -i input_assembly.fasta -l input_long_reads -x
To set a specific output file name, use -o; to set the output directory, use -d:
ReCycled.sh -i input_assembly.fasta -l input_long_reads -o results -d results/out/directory/
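The flags above can be combined to process several assemblies in one go. The sketch below assumes a hypothetical layout in which each assembly is named after its sample (for example sampleA.fasta) and has a matching long-read file (sampleA.fq.gz) in a second directory. The function run_batch and the RECYCLED variable are illustrative and not part of ReCycled; setting RECYCLED=echo prints the arguments instead of running the tool.

```shell
# Hypothetical wrapper (not part of ReCycled): run ReCycled.sh once per
# assembly, writing each sample to its own output directory.
run_batch() {
  # $1: directory containing <sample>.fasta
  # $2: directory containing <sample>.fq.gz
  # $3: root directory for the results
  for asm in "$1"/*.fasta; do
    [ -e "$asm" ] || continue            # skip if the glob matched nothing
    sample=$(basename "$asm" .fasta)
    ${RECYCLED:-ReCycled.sh} -i "$asm" -l "$2/${sample}.fq.gz" \
      -o "$sample" -d "$3/${sample}/"
  done
}

# Usage (dry run, prints the arguments that would be passed):
#   RECYCLED=echo run_batch assemblies reads results
```

Whether ReCycled creates a missing output directory itself is not stated here, so it may be safer to create the directories beforehand.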
ReCycled outputs two files:
- basename.fasta: Contains the restarted assembly in fasta format. All contigs that were restarted are labelled with _restart in the fasta header.
- basename_analysis_circularity_extended.log: Contains statistics for all contigs and whether they were restarted.
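Since restarted contigs carry _restart in their fasta header, they can be listed directly from the output file. The sketch below assumes only that the label appears somewhere in the header line; its exact position in the header is not specified here. The function list_restarted is a hypothetical helper and not part of ReCycled.

```shell
# Hypothetical helper (not part of ReCycled): print the headers of all
# contigs in a fasta file whose header contains "_restart".
list_restarted() {
  # $1: fasta file written by ReCycled (e.g. basename.fasta)
  awk '/^>/ && /_restart/ { sub(/^>/, ""); print }' "$1"
}

# Usage:
#   list_restarted basename.fasta
```

An empty result means that no contig in the file was restarted.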
usage: ReCycled.sh [-h] -i INPUT_GENOME -l LONG_READ_FILE [-f SHORT_READ_FORWARD]
[-r SHORT_READ_REVERSE] [-d OUTPUT_DIRECTORY] [-O OUTPUT_FILE]
[-p ReCycled_SCRIPT_DIRECTORY] [-t THREADS] [-x] [-F] [-v] [-V]
ReCycled: ReCycled: checks the circularity of contigs and restarts them at replication initiation protein
minimal syntax: ReCycled.sh -i <genome_input.fasta> -l <raw_long_read.fastq.gz>
options:
INPUT
-i input genome name (in fasta format) (MANDATORY)
-l long read file (fq or fq.gz) (MANDATORY)
-f short read forward read (read 1) (fq or fq.gz)
-r short read reverse read (read 2) (fq or fq.gz)
-a Additional custom initiation protein database (add a nucleotide fasta file)
OUTPUT
-d output directory [.]
-o output file name
RUNNING OPTIONS
-t number of threads to use [4]
-x keep all tmp files created [N]
-F Force intermediate file to run again [N]
INFOS
-h help option
-V print Version [N]
Test ReCycled with the two provided test data sets:
- two circular contigs
ReCycled.sh -i testData/oneCircularContigs_SRR3880379.fasta -l testData/oneCircularContigs_SRR3880379.fq.gz
- one circular and eight non-circular contigs
ReCycled.sh -i testData/oneCircular_eigthNonCircular_SRR15376163.fasta -l testData/oneCircular_eigthNonCircular_SRR15376163.fq.gz
- ReCycled does not polish contigs after resetting their start position. Might be addressed later if needed.
- ReCycled does not circularise non-bacterial contigs. It reports them but does not change their start position. Might be addressed later if needed.
- ReCycled does not restart or arrange incomplete bacterial assemblies. Might be addressed later if needed.