Optional skipping of short-read input to Filtlong for large datasets #691
Open
Description of the bug
Hi all - long time listener first time caller:
I have a rather large set of Illumina data along with some nanopore reads on which I was trying to run the hybrid assembly option. After 10+ hours, filtlong was still processing the nanopore reads. I did some digging, and the current command passes the short reads to filtlong as an external reference (the -1 and -2 options). I think that is fine for small-ish datasets, but it seems impractical for larger ones.
Once I edited the filtlong.nf code to no longer use the short reads, the filtlong process took less than 5 minutes and the pipeline proceeded as expected. Maybe there could be a flag to turn that feature on or off?
Current filtlong.nf:
process FILTLONG {
    tag "$meta.id"

    conda "bioconda::filtlong=0.2.0"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/filtlong:0.2.0--he513fc3_3' :
        'biocontainers/filtlong:0.2.0--he513fc3_3' }"

    input:
    tuple val(meta), path(long_reads), path(short_reads_1), path(short_reads_2)

    output:
    tuple val(meta), path("${meta.id}_lr_filtlong.fastq.gz"), emit: reads
    path "versions.yml"                                     , emit: versions

    script:
    """
    filtlong \
        -1 ${short_reads_1} \
        -2 ${short_reads_2} \
        --min_length ${params.longreads_min_length} \
        --keep_percent ${params.longreads_keep_percent} \
        --trim \
        --length_weight ${params.longreads_length_weight} \
        ${long_reads} | gzip > ${meta.id}_lr_filtlong.fastq.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        filtlong: \$(filtlong --version | sed -e "s/Filtlong v//g")
    END_VERSIONS
    """
}
Edited working solution:
process FILTLONG {
    tag "$meta.id"

    conda "bioconda::filtlong=0.2.0"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/filtlong:0.2.0--he513fc3_3' :
        'biocontainers/filtlong:0.2.0--he513fc3_3' }"

    input:
    tuple val(meta), path(long_reads), path(short_reads_1), path(short_reads_2)

    output:
    tuple val(meta), path("${meta.id}_lr_filtlong.fastq.gz"), emit: reads
    path "versions.yml"                                     , emit: versions

    script:
    """
    filtlong \
        --min_length ${params.longreads_min_length} \
        --keep_percent ${params.longreads_keep_percent} \
        --length_weight ${params.longreads_length_weight} \
        ${long_reads} | gzip > ${meta.id}_lr_filtlong.fastq.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        filtlong: \$(filtlong --version | sed -e "s/Filtlong v//g")
    END_VERSIONS
    """
}
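For the flag idea, something along these lines might work. This is only a rough sketch: `params.filtlong_skip_short_reads` is a hypothetical parameter name that does not exist in the pipeline, and I have left out the conda/container directives and the versions.yml step to keep it short. It groups `--trim` with the short-read options, since my edited version removes it together with `-1` and `-2`.

```nextflow
process FILTLONG {
    tag "$meta.id"

    input:
    tuple val(meta), path(long_reads), path(short_reads_1), path(short_reads_2)

    output:
    tuple val(meta), path("${meta.id}_lr_filtlong.fastq.gz"), emit: reads

    script:
    // params.filtlong_skip_short_reads is a hypothetical parameter (an assumption, not part of the pipeline).
    // When it is true, the short-read reference options and --trim are omitted, as in the edited process.
    def reference_args = params.filtlong_skip_short_reads ? '' : "-1 ${short_reads_1} -2 ${short_reads_2} --trim"
    """
    filtlong \
        ${reference_args} \
        --min_length ${params.longreads_min_length} \
        --keep_percent ${params.longreads_keep_percent} \
        --length_weight ${params.longreads_length_weight} \
        ${long_reads} | gzip > ${meta.id}_lr_filtlong.fastq.gz
    """
}
```

With the parameter defaulting to false in nextflow.config, the current behaviour would stay unchanged, and large datasets could opt out by adding `--filtlong_skip_short_reads` on the command line.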