Skip to content

Optional skipping of short-read input to Filtlong for large datasets #691

Open
@ddomman

Description

Description of the bug

Hi all - long time listener first time caller:

I have a rather large set of Illumina data along with some nanopore reads on which I was trying to run the hybrid assembly option. After 10+ hours, filtlong was still processing the nanopore reads. I did some digging and the current command utilizes the short-read data as part of the reference option. I think that is fine for small-ish datasets but seems impractical for larger ones.

Once I edited the filtlong.nf code to no longer use the short-reads, the filtlong process took less than 5 minutes and the pipeline has proceeded as expected. Maybe there could be a flag to turn on/off that feature?

filtlong.nf:

process FILTLONG {
    tag "$meta.id"

    conda "bioconda::filtlong=0.2.0"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/filtlong:0.2.0--he513fc3_3' :
        'biocontainers/filtlong:0.2.0--he513fc3_3' }"

    input:
    tuple val(meta), path(long_reads), path(short_reads_1), path(short_reads_2)

    output:
    tuple val(meta), path("${meta.id}_lr_filtlong.fastq.gz"), emit: reads
    path "versions.yml"                                     , emit: versions

    script:
    """
    filtlong \
        -1 ${short_reads_1} \
        -2 ${short_reads_2} \
        --min_length ${params.longreads_min_length} \
        --keep_percent ${params.longreads_keep_percent} \
        --trim \
        --length_weight ${params.longreads_length_weight} \
        ${long_reads} | gzip > ${meta.id}_lr_filtlong.fastq.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        filtlong: \$(filtlong --version | sed -e "s/Filtlong v//g")
    END_VERSIONS
    """
}

Edited working solution:

process FILTLONG {
    tag "$meta.id"

    conda "bioconda::filtlong=0.2.0"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/filtlong:0.2.0--he513fc3_3' :
        'biocontainers/filtlong:0.2.0--he513fc3_3' }"

    input:
    tuple val(meta), path(long_reads), path(short_reads_1), path(short_reads_2)

    output:
    tuple val(meta), path("${meta.id}_lr_filtlong.fastq.gz"), emit: reads
    path "versions.yml"                                     , emit: versions

    script:
    """
    filtlong \
        --min_length ${params.longreads_min_length} \
        --keep_percent ${params.longreads_keep_percent} \
        --length_weight ${params.longreads_length_weight} \
        ${long_reads} | gzip > ${meta.id}_lr_filtlong.fastq.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        filtlong: \$(filtlong --version | sed -e "s/Filtlong v//g")
    END_VERSIONS
    """
}

Command used and terminal output

No response

Relevant files

No response

System information

No response

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions