Question

Trimmomatic running but files containing purged reads are empty

0

5 months ago

Wilber0x ▴ 50

I am running a script on paired-end Illumina data that uses Trimmomatic to remove adaptor sequences and low-quality reads.

The script is below:

input_dir="/home/reads"
# Define the output directory
output_dir="trimmed"
# Create the output directory if it doesn't exist
mkdir -p "$output_dir"

# Loop through all files in the input directory that match the pattern *_1.fastq.gz
for file1 in "$input_dir"/*_1.fastq.gz; do
    # Extract the base name (two-letter code) from the file name
    base_name=$(basename "$file1" _1.fastq.gz)
    file2="${input_dir}/${base_name}_2.fastq.gz"

    # Check if the corresponding _2.fastq.gz file exists
    if [[ -f "$file2" ]]; then
        # Run the trimmomatic command
        trimmomatic PE "$file1" "$file2" \
            -baseout "${output_dir}/${base_name}.fastq.gz" \
            ILLUMINACLIP:adaptors.fasta:4:30:10 MINLEN:30
    else
        echo "Warning: Corresponding file for $file1 not found. Skipping."
    fi
done

The script runs for several hours on my cluster, but when I check the output files that should contain the reads removed by Trimmomatic, they are empty. I have also run FastQC on the sequence files before and after Trimmomatic, and the HTML reports are identical. The FastQC output shows that every file fails "Adapter Content" and contains a high percentage of Illumina Universal Adapter sequence.

Here is the sequence I am using for the Illumina Universal Adapter:

>IlluminaUniversalAdapterNA
AGATCGGAAGAG

Why is trimmomatic not removing any reads?

fastqc fastq trimmomatic • 478 views

score 1 · Answer 1 · 2024-05-22

1

5 months ago

GenoMax 146k

Why is trimmomatic not removing any reads?

Your data do not necessarily contain extraneous/adapter sequence, and if none is present then no reads will be trimmed or removed. That said, check that you are providing the correct adapter sequences and that the adapter file is readable/accessible.
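
One way to check the adapter sequence against the data is to count how many reads contain it verbatim, and to compare its length with the simple-clip threshold. The sketch below assumes gzipped FASTQ with four lines per record. Both function names are made up for illustration, and the score of roughly 0.6 per matching base is the figure given in the Trimmomatic manual, so treat the arithmetic as an approximation.

```shell
# Count reads whose sequence line contains the adapter string exactly.
# Assumes gzipped FASTQ with four lines per record (no wrapped sequences).
count_adapter_hits() {
    fastq_gz=$1
    adapter=$2
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    gzip -cd "$fastq_gz" | awk 'NR % 4 == 2' | grep -c -F -- "$adapter" || true
}

# Approximate best simple-clip score for a perfect match to an adapter,
# taking each matching base as worth log10(4) (about 0.6).
max_simple_clip_score() {
    awk -v n="${#1}" 'BEGIN { printf "%.2f\n", n * log(4) / log(10) }'
}

# Example calls (sample name "AB" is a placeholder):
# count_adapter_hits /home/reads/AB_1.fastq.gz AGATCGGAAGAG
# max_simple_clip_score AGATCGGAAGAG
```

If the score printed for the adapter is below the third ILLUMINACLIP value (10 in the command from the question), even a perfect match cannot reach the simple-clip threshold. In that case a longer adapter sequence would be needed, for example from the adapter files shipped with Trimmomatic.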

Run a single pair of files manually to make sure the job runs correctly with the options you are providing, and check the log files for anything odd.
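
A manual run on one pair could look something like the sketch below. It reuses the options from the question, checks first that the inputs and the adapter file are readable and non-empty, and saves everything Trimmomatic prints to a log file. The function names, the log file name, and the grep patterns are assumptions for illustration; which log lines are informative may differ between Trimmomatic versions.

```shell
# Return 0 only if every listed file exists, is readable, and is non-empty.
check_readable() {
    for f in "$@"; do
        if [ ! -r "$f" ] || [ ! -s "$f" ]; then
            echo "Not readable or empty: $f" >&2
            return 1
        fi
    done
    return 0
}

# Run Trimmomatic on one pair with the question's options, saving everything
# it prints (stdout and stderr) to a log file next to the outputs.
run_one_pair() {
    r1=$1; r2=$2; adapters=$3; out_prefix=$4
    check_readable "$r1" "$r2" "$adapters" || return 1
    trimmomatic PE "$r1" "$r2" \
        -baseout "${out_prefix}.fastq.gz" \
        ILLUMINACLIP:"${adapters}":4:30:10 MINLEN:30 \
        > "${out_prefix}.trimmomatic.log" 2>&1
    status=$?
    # Show the lines about adapter loading and read survival, if present.
    grep -E 'ILLUMINACLIP|Input Read Pairs' "${out_prefix}.trimmomatic.log"
    return "$status"
}

# Example call (sample name "AB" is a placeholder):
# run_one_pair /home/reads/AB_1.fastq.gz /home/reads/AB_2.fastq.gz adaptors.fasta trimmed/AB_test
```

Note that the script in the question refers to adaptors.fasta by a relative path, so it is resolved against whatever directory the job runs in on the cluster.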

0

Thanks for the advice. It seems likely that I have the incorrect adaptor sequences, though I am still surprised that no low-quality reads were removed.
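
On the low-quality reads: the command in the question lists only ILLUMINACLIP and MINLEN, and Trimmomatic applies only the steps it is given, in the order given, so nothing in that command trims or filters on base quality. A minimal sketch of a step list that adds a sliding-window quality trim follows. The helper name is hypothetical, and the ILLUMINACLIP and SLIDINGWINDOW settings are the ones used in the Trimmomatic documentation's example, not values tuned for this data.

```shell
# Print a Trimmomatic step list: adapter clipping, then a sliding-window
# quality trim, then the length filter from the question.
trim_steps() {
    adapters=$1
    echo "ILLUMINACLIP:${adapters}:2:30:10 SLIDINGWINDOW:4:15 MINLEN:30"
}

# Example call; the unquoted $(...) is deliberate so the steps split into
# separate arguments (this breaks if the adapter path contains spaces):
# trimmomatic PE "$file1" "$file2" \
#     -baseout "${output_dir}/${base_name}.fastq.gz" $(trim_steps adaptors.fasta)
```

With MINLEN placed last, reads shortened below 30 bases by the earlier steps are dropped, which is what would populate the unpaired output files.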